Configuration
This section gives an overview of all flags available in the configuration file for running jobs through the high-level API. A parameter may apply to more than one job type; parameters are grouped into individual tables by applicability.
Always applicable
| Parameter | Description |
|---|---|
| type | The type of job; can be one of “CollisionEnergyCalibration”, “SpectralLibraryGeneration” or “Rescoring” |
| tag | Optional mass tag; can be “tmt”, “tmtpro”, “itraq4” or “itraq8”; default is “” |
| models | Contains information about the models used for peptide property prediction (see following 2 nested parameters) |
| intensity | Name or path of the model used for fragment intensity prediction |
| irt | Name of the model used for indexed retention time prediction |
| inputs | Contains information about inputs and the type of the inputs (see following nested parameter) |
| instrument_type | The type of mass spectrometer used to measure the spectra. Supersedes the value read from the mzML file, which is used by default. When predicting intensities with AlphaPept, choose one of [“QE”, “LUMOS”, “TIMSTOF”, “SCIEXTOF”] if the instrument type of your data is not supported. |
| static_mods | Custom static modifications in the format “<key>”: [<UNIMOD_ID>, <mod_mass>], e.g. “C”: [4, 57.0215], where <key> is search engine specific. Overwrites the default modifications. See Custom modification for detailed information. |
| var_mods | Custom variable modifications in the format “<key>”: [<UNIMOD_ID>, <mod_mass>], e.g. “M(ox)”: [35, 15.9949], where <key> is search engine specific. Overwrites the default modifications. See Custom modification for detailed information. |
| numThreads | Number of raw/mzML files processed in parallel (parallelisation at file level); using more processes than files has no effect and should be avoided. For spectral library generation, this is the number of parallel prediction processes and needs to be balanced with batchsize; default = 1 |
| prediction_server | Server and port for obtaining peptide property predictions; default: “koina.wilhelmlab.org:443” |
| ssl | Use SSL when making requests to the prediction server; can be true or false; default = true |
| output | Path to the output folder (relative to the location of the config file); default = “./” |

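As a rough illustration of how the always-applicable parameters fit together, the sketch below builds a configuration as a Python dictionary and serialises it to JSON. The JSON file format, the nesting of `intensity` and `irt` under `models`, and the placeholder model names are assumptions made for illustration, not values taken from this page.

```python
import json

# Assumed layout: one JSON object whose keys mirror the parameter names above.
config = {
    "type": "Rescoring",
    "tag": "",  # default: no mass tag
    "models": {
        "intensity": "<intensity model name or path>",  # placeholder, not a real model name
        "irt": "<irt model name>",                      # placeholder, not a real model name
    },
    "prediction_server": "koina.wilhelmlab.org:443",  # default from the table
    "ssl": True,       # default from the table
    "numThreads": 1,   # default from the table
    "output": "./",    # relative to the location of the config file
}

# Serialise the dictionary; Python's True becomes JSON's true.
config_text = json.dumps(config, indent=4)
print(config_text)
```

Writing `config_text` to a file would give a starting point that can then be extended with the job-specific parameters from the tables below.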
Applicable to CE calibration and rescoring
| Parameter | Description |
|---|---|
| inputs | Contains information about inputs and the type of the inputs (see following 4 nested parameters) |
| search_results | Path to directory or file containing the search results |
| search_results_type | Format description for search results; can be “Maxquant”, “Msfragger”, “Mascot”, “Sage”, “OpenMS”, or “Internal”; default = “Maxquant” |
| spectra | Path to directory or file containing mass spectrometry results |
| spectra_type | Format description for files containing spectra; can be “raw”, “mzml”, “d” or “hdf”; default = “raw”; in case of mixed mzML/RAW input, select “raw”; in case of mixed d/hdf input, select “d” |
| thermoExe | Path to the ThermoRawFileParser executable; needed if spectra are provided in RAW format; default = “ThermoRawFileParser.exe” |
| massTolerance | Defines the allowed tolerance between theoretical and experimentally observed fragment mass during peak annotation; default = 20 (FTMS), 40 (TOF), 0.35 (ITMS) |
| unitMassTolerance | Defines the unit of the tolerance, either “da” or “ppm”; default = “da” if the mass analyzer is ITMS, “ppm” if it is FTMS or TOF |
| ce_alignment_options | Contains settings for collision energy alignment |
| ce_range | Min and max collision energy (end-exclusive) used for calibration; e.g. (5,10) tests every CE from 5 to 9. Default is (19,50) |
| use_ransac_model | Boolean that determines whether to use a RANSAC regression model for calibration refinement. This is recommended for timsTOF data. Default is false. |

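The end-exclusive `ce_range` and the analyzer-dependent tolerance defaults can be made concrete with a small sketch. The helper names `ce_candidates` and `tolerance_in_da` are hypothetical and exist only for this example; the ppm conversion is the standard definition (parts per million of the m/z value), not code taken from the tool.

```python
def ce_candidates(ce_range):
    """Expand an end-exclusive (min, max) pair into the collision energies tested."""
    low, high = ce_range
    return list(range(low, high))


# Defaults as listed in the table, keyed by mass analyzer: (massTolerance, unitMassTolerance).
DEFAULT_TOLERANCE = {
    "FTMS": (20, "ppm"),
    "TOF": (40, "ppm"),
    "ITMS": (0.35, "da"),
}


def tolerance_in_da(mz, tolerance, unit):
    """Convert a tolerance to Dalton at a given m/z; 'da' values pass through unchanged."""
    if unit == "ppm":
        return mz * tolerance / 1e6
    return tolerance


print(ce_candidates((5, 10)))             # [5, 6, 7, 8, 9]
print(len(ce_candidates((19, 50))))       # 31 values, 19 through 49
print(tolerance_in_da(500.0, 20, "ppm"))  # 0.01
```

With the default range (19,50), the highest collision energy tested is therefore 49, not 50.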
Applicable to rescoring
| Parameter | Description |
|---|---|
| fdr_estimation_method | Method used for target/decoy separation and FDR estimation on PSM and peptide level: “percolator” or “mokapot”; default = “mokapot” |
| regressionMethod | Regression method for curve fitting (mapping from predicted iRT values to experimental retention times); can be “spline”, “lowess”, or “logistic”; default = “spline” |
| add_feature_cols | Additional columns to be used as percolator/mokapot input features; can be “all” for all additional columns in the provided internal search results, or a list of column names; default = “none” |
| quantification | (Optional) If True, run picked-group-FDR for quantification. This also requires in-silico digestion options (see “Applicable to in-silico digestion”) and a fasta input. |
| inputs | Contains information about the fasta file (only needed if quantification is True). |
| library_input | Path to fasta file for in-silico digestion (also see the parameters in “Applicable to in-silico digestion”) |

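The dependency between `quantification`, the fasta input and the digestion options can be expressed as a simple pre-flight check. This is only a sketch: `missing_for_quantification` is a hypothetical helper, and the assumption that `library_input` sits under `inputs` while `fastaDigestOptions` is a top-level key is inferred from the table descriptions.

```python
def missing_for_quantification(config):
    """List the keys that are required once quantification is switched on."""
    if not config.get("quantification", False):
        return []  # nothing extra is needed without quantification
    missing = []
    if "library_input" not in config.get("inputs", {}):
        missing.append("inputs.library_input")
    if "fastaDigestOptions" not in config:
        missing.append("fastaDigestOptions")
    return missing


incomplete = {"type": "Rescoring", "quantification": True, "inputs": {}}
print(missing_for_quantification(incomplete))
```

Running such a check before starting a job makes an incomplete quantification setup visible early instead of partway through rescoring.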
Applicable to spectral library generation
| Parameter | Description |
|---|---|
| inputs | Contains information about additional inputs required for spectral library generation and the type of the inputs (see following 2 nested parameters) |
| library_input | Path to fasta file for in-silico digestion (also see the required parameters in the following section) or an existing output file from a digestion |
| library_input_type | Library input type description; can be “fasta” to perform in-silico digestion (see options below), “peptides / internal” (for automatic generation of the internal format using the spectralLibraryOptions below, or a ready-to-use internal format) |
| spectralLibraryOptions | Contains information about additional settings required for spectral library generation and what to save to disk (see following 7 nested parameters) |
| fragmentation | Method used for fragmentation; can be “HCD” or “CID”; default = “” |
| collisionEnergy | The collision energy for which the library should be created; default = 30 |
| precursorCharge | The precursor charges for which the library should be created; can be a list or a single number; default = [2,3] |
| minIntensity | The minimal relative intensity threshold for peaks; everything below is not saved, which can help reduce the library size; default = 5e-4 |
| nrOx | The maximum number of oxidations allowed on methionine residues (M) in peptides during spectral library generation; default = 1 |
| batchsize | Number of peptides for which predictions are retrieved at once before writing; larger batches result in higher memory peaks, so this needs to be balanced with numThreads; default = 10000 |
| format | Output format of the generated spectral library; can be “spectronaut”, “msp”, or “dlib”; default = “msp” |

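A configuration fragment for spectral library generation might look like the sketch below, which fills the seven nested options with the defaults from the table. The JSON layout and the fasta path placeholder are assumptions, and “HCD” is chosen only as an example because the default for `fragmentation` is empty.

```python
import json

# The seven nested parameters, using the table's defaults where one is given.
spectral_library_options = {
    "fragmentation": "HCD",     # example choice; the documented default is ""
    "collisionEnergy": 30,
    "precursorCharge": [2, 3],
    "minIntensity": 5e-4,
    "nrOx": 1,
    "batchsize": 10000,
    "format": "msp",
}

config_fragment = {
    "type": "SpectralLibraryGeneration",
    "inputs": {
        "library_input": "<path to fasta file>",  # placeholder path
        "library_input_type": "fasta",
    },
    "spectralLibraryOptions": spectral_library_options,
}

print(json.dumps(config_fragment, indent=4))
```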
Applicable to in-silico digestion
| Parameter | Description |
|---|---|
| fastaDigestOptions | Contains specific settings for the in-silico digestion of a provided fasta file (see the following nested parameters) |
| digestion | Digestion mode; can be “full”, “semi” or None; default = “full” |
| missedCleavages | Number of missed cleavages allowed in the search engine when generating the provided search results; default = 2 |
| minLength | Minimum peptide length allowed in the search engine when generating the provided search results; default = 7 |
| maxLength | Maximum peptide length allowed in the search engine when generating the provided search results; default = 60 |
| enzyme | Name of the enzyme used in the search engine; default = “trypsin” |
| specialAas | Special amino acids for decoy generation; default = “KR” |
| db | Defines whether the digestion should contain only targets, only decoys or both (concatenated); can be “target”, “decoy” or “concat”; default = “concat” |

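To show how `missedCleavages`, `minLength` and `maxLength` interact, here is a deliberately simplified sketch of a full tryptic digestion. It is not the tool's implementation: the function `digest` is hypothetical, it cleaves after every K or R without the proline exception, and it ignores the “semi” mode and decoy generation.

```python
import re


def digest(protein, missed_cleavages=2, min_length=7, max_length=60):
    """Simplified full digestion: cleave after every K or R, then join neighbours."""
    # Cut the sequence into fully cleaved pieces, each ending in K or R
    # (the last piece may lack a cleavage residue).
    pieces = re.findall(r"[^KR]*[KR]|[^KR]+$", protein)
    peptides = set()
    for start in range(len(pieces)):
        # Joining `extra` additional pieces corresponds to `extra` missed cleavages.
        for extra in range(missed_cleavages + 1):
            end = start + extra + 1
            if end > len(pieces):
                break
            peptide = "".join(pieces[start:end])
            if min_length <= len(peptide) <= max_length:
                peptides.add(peptide)
    return sorted(peptides)


# Toy sequence with cleavage sites after "MK", after the run of A, and at the end.
protein = "MK" + "A" * 7 + "R" + "C" * 7 + "K"
for peptide in digest(protein, missed_cleavages=1):
    print(peptide)
```

In this toy case the fully cleaved piece “MK” is dropped by the length filter, but it reappears as part of a longer peptide once one missed cleavage is allowed.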
Applicable to PTM pipeline
| Parameter | Description |
|---|---|
| ptm_localization | Flag to indicate whether to run the PTM localization pipeline. |
| ptmLocalizationOptions | Contains specific settings for the PTM pipeline (see following 2 nested parameters) |
| unimod_id | UNIMOD ID from unimod.org indicating the target PTM, e.g. 7 for citrullination/deamidation |
| possible_sites | List of possible sites where the PTM can occur, e.g. [‘R’,’N’,’Q’] for citrullination/deamidation |
| neutral_loss | Flag to annotate neutral loss peaks and use them as a feature in percolator. |

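The relationship between `unimod_id` and `possible_sites` can be illustrated with the citrullination/deamidation example from the table. The sketch assumes that both keys are nested under `ptmLocalizationOptions`; `candidate_positions` is a hypothetical helper that only lists where the PTM could sit in a peptide and does not reproduce the localization pipeline itself.

```python
ptm_config = {
    "ptm_localization": True,
    "ptmLocalizationOptions": {
        "unimod_id": 7,                     # citrullination/deamidation
        "possible_sites": ["R", "N", "Q"],  # residues that may carry the PTM
    },
}


def candidate_positions(peptide, possible_sites):
    """Return the 1-based positions of residues that could carry the PTM."""
    return [i for i, aa in enumerate(peptide, start=1) if aa in possible_sites]


sites = ptm_config["ptmLocalizationOptions"]["possible_sites"]
print(candidate_positions("ANQRK", sites))  # [2, 3, 4]
```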
Applicable to local intensity prediction
| Parameter | Description |
|---|---|
| dlomixInferenceBatchSize | Batch size to use for local inference with DLomix |

Applicable to transfer/refinement learning
| Parameter | Description |
|---|---|
| refinementLearningOptions | Contains specific settings for local refinement learning of the intensity predictor on the provided spectra. If not present, no refinement learning is performed. |
| batchSize | Defines the batch size to use for training; default = 1024 |
| includeOriginalSequences | Defines whether unmodified peptide sequences should be kept in the processed DLomix dataset for downstream analysis; default = False |
| improveFurther | Defines whether to perform an additional third training phase during refinement learning to further improve the predictor; default = False |
| wandbOptions | Contains specific settings for using WandB during refinement learning. If not present, WandB is not used. |
| project | Project to save the WandB run to; default = “DLomix_auto_RL_TL” |
| targets | Tags to use for the WandB run; default = None |
| datasetFilteringOptions | Contains specific settings for filtering the refinement/transfer learning dataset. If not provided, only decoys are removed. |
| searchEngineScoreThreshold | Threshold for included peptides; everything below is discarded. |
| numDuplicates | Number of (peptide, charge, collision energy) duplicates to include. |

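The two dataset filtering options can be pictured with the following sketch, which applies a score threshold and then caps the number of entries per (peptide, charge, collision energy) combination. The function `filter_psms`, the dictionary field names, and the choice to keep the highest-scoring duplicates are all assumptions; the table does not specify which duplicates are retained.

```python
from collections import defaultdict


def filter_psms(psms, score_threshold, num_duplicates):
    """Drop PSMs below the threshold and keep at most num_duplicates per key."""
    kept = []
    counts = defaultdict(int)
    # Visit the best-scoring PSMs first (an assumption, see lead-in).
    for psm in sorted(psms, key=lambda p: p["score"], reverse=True):
        if psm["score"] < score_threshold:
            continue  # below searchEngineScoreThreshold
        key = (psm["peptide"], psm["charge"], psm["collision_energy"])
        if counts[key] < num_duplicates:
            counts[key] += 1
            kept.append(psm)
    return kept


psms = [
    {"peptide": "PEPTIDEK", "charge": 2, "collision_energy": 30, "score": 90.0},
    {"peptide": "PEPTIDEK", "charge": 2, "collision_energy": 30, "score": 80.0},
    {"peptide": "PEPTIDEK", "charge": 2, "collision_energy": 30, "score": 70.0},
    {"peptide": "PEPTIDEK", "charge": 3, "collision_energy": 30, "score": 40.0},
]
filtered = filter_psms(psms, score_threshold=50.0, num_duplicates=2)
print([p["score"] for p in filtered])  # [90.0, 80.0]
```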