Configuration

The following provides an overview of all flags available in the configuration file for using the high-level API and running jobs. Parameters may apply to more than one job type and are collected in individual tables.

Always applicable

Parameter

Description

type

The type of job; can be one of “CollisionEnergyCalibration”, “SpectralLibraryGeneration” or “Rescoring”

tag

Optional mass tag; can be “tmt”, “tmtpro”, “itraq4” or “itraq8”; default is “”

models

Contains information about the models used for peptide property prediction (see following 2 nested parameters)

intensity

Name or path of the model used for fragment intensity prediction

irt

Name of the model used for indexed retention time prediction

inputs

Contains information about inputs and the type of the inputs (see following nested parameter)

instrument_type

The type of mass spectrometer used to measure the spectra. Supersedes the value read from the mzML file (default). When predicting intensities with AlphaPept, choose one of [“QE”, “LUMOS”, “TIMSTOF”, “SCIEXTOF”] if the instrument type of your data is not supported.

static_mods

Custom static modifications in the format “<key>”: [<UNIMOD_ID>, <mod_mass>], e.g. “C”: [4, 57.0215] where <key> is search engine specific. Overwrites default modifications used. See Custom modification for detailed information.

var_mods

Custom variable modifications in the format “<key>”: [<UNIMOD_ID>, <mod_mass>], e.g. “M(ox)”: [35, 15.9949] where <key> is search engine specific. Overwrites default modifications used. See Custom modification for detailed information.

numThreads

Number of raw/mzML files processed in parallel (parallelisation on file level); using more processes than files has no effect and should be avoided. For spectral library generation, this is the number of parallel prediction processes, which needs to be balanced with batchsize; default = 1

prediction_server

Server and port for obtaining peptide property predictions; default: “koina.wilhelmlab.org:443”

ssl

Use SSL when making requests to the prediction server; can be true or false; default = true

output

Path to the output folder (relative to the location of the config file); default = “./”
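
As a rough illustration of how the parameters above might fit together, the following sketch builds them as a Python dictionary and serializes the result. It assumes that the configuration file is JSON and that intensity/irt nest under models and instrument_type under inputs, as the table describes. The model names are hypothetical placeholders, not names of real models, and the modification entries are simply the examples given above.

```python
import json

# Sketch of the always-applicable parameters. Values are the documented
# defaults or examples unless marked as placeholders.
base_config = {
    "type": "Rescoring",
    "tag": "",
    "models": {
        "intensity": "<intensity_model_name_or_path>",  # placeholder
        "irt": "<irt_model_name>",  # placeholder
    },
    # Optional: only set this to override the value read from the mzML file.
    "inputs": {"instrument_type": "QE"},
    "static_mods": {"C": [4, 57.0215]},
    "var_mods": {"M(ox)": [35, 15.9949]},
    "numThreads": 1,
    "prediction_server": "koina.wilhelmlab.org:443",
    "ssl": True,
    "output": "./",
}

# Assumption: the configuration file is JSON.
config_text = json.dumps(base_config, indent=4)
print(config_text)
```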

Applicable to CE calibration and rescoring

Parameter

Description

inputs

Contains information about inputs and the type of the inputs (see following 4 nested parameters)

search_results

Path to directory or file containing the search results

search_results_type

Format description for search results; can be “Maxquant”, “Msfragger”, “Mascot”, “Sage”, “OpenMS”, or “Internal”; default = “Maxquant”

spectra

Path to directory or file containing mass spectrometry results

spectra_type

Format description for files containing spectra; can be “raw”, “mzml”, “d” or “hdf”; default = “raw”; in case of mixed mzML/RAW input, select “raw”; in case of mixed d/hdf input select “d”

thermoExe

Path to ThermoRawFileParser executable; needed if spectra are provided in RAW format; default “ThermoRawFileParser.exe”

massTolerance

Defines the allowed tolerance between theoretical and experimentally observed fragment mass during peak annotation; default = 20 (FTMS), 40 (TOF), 0.35 (ITMS)

unitMassTolerance

Defines the measure of tolerance, either “da” or “ppm”; default = da (mass analyzer is ITMS), ppm (mass analyzer is FTMS or TOF)

ce_alignment_options

Contains settings for collision energy alignment

ce_range

Min and max collision energy (end-exclusive) used for calibration; i.e. (5,10) tests every CE from 5 to 9. Default is (19,50)

use_ransac_model

Boolean that determines whether to use a RANSAC regression model for calibration refinement. This is recommended for timsTOF data. Default is false.

Applicable to rescoring

Parameter

Description

fdr_estimation_method

Method used for target / decoy separation and FDR estimation on PSM and peptide level: “percolator” or “mokapot”; default = “mokapot”

regressionMethod

Regression method for curve fitting (mapping from predicted iRT values to experimental retention times); can be “spline”, “lowess”, or “logistic”; default = “spline”

add_feature_cols

Additional columns to be used as percolator/mokapot input features; can be “all” for all additional columns in the provided internal search results, or a list of column names; default = “none”

quantification

(Optional) If True, run picked-group-FDR for quantification. This also requires in-silico digestion options (see “Applicable to in-silico digestion”) and a fasta input.

inputs

Contains information about the fasta file (only needed if quantification is True).

library_input

Path to fasta file for in-silico digestion (also see the required parameters under “Applicable to in-silico digestion”)
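
The dependency between quantification, the fasta input and the digestion options can be sketched as a small consistency check. The function below is hypothetical and only encodes what the table states; the fasta path is a placeholder, and the check assumes library_input nests under inputs.

```python
def missing_quantification_settings(config):
    """Hypothetical check: list settings that quantification needs but are absent."""
    if not config.get("quantification", False):
        return []
    missing = []
    if "library_input" not in config.get("inputs", {}):
        missing.append("inputs.library_input")
    if "fastaDigestOptions" not in config:
        missing.append("fastaDigestOptions")
    return missing


rescoring_config = {
    "type": "Rescoring",
    "fdr_estimation_method": "mokapot",
    "regressionMethod": "spline",
    "add_feature_cols": "none",
    "quantification": True,
    "inputs": {"library_input": "proteome.fasta"},  # placeholder path
}

# The digestion options are still missing in this example.
print(missing_quantification_settings(rescoring_config))  # ['fastaDigestOptions']
```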

Applicable to spectral library generation

Parameter

Description

inputs

Contains information about additional inputs required for spectral library generation and the type of the inputs (see following 2 nested parameters)

library_input

Path to fasta file for in-silico digestion (also see the required parameters in the following section) or an existing output file from a digestion

library_input_type

Library input type description; can be “fasta” to perform in-silico digestion (see options below), or “peptides / internal” (for automatic generation of the internal format using the spectralLibraryOptions below, or for a ready-to-use internal format).

spectralLibraryOptions

Contains information about additional settings required for spectral library generation and what to save to disk (see following 7 nested parameters)

fragmentation

Method used for fragmentation; can be “HCD” or “CID”; default = “”

collisionEnergy

The collision energy for which the library should be created; default = 30

precursorCharge

The precursor charges for which the library should be created, can be a list or single number; default = [2,3]

minIntensity

The minimum relative intensity threshold for peaks; everything below it is not saved, which can help reduce the library size; default = 5e-4

nrOx

The maximum number of oxidations allowed on Methionine residues (M) in peptides during spectral library generation; default = 1

batchsize

Number of peptides for which predictions are retrieved at once before writing; larger batches result in higher memory peaks, so this needs to be balanced with numThreads; default = 10000

format

Output format of the generated spectral library; can be “spectronaut”, “msp”, or “dlib”; default = “msp”
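
A minimal sketch of the spectralLibraryOptions with their documented defaults, together with an illustration of what the minIntensity threshold does. The fragmentation value is chosen for the example (the documented default is empty), the peak values are made up, and filter_peaks is a hypothetical helper rather than the tool's implementation.

```python
spectral_library_options = {
    "fragmentation": "HCD",  # example choice; documented default is ""
    "collisionEnergy": 30,
    "precursorCharge": [2, 3],
    "minIntensity": 5e-4,
    "nrOx": 1,
    "batchsize": 10000,
    "format": "msp",
}


def filter_peaks(peaks, min_intensity):
    """Keep (m/z, relative intensity) pairs at or above the threshold."""
    return [(mz, intensity) for mz, intensity in peaks if intensity >= min_intensity]


# Made-up peaks, for illustration only.
predicted_peaks = [(175.119, 1.0), (262.151, 0.0003), (375.235, 0.02)]
kept = filter_peaks(predicted_peaks, spectral_library_options["minIntensity"])
print(kept)  # [(175.119, 1.0), (375.235, 0.02)]
```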

Applicable to in-silico digestion

Parameter

Description

fastaDigestOptions

Contains specific settings for the in-silico digestion of a provided fasta file (see the following nested parameters)

digestion

Digestion mode; can be “full”, “semi” or None; default = “full”

missedCleavages

Number of allowed missed cleavages used in the search engine for generation of the provided search results; default = 2

minLength

Minimum allowed peptide length used in the search engine for generation of the provided search results; default = 7

maxLength

Maximum allowed peptide length used in the search engine for generation of the provided search results; default = 60

enzyme

Name of the enzyme used in the search engine; default = “trypsin”

specialAas

Special amino acids for decoy generation; default = “KR”

db

Defines whether the digestion should contain only targets, only decoys or both (concatenated); can be “target”, “decoy” or “concat”; default = “concat”
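
To illustrate how missedCleavages, minLength and maxLength interact, here is a deliberately simplified digestion sketch. It is not the tool's digestion routine: it assumes a full tryptic digest that cleaves after every K or R (ignoring the proline rule), does not generate decoys, and uses a made-up protein sequence.

```python
import re

fasta_digest_options = {
    "digestion": "full",
    "missedCleavages": 2,
    "minLength": 7,
    "maxLength": 60,
    "enzyme": "trypsin",
    "specialAas": "KR",
    "db": "concat",
}


def simple_tryptic_digest(protein, missed_cleavages, min_length, max_length):
    """Simplified full digest: cleave after K or R, ignoring the proline rule."""
    # Fully cleaved fragments; each ends with K/R except possibly the last one.
    fragments = re.findall(r"[^KR]*[KR]|[^KR]+$", protein)
    peptides = []
    for start in range(len(fragments)):
        # Joining n + 1 consecutive fragments yields n missed cleavages.
        for n_missed in range(missed_cleavages + 1):
            end = start + n_missed + 1
            if end > len(fragments):
                break
            peptide = "".join(fragments[start:end])
            if min_length <= len(peptide) <= max_length:
                peptides.append(peptide)
    return peptides


peptides = simple_tryptic_digest(
    "MKTAYIAKQRQISFVK",  # made-up sequence
    fasta_digest_options["missedCleavages"],
    fasta_digest_options["minLength"],
    fasta_digest_options["maxLength"],
)
print(peptides)
```

In this example none of the fully cleaved fragments reaches the minimum length of 7, so every reported peptide contains at least one missed cleavage.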

Applicable to PTM pipeline

Parameter

Description

ptm_localization

Flag to indicate whether the user wants to run the PTM localization pipeline.

ptmLocalizationOptions

Contains specific settings for the PTM pipeline (see following 2 nested parameters)

unimod_id

UNIMOD ID from unimod.org identifying the target PTM, e.g. 7 for citrullination/deamidation

possible_sites

List of possible sites where the PTM can occur, e.g. [‘R’, ‘N’, ‘Q’] for citrullination/deamidation

neutral_loss

Flag to annotate neutral loss peaks and use them as a feature in percolator.
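
The PTM settings above might look as follows for the citrullination/deamidation example. The nesting of unimod_id and possible_sites under ptmLocalizationOptions follows the table; candidate_site_positions is a hypothetical helper that only shows which residues of a made-up peptide would be considered as sites.

```python
ptm_config = {
    "ptm_localization": True,
    "ptmLocalizationOptions": {
        "unimod_id": 7,
        "possible_sites": ["R", "N", "Q"],
    },
}


def candidate_site_positions(peptide, possible_sites):
    """Return 1-based positions of residues that could carry the PTM."""
    return [pos for pos, aa in enumerate(peptide, start=1) if aa in possible_sites]


sites = ptm_config["ptmLocalizationOptions"]["possible_sites"]
print(candidate_site_positions("ANQLR", sites))  # [2, 3, 5]
```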

Applicable to local intensity prediction

Parameter

Description

dlomixInferenceBatchSize

Batch size to use for local inference with DLomix

Applicable to transfer/refinement learning

Parameter

Description

refinementLearningOptions

Contains specific settings for local refinement learning of the intensity predictor on the provided spectra. If not present, no refinement learning will be performed.

batchSize

Defines batch size to use for training; default = 1024

includeOriginalSequences

Defines whether unmodified peptide sequences should be kept in processed DLomix dataset for downstream analysis; default = False

improveFurther

Defines whether to perform an additional third training phase during refinement learning to further improve the predictor; default = False.

wandbOptions

Contains specific settings for using WandB when doing refinement learning. If not present, WandB will not be used.

project

Project to save WandB run to; default = “DLomix_auto_RL_TL”

targets

Tags to use for WandB run; default = None

datasetFilteringOptions

Contains specific settings for filtering the refinement/transfer learning dataset. If not provided, will only remove decoys.

searchEngineScoreThreshold

Search engine score threshold for included peptides; everything below it will be discarded.

numDuplicates

Number of (peptide, charge, collision energy) duplicates to include.
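
The effect of the two dataset filtering options can be sketched as below. This is an illustration under stated assumptions, not the tool's implementation: the field names of each record are hypothetical, the data are made up, and keeping the highest-scoring duplicates first is an assumption, since the table does not say which duplicates are retained.

```python
from collections import defaultdict


def filter_dataset(psms, score_threshold, num_duplicates):
    """Drop PSMs below the threshold, then keep up to num_duplicates per key."""
    grouped = defaultdict(list)
    for psm in psms:
        if psm["score"] >= score_threshold:
            key = (psm["peptide"], psm["charge"], psm["collision_energy"])
            grouped[key].append(psm)
    kept = []
    for group in grouped.values():
        # Assumption: prefer the highest-scoring duplicates.
        group.sort(key=lambda p: p["score"], reverse=True)
        kept.extend(group[:num_duplicates])
    return kept


# Made-up records, for illustration only.
psms = [
    {"peptide": "PEPTIDEK", "charge": 2, "collision_energy": 30, "score": 70},
    {"peptide": "PEPTIDEK", "charge": 2, "collision_energy": 30, "score": 90},
    {"peptide": "PEPTIDEK", "charge": 2, "collision_energy": 30, "score": 80},
    {"peptide": "PEPTIDEK", "charge": 3, "collision_energy": 30, "score": 40},
]
filtered = filter_dataset(psms, score_threshold=50, num_duplicates=2)
print([p["score"] for p in filtered])  # [90, 80]
```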