Configuration

This section gives an overview of all flags available in the configuration file used to run jobs through the high-level API. A parameter may apply to more than one job type; parameters are grouped into individual tables by applicability.

Always applicable

Parameter	Description
type	The type of job; can be one of “CollisionEnergyCalibration”, “SpectralLibraryGeneration” or “Rescoring”
tag	Optional mass tag; can be “tmt”, “tmtpro”, “itraq4” or “itraq8”; default = “”
models	Contains information about the used models for peptide property prediction (see following 2 nested parameters)
intensity	Name or path of the model used for fragment intensity prediction
irt	Name of the model used for indexed retention time prediction
inputs	Contains information about inputs and the type of the inputs (see following nested parameter)
instrument_type	The type of mass spectrometer used to measure the spectra. Supersedes the value read from the mzML file, which is used by default. When predicting intensities with AlphaPept and the instrument type of your data is not supported, choose one of [“QE”, “LUMOS”, “TIMSTOF”, “SCIEXTOF”].
static_mods	Custom static modifications in the format “<key>”: [<UNIMOD_ID>, <mod_mass>], e.g. “C”: [4, 57.0215] where <key> is search engine specific. Overwrites default modifications used. See Custom modification for detailed information.
var_mods	Custom variable modifications in the format “<key>”: [<UNIMOD_ID>, <mod_mass>], e.g. “M(ox)”: [35, 15.9949] where <key> is search engine specific. Overwrites default modifications used. See Custom modification for detailed information.
numThreads	Number of raw/mzML files processed in parallel (parallelisation on file level); using more processes than files has no effect and should be avoided. For spectral library generation, this is the number of parallel prediction processes and needs to be balanced with batchsize; default = 1
prediction_server	Server and port for obtaining peptide property predictions; default: “koina.wilhelmlab.org:443”
ssl	Whether to use SSL when making requests to the prediction server; can be true or false; default = true
output	Path to the output folder (relative to the location of the config file); default = “./”
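
The always-applicable parameters can be sketched as a small configuration object. This is a minimal illustration, assuming the configuration file is JSON and that intensity and irt are nested under models and instrument_type under inputs, as the "nested parameter" wording suggests. The model names and the chosen job type are example values, not taken from this section; substitute the ones you actually use.

```python
import json

# Illustrative values only; key names follow the table above.
config = {
    "type": "Rescoring",
    "tag": "",
    "models": {
        "intensity": "Prosit_2020_intensity_HCD",  # example model name (assumption)
        "irt": "Prosit_2019_irt",                  # example model name (assumption)
    },
    "inputs": {"instrument_type": "QE"},
    "static_mods": {"C": [4, 57.0215]},        # "<key>": [<UNIMOD_ID>, <mod_mass>]
    "var_mods": {"M(ox)": [35, 15.9949]},
    "numThreads": 1,
    "prediction_server": "koina.wilhelmlab.org:443",
    "ssl": True,
    "output": "./",
}

# Serialise to JSON text; Python's True becomes JSON's true.
config_text = json.dumps(config, indent=4)
print(config_text)
```

The remaining tables add further keys to the same object, depending on the job type.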

Applicable to CE calibration and rescoring

Parameter	Description
inputs	Contains information about inputs and the type of the inputs (see following 4 nested parameters)
search_results	Path to directory or file containing the search results
search_results_type	Format description for search results; can be “Maxquant”, “Msfragger”, “Mascot”, “Sage”, “OpenMS”, or “Internal”; default = “Maxquant”
spectra	Path to directory or file containing mass spectrometry results
spectra_type	Format description for files containing spectra; can be “raw”, “mzml”, “d” or “hdf”; default = “raw”; in case of mixed mzML/RAW input, select “raw”; in case of mixed d/hdf input select “d”
thermoExe	Path to ThermoRawFileParser executable; needed if spectra are provided in RAW format; default “ThermoRawFileParser.exe”
massTolerance	Allowed tolerance between the theoretical and the experimentally observed fragment mass during peak annotation; default = 20 (FTMS), 40 (TOF), 0.35 (ITMS)
unitMassTolerance	Unit of the mass tolerance, either “da” or “ppm”; default = “da” if the mass analyzer is ITMS, “ppm” if the mass analyzer is FTMS or TOF
ce_alignment_options	Contains settings for collision energy alignment
ce_range	Minimum and maximum collision energy (end-exclusive) used for calibration; e.g. (5,10) tests every CE from 5 to 9. Default is (19,50).
use_ransac_model	Boolean that determines whether to use a RANSAC regression model for calibration refinement. This is recommended for timsTOF data. Default is false.
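
The end-exclusive ce_range and the default tolerances can be made concrete with a short sketch. The helper names below are hypothetical and written only for illustration; they are not the library's own functions. The default values and units are the ones listed in the table above.

```python
# Defaults per mass analyzer, as listed above: (massTolerance, unitMassTolerance)
DEFAULT_TOLERANCE = {"FTMS": (20, "ppm"), "TOF": (40, "ppm"), "ITMS": (0.35, "da")}


def tested_collision_energies(ce_range=(19, 50)):
    """Return every CE covered by an end-exclusive (min, max) range."""
    low, high = ce_range
    return list(range(low, high))


def within_tolerance(theoretical, observed, tolerance, unit):
    """Check whether an observed fragment mass matches a theoretical one."""
    delta = abs(observed - theoretical)
    if unit == "ppm":
        # parts per million, relative to the theoretical mass
        return delta / theoretical * 1e6 <= tolerance
    if unit == "da":
        # absolute difference in Dalton
        return delta <= tolerance
    raise ValueError(f"unknown unit: {unit}")


print(tested_collision_energies((5, 10)))
tolerance, unit = DEFAULT_TOLERANCE["FTMS"]
print(within_tolerance(500.0, 500.005, tolerance, unit))
```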

Applicable to rescoring

Parameter	Description
fdr_estimation_method	Method used for target / decoy separation and FDR estimation on PSM and peptide level: “percolator” or “mokapot”; default = “mokapot”
regressionMethod	Regression method for curve fitting (mapping from predicted iRT values to experimental retention times); can be “spline”, “lowess”, or “logistic”; default = “spline”
add_feature_cols	Additional columns to be used as percolator/mokapot input features; can be “all” for all additional columns in the provided internal search results, or a list of column names; default = “none”
quantification	(Optional) If True, run picked-group-FDR for quantification. This also requires in-silico digestion options (see “Applicable to in-silico digestion”) and a fasta input.
inputs	Contains information about the fasta file (only needed if quantification is True).
library_input	Path to fasta file for in-silico digestion (also see the required parameters under “Applicable to in-silico digestion”)
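
The dependency between quantification, the fasta input and the digestion options can be checked before a job is started. The following is a minimal sketch: missing_for_quantification is a hypothetical helper, the file paths are placeholders, and the nesting of the keys is assumed from the tables rather than confirmed.

```python
# Placeholder paths; key names follow the tables above.
rescoring_config = {
    "type": "Rescoring",
    "fdr_estimation_method": "mokapot",
    "regressionMethod": "spline",
    "add_feature_cols": "none",
    "quantification": True,
    "inputs": {
        "search_results": "./msms.txt",       # hypothetical path
        "search_results_type": "Maxquant",
        "spectra": "./",
        "spectra_type": "raw",
        "library_input": "./proteome.fasta",  # hypothetical path
    },
    "fastaDigestOptions": {"digestion": "full", "enzyme": "trypsin"},
}


def missing_for_quantification(config):
    """List the keys that quantification needs but the config lacks."""
    if not config.get("quantification", False):
        return []
    missing = []
    if "library_input" not in config.get("inputs", {}):
        missing.append("inputs.library_input")
    if "fastaDigestOptions" not in config:
        missing.append("fastaDigestOptions")
    return missing


print(missing_for_quantification(rescoring_config))
```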

Applicable to spectral library generation

Parameter	Description
inputs	Contains information about additional inputs required for spectral library generation and the type of the inputs (see following 2 nested parameters)
library_input	Path to fasta file for in-silico digestion (also see the required parameters in the following section) or an existing output file from a digestion
library_input_type	Library input type description; can be “fasta” to perform in-silico digestion (see options below) or “peptides / internal” (for automatic generation of the internal format using the spectralLibraryOptions below, or for a ready-to-use internal format)
spectralLibraryOptions	Contains information about additional settings required for spectral library generation and what to save to disk (see following 7 nested parameters)
fragmentation	Method used for fragmentation; can be “HCD” or “CID”; default = “”
collisionEnergy	The collision energy for which the library should be created; default = 30
precursorCharge	The precursor charges for which the library should be created, can be a list or single number; default = [2,3]
minIntensity	Minimum relative intensity threshold for peaks; peaks below it are not saved, which can help reduce the library size; default = 5e-4
nrOx	The maximum number of oxidations allowed on Methionine residues (M) in peptides during spectral library generation; default = 1
batchsize	Number of peptides for which predictions are retrieved at once before writing; larger batches result in higher memory peaks and need to be balanced with numThreads; default = 10000
format	Output format of the generated spectral library; can be “spectronaut”, “msp”, or “dlib”; default = “msp”
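
As an illustration, the defaults above can be collected into a spectralLibraryOptions object. The two helpers are hypothetical and only show how a single-number precursorCharge and the batchsize could be interpreted; the fragmentation value is set to “HCD” for the example, since its default is empty.

```python
import math

# Default values from the table above (fragmentation chosen for the example).
spectral_library_options = {
    "fragmentation": "HCD",
    "collisionEnergy": 30,
    "precursorCharge": [2, 3],
    "minIntensity": 5e-4,
    "nrOx": 1,
    "batchsize": 10000,
    "format": "msp",
}


def as_charge_list(precursor_charge):
    """Accept a single number or a list and always return a list."""
    if isinstance(precursor_charge, int):
        return [precursor_charge]
    return list(precursor_charge)


def number_of_batches(n_peptides, batchsize):
    """Number of prediction requests needed for n_peptides."""
    return math.ceil(n_peptides / batchsize)


print(as_charge_list(spectral_library_options["precursorCharge"]))
print(number_of_batches(25000, spectral_library_options["batchsize"]))
```

A smaller batchsize lowers the memory peak per process at the cost of more requests, which is why it is balanced against the number of parallel processes.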

Applicable to in-silico digestion

Parameter	Description
fastaDigestOptions	Contains specific settings for the in-silico digestion of a provided fasta file (see the following nested parameters)
digestion	Digestion mode; can be “full”, “semi” or None; default = “full”
missedCleavages	Number of allowed missed cleavages used in the search engine for generation of the provided search results; default = 2
minLength	Minimum allowed peptide length used in the search engine for generation of the provided search results; default = 7
maxLength	Maximum allowed peptide length used in the search engine for generation of the provided search results; default = 60
enzyme	Name of the enzyme used in the search engine; default = “trypsin”
specialAas	Special amino acids for decoy generation; default = “KR”
db	Defines whether the digestion should contain only targets, only decoys or both (concatenated); can be “target”, “decoy” or “concat”; default = “concat”
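
To show how missedCleavages, minLength and maxLength interact, here is a toy digestion sketch. It assumes a simplified trypsin rule (cleave after every K or R, with no proline exception) and is not the digestion routine used by the tool; the function names are hypothetical.

```python
def cleave(sequence, cleave_after="KR"):
    """Split a protein sequence after every residue in cleave_after."""
    pieces, current = [], ""
    for amino_acid in sequence:
        current += amino_acid
        if amino_acid in cleave_after:
            pieces.append(current)
            current = ""
    if current:
        pieces.append(current)
    return pieces


def digest(sequence, missed_cleavages=2, min_length=7, max_length=60):
    """Fully digest a sequence, allowing up to missed_cleavages skipped sites."""
    pieces = cleave(sequence)
    peptides = []
    for start in range(len(pieces)):
        for n_missed in range(missed_cleavages + 1):
            end = start + n_missed + 1
            if end > len(pieces):
                break
            peptide = "".join(pieces[start:end])
            # keep only peptides within the allowed length window
            if min_length <= len(peptide) <= max_length:
                peptides.append(peptide)
    return peptides


print(digest("AAAAAAKCCCCCCCRDDDDDDD", missed_cleavages=1))
```

Raising missedCleavages adds longer concatenated peptides, while minLength and maxLength prune the result afterwards.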

Applicable to PTM pipeline

Parameter	Description
ptm_localization	Flag indicating whether to run the PTM localization pipeline.
ptmLocalizationOptions	Contains specific settings for the PTM pipeline (see following 2 nested parameters)
unimod_id	UNIMOD ID (from unimod.org) of the target PTM, e.g. 7 for citrullination/deamidation
possible_sites	List of possible sites where the PTM can occur, e.g. [‘R’, ‘N’, ‘Q’] for citrullination/deamidation
neutral_loss	Flag to annotate neutral loss peaks and use them as a feature in percolator.

Applicable to local intensity prediction

Parameter	Description
dlomixInferenceBatchSize	Batch size to use for local inference with DLomix

Applicable to transfer/refinement learning

Parameter	Description
refinementLearningOptions	Contains specific settings for local refinement learning of the intensity predictor on the provided spectra. If not present, no refinement learning is performed.
batchSize	Defines batch size to use for training; default = 1024
includeOriginalSequences	Defines whether unmodified peptide sequences should be kept in processed DLomix dataset for downstream analysis; default = False
improveFurther	Defines whether to perform an additional third training phase during refinement learning to further improve the predictor; default = False.
wandbOptions	Contains specific settings for using WandB when doing refinement learning. If not present, WandB will not be used.
project	Project to save WandB run to; default = “DLomix_auto_RL_TL”
targets	Tags to use for WandB run; default = None
datasetFilteringOptions	Contains specific settings for filtering the refinement/transfer learning dataset. If not provided, only decoys are removed.
searchEngineScoreThreshold	Score threshold for including peptides; everything below it is discarded.
numDuplicates	Number of (peptide, charge, collision energy) duplicates to include.
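
The two dataset filtering options can be illustrated with a small sketch. It assumes that a higher search engine score is better and that the best-scoring duplicates are the ones kept; both are assumptions made for this example, as are the record field names and the function filter_psms. It is not the tool's own filtering code.

```python
from collections import defaultdict


def filter_psms(psms, score_threshold, num_duplicates):
    """Drop low-scoring PSMs and cap duplicates per (peptide, charge, CE)."""
    kept = []
    seen = defaultdict(int)
    # visit best-scoring PSMs first so those are the duplicates that survive
    for psm in sorted(psms, key=lambda p: p["score"], reverse=True):
        if psm["score"] < score_threshold:
            continue
        key = (psm["peptide"], psm["charge"], psm["collision_energy"])
        if seen[key] < num_duplicates:
            seen[key] += 1
            kept.append(psm)
    return kept


# Hypothetical records for demonstration
psms = [
    {"peptide": "PEPTIDEK", "charge": 2, "collision_energy": 30, "score": 0.9},
    {"peptide": "PEPTIDEK", "charge": 2, "collision_energy": 30, "score": 0.8},
    {"peptide": "PEPTIDEK", "charge": 3, "collision_energy": 30, "score": 0.7},
    {"peptide": "ANOTHERK", "charge": 2, "collision_energy": 30, "score": 0.1},
]

for psm in filter_psms(psms, score_threshold=0.5, num_duplicates=1):
    print(psm["peptide"], psm["charge"], psm["score"])
```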