Custom in-silico digestion

While Oktoberfest can do in-silico digestion by providing a fasta file, you can also provide a list of peptides yourself, or follow the below internal format for the highest level of customization.

Providing a list of peptides and associated proteins

In this case, you need to have the following parameter in your config file:

"library_input_type": "peptides",

Oktoberfest will then create the table of peptides with associated metadata in internal format (see below) based on the configuration in the spectralLibraryOptions of your configuration file. For a list of these options, check the configuration options.

Description of peptide list columns

Oktoberfest expects a csv formatted file, where each row represent a peptide and optional mappings to proteins.

Column Header

Explanation

peptide

The unmodified peptide sequence. “C” will always be carbamidomethylated (fixed modification), and a TMT modification is always added to the N-term and “K” if a tag is specified in the configuration file.

proteins

An optional list of protein ids separated by ‘;’. If this column is left out, or if no protein is provided, the string “unknown” will be used as a proteinID in the spectral library.

Example of peptide list

peptide,proteins
ASPTQPIQL,
KIEKLKVEL,
AAAAAWEEPSSGNGTAR,Q9P258
KDVDGAYMTK,P04264;CON__P04264
VIGRGSYAK,P11216;P11217
TTENIPGGAEEISEVLDSLENLMR,tr|A0A075B6G3|A0A075B6G3_HUMAN;sp|P11532|DMD_HUMAN;tr|A0A5H1ZRP8|A0A5H1ZRP8_HUMAN
TYCDATKCFTVTE

Internal file format specification

If you want to have full control, you can provide the table in internal format directly. In this case, you need to have the following parameter in your config file:

"library_input_type": "internal",

Oktoberfest will then read the table directly.

Description of internal file columns

Oktoberfest expects a csv formatted file where each row represents a peptide with given metadata. The following table provides the file format specification.

Column Header

Explanation

modified_sequence

The peptide sequence including variable modifications in unimod format (only M[UNIMOD:35] is supported). “C” will always be carbamidomethylated (fixed modification), and a TMT modification is always added to the N-term and “K” if a tag is specified in the configuration file.

collision_energy

The collision energy to use in peptide property prediction

precursor_charge

Charge state of the precursor ion

fragmentation

Method used for fragmentation; can be “HCD” or “CID”

peptide_length

An optional column containing the sequence list. Needed only when predicting intensities with AlphaPept.

instrument_types

An optional column containing the type of mass spectrometer. Only needed when predicting intensities with AlphaPept. Choose one of [“QE”, “LUMOS”, “TIMSTOF”, “SCIEXTOF”].

proteins

An optional list of protein ids separated by ‘;’

Example of internal file

modified_sequence,collision_energy,precursor_charge,fragmentation,peptide_length,instrument_types,proteins
ASPTQPIQL,31,1,HCD,,,
KIEKLKVEL,31,2,HCD,9,QE,
AAAAAWEEPSSGNGTAR,30,3,HCD,,,Q9P258
AAAAAWEEPSSGNGTAR,31,2,HCD,,,Q9P258
KDVDGAYM[UNIMOD:35]TK,30,2,HCD,10,LUMOS,P04264;CON__P04264
VIGRGSYAK,35,2,HCD,9,TIMSTOF,P11216;P11217
TTENIPGGAEEISEVLDSLENLMR,30,1,hcd,tr|A0A075B6G3|A0A075B6G3_HUMAN;sp|P11532|DMD_HUMAN;tr|A0A5H1ZRP8|A0A5H1ZRP8_HUMAN
TYCDATKCFTVTE,34,2,HCD,,,