oktoberfest.pp.annotate_spectral_library

oktoberfest.pp.annotate_spectral_library(psms, fragmentation_method='HCD', p_window=1.2, mass_tol=None, unit_mass_tol=None, custom_mods=None, annotate_neutral_loss=False, multifrag=False, featured_ions=None)

Annotate all specified ion peaks of the given PSMs (default: b and y ions).

This function annotates the b and y ion peaks of the given PSMs by matching the m/z values of all peaks to the theoretical m/z values; all other peaks are discarded. It also calculates the theoretical monoisotopic mass of each b and y ion fragment. The function then returns a Spectra object containing the m/z values and intensities of all b and y ions in charge states 1-3, together with the additional metadata.
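To make the notion of theoretical b and y ion m/z values concrete, the following is a minimal sketch, not Oktoberfest's implementation. The helper `by_ion_mzs` and the small `RESIDUE_MASS` table are hypothetical names introduced here for illustration; the sketch assumes an unmodified peptide and covers only a handful of amino acids.

```python
# Illustrative sketch only: helper and table names are hypothetical,
# not part of the Oktoberfest API.

# Monoisotopic residue masses in Da (small subset for demonstration).
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "P": 97.05276,
    "K": 128.09496,
    "R": 156.10111,
}
PROTON = 1.007276   # mass of a proton in Da
WATER = 18.010565   # monoisotopic mass of H2O in Da


def by_ion_mzs(sequence, max_charge=3):
    """Return theoretical m/z values of b and y ions for an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in sequence]
    ions = {}
    # Fragment positions 1 .. n-1 (the full-length ion is the precursor).
    for i in range(1, len(masses)):
        b_mass = sum(masses[:i])            # N-terminal fragment
        y_mass = sum(masses[-i:]) + WATER   # C-terminal fragment keeps H2O
        for z in range(1, max_charge + 1):
            ions[f"b{i}+{z}"] = (b_mass + z * PROTON) / z
            ions[f"y{i}+{z}"] = (y_mass + z * PROTON) / z
    return ions


if __name__ == "__main__":
    for name, mz in sorted(by_ion_mzs("GAK").items()):
        print(f"{name}: {mz:.5f}")
```

Modified residues such as C[UNIMOD:4] would additionally require adding the modification mass to the affected residue, which this sketch leaves out.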

Parameters:
  • psms (DataFrame) – Spectral library to be annotated.

  • mass_tol (Optional[float]) – The mass tolerance allowed for retaining peaks

  • unit_mass_tol (Optional[str]) – The unit in which the mass tolerance is given

  • fragmentation_method (str) – fragmentation method that was used

  • custom_mods (Optional[dict[str, float]]) – mapping of custom UNIMOD string identifiers (‘[UNIMOD:xyz]’) to their mass

  • annotate_neutral_loss (Optional[bool]) – flag to indicate whether to annotate neutral loss peaks or not

  • multifrag (Optional[bool]) – flag to indicate whether to annotate multifrag peaks or not

  • featured_ions (Optional[list[str]]) – list of ions to be annotated

  • p_window (Optional[float]) – window size for precursor peak removal
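The interplay of mass_tol and unit_mass_tol can be illustrated with a short sketch. This is an assumption about how tolerance-based peak matching typically works, not a description of Oktoberfest's internals; the function `match_peaks` and the rule of keeping the most intense peak inside the window are hypothetical choices made for this example.

```python
import numpy as np


def match_peaks(observed_mz, observed_int, theoretical_mz, tol=20.0, unit="ppm"):
    """For each theoretical m/z, return the intensity of the most intense
    observed peak within the tolerance window, or 0.0 if none matches.

    Hypothetical helper for illustration; not part of the Oktoberfest API.
    """
    observed_mz = np.asarray(observed_mz, dtype=float)
    observed_int = np.asarray(observed_int, dtype=float)
    matched = np.zeros(len(theoretical_mz))
    for i, theo in enumerate(theoretical_mz):
        # A ppm tolerance scales with m/z; a Da tolerance is absolute.
        window = theo * tol * 1e-6 if unit == "ppm" else tol
        hits = np.abs(observed_mz - theo) <= window
        if hits.any():
            matched[i] = observed_int[hits].max()
    return matched


if __name__ == "__main__":
    mz = [100.0, 200.001, 300.5]
    intensity = [1.0, 2.0, 3.0]
    print(match_peaks(mz, intensity, [200.0, 300.0], tol=10, unit="ppm"))
```

At m/z 200, a tolerance of 10 ppm corresponds to a window of 0.002 Da, so the peak at 200.001 is retained, while the peak at 300.5 is far outside the window around 300.0 and is dropped.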

Return type:

Spectra

Returns:

Spectra object containing the annotated featured ion peaks, including metadata

Example:

>>> from oktoberfest import preprocessing as pp
>>> import numpy as np
>>> import pandas as pd
>>> psms = pd.DataFrame({"RAW_FILE": ["File1","File2"],
...                      "SCAN_NUMBER": [5123,4012],
...                      "MODIFIED_SEQUENCE": ["AAAC[UNIMOD:4]RFVQ","RM[UNIMOD:35]PC[UNIMOD:4]HKPYL"],
...                      "PRECURSOR_CHARGE": [1,2],
...                      "PEPTIDE_LENGTH": [8,9],
...                      "MASS_ANALYZER": ["FTMS","FTMS"],
...                      "INTENSITIES": [np.random.rand(174),np.random.rand(174)],
...                      "MZ": [np.random.rand(174),np.random.rand(174)]})
>>> library = pp.annotate_spectral_library(psms=psms, mass_tol=15, unit_mass_tol="ppm")
>>> print(library)