oktoberfest.pp.annotate_spectral_library
- oktoberfest.pp.annotate_spectral_library(psms, fragmentation_method='HCD', p_window=1.2, mass_tol=None, unit_mass_tol=None, custom_mods=None, annotate_neutral_loss=False, multifrag=False, featured_ions=None)
Annotate all specified ion peaks of the given PSMs (default: b and y ions).
This function annotates the b and y ion peaks of the given PSMs by matching the m/z values of all peaks to the theoretical m/z values and discarding all other peaks. It also calculates the theoretical monoisotopic mass of each b and y ion fragment. The function then returns a Spectra object containing the m/z values and intensities of all b and y ions in charge states 1-3, along with the additional metadata.
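To make the notion of "theoretical m/z" concrete, the following is a minimal sketch of how singly to triply charged b and y ion m/z values can be derived from monoisotopic residue masses. It is not the Oktoberfest implementation: the helper name `fragment_mzs` is hypothetical, and the sketch assumes an unmodified peptide made up of the 20 standard amino acids.

```python
# Illustrative sketch only; fragment_mzs is a hypothetical helper,
# not part of the Oktoberfest API. Unmodified peptides only.

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def fragment_mzs(sequence, max_charge=3):
    """Return {(ion_type, position, charge): m/z} for b and y ions."""
    masses = [RESIDUE_MASS[aa] for aa in sequence]
    n = len(masses)
    ions = {}
    for i in range(1, n):
        # b ions contain the first i residues.
        b_neutral = sum(masses[:i])
        # y ions contain the last i residues plus one water.
        y_neutral = sum(masses[n - i:]) + WATER
        for z in range(1, max_charge + 1):
            ions[("b", i, z)] = (b_neutral + z * PROTON) / z
            ions[("y", i, z)] = (y_neutral + z * PROTON) / z
    return ions


ions = fragment_mzs("PEPTIDE")
print(len(ions))  # 6 positions x 2 ion types x 3 charge states
```

A peptide of length n yields n-1 b ions and n-1 y ions per charge state, which is why the dictionary above is keyed by ion type, position, and charge.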
- Parameters:
psms (DataFrame) – Spectral library to be annotated
mass_tol (Optional[float]) – The mass tolerance allowed for retaining peaks
unit_mass_tol (Optional[str]) – The unit in which the mass tolerance is given
fragmentation_method (str) – fragmentation method that was used
custom_mods (Optional[dict[str,float]]) – mapping of custom UNIMOD string identifiers (‘[UNIMOD:xyz]’) to their mass
annotate_neutral_loss (Optional[bool]) – flag to indicate whether to annotate neutral loss peaks or not
multifrag (Optional[bool]) – flag to indicate whether to annotate multifrag peaks or not
featured_ions (Optional[list[str]]) – list of ions to be annotated
p_window (Optional[float]) – window size for precursor peak removal
- Return type:
Spectra
- Returns:
Spectra object containing the annotated featured ion peaks, including metadata
- Example:
>>> from oktoberfest import preprocessing as pp
>>> import numpy as np
>>> import pandas as pd
>>> psms = pd.DataFrame({"RAW_FILE": ["File1","File2"],
...                      "SCAN_NUMBER": [5123,4012],
...                      "MODIFIED_SEQUENCE": ["AAAC[UNIMOD:4]RFVQ","RM[UNIMOD:35]PC[UNIMOD:4]HKPYL"],
...                      "PRECURSOR_CHARGE": [1,2],
...                      "PEPTIDE_LENGTH": [8,9],
...                      "MASS_ANALYZER": ["FTMS","FTMS"],
...                      "INTENSITIES": [np.random.rand(174),np.random.rand(174)],
...                      "MZ": [np.random.rand(174),np.random.rand(174)]})
>>> library = pp.annotate_spectral_library(psms=psms, mass_tol=15, unit_mass_tol="ppm")
>>> print(library)
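The role of mass_tol and unit_mass_tol can be pictured with a small standalone sketch of tolerance-based peak matching. This is an illustration under stated assumptions, not the library's code: the helpers `tolerance_in_da` and `match_peaks` are hypothetical, the unit string "da" is assumed alongside the "ppm" used in the example above, and keeping the most intense peak inside the tolerance window is one possible tie-breaking rule chosen for this sketch.

```python
# Illustrative sketch only; these helpers are hypothetical and are not
# part of the Oktoberfest API.

def tolerance_in_da(theoretical_mz, mass_tol, unit_mass_tol):
    """Convert a tolerance to an absolute window in Da (Th) at a given m/z."""
    if unit_mass_tol == "ppm":
        # ppm tolerance scales with the m/z it is applied to.
        return theoretical_mz * mass_tol / 1e6
    if unit_mass_tol == "da":  # assumed unit name for absolute tolerance
        return mass_tol
    raise ValueError(f"unsupported tolerance unit: {unit_mass_tol!r}")


def match_peaks(observed_mzs, observed_intensities, theoretical_mzs,
                mass_tol, unit_mass_tol):
    """Keep only observed peaks that fall within tolerance of a theoretical m/z.

    theoretical_mzs maps an ion label to its theoretical m/z. Peaks that
    match no theoretical ion are discarded.
    """
    matched = []
    for label, theo in theoretical_mzs.items():
        tol = tolerance_in_da(theo, mass_tol, unit_mass_tol)
        candidates = [
            (intensity, mz)
            for mz, intensity in zip(observed_mzs, observed_intensities)
            if abs(mz - theo) <= tol
        ]
        if candidates:
            # Assumption of this sketch: keep the most intense candidate.
            intensity, mz = max(candidates)
            matched.append((label, mz, intensity))
    return matched


observed_mzs = [100.0005, 150.0, 200.1]
observed_intensities = [10.0, 5.0, 7.0]
theoretical = {"b1": 100.0, "y1": 200.0}

print(match_peaks(observed_mzs, observed_intensities, theoretical, 15, "ppm"))
print(match_peaks(observed_mzs, observed_intensities, theoretical, 0.2, "da"))
```

With a 15 ppm tolerance the window around m/z 200 is only 0.003 wide on each side, so the peak at 200.1 is dropped, whereas an absolute tolerance of 0.2 retains it. The peak at 150.0 matches no theoretical ion in either case and is discarded.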