oktoberfest.pp.convert_search

oktoberfest.pp.convert_search(input_path, search_engine, tmt_label='', custom_mods=None, output_file=None, ptm_unimod_id=0, ptm_sites=None)

Convert search results to Oktoberfest format.

Given a path to a file or directory containing search results from a supported search engine, the function parses the results, converts them to the internal format used by Oktoberfest, and returns them as a dataframe. If a path to an output file is provided, the converted results are also written to that location. The specification of the internal file format can be found at Custom search results.

Parameters:
  • input_path (Union[str, Path]) – Path to the directory or file containing the search results.

  • search_engine (str) – The search engine that produced the search results. Currently supported are “Maxquant”, “Mascot” and “MSFragger”.

  • tmt_label (str) – Optional TMT label to consider when processing peptides. If given, the corresponding fixed modification for the N-terminus and lysine is added.

  • custom_mods (Optional[dict[str, int]]) – Optional dictionary, used when the input at input_path is not in the internal Oktoberfest format. The keys are the static and variable modifications; the values are the integer UniMod identifiers of the respective modifications.

  • output_file (Union[str, Path, None]) – Optional path to the location where the converted search results should be written to. If this is omitted, the results are not stored.

  • ptm_unimod_id (Optional[int]) – UniMod ID of the PTM used for site localization.

  • ptm_sites (Optional[list]) – Possible sites at which the PTM can occur.

Raises:

ValueError – if an unsupported search engine is given.
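The check behind this error can be mirrored on the caller's side, so that a typo in the engine name is caught before any files are read. The following is only a minimal sketch: `normalize_search_engine` and `SUPPORTED_ENGINES` are hypothetical names that are not part of Oktoberfest, and the case-insensitive comparison is an assumption suggested by the example on this page, which passes "maxquant" in lower case.

```python
# Engine names taken from the parameter list above, lower-cased for comparison.
# Assumption: the library itself may accept more names or compare differently.
SUPPORTED_ENGINES = ("maxquant", "mascot", "msfragger")


def normalize_search_engine(name: str) -> str:
    """Hypothetical helper: return the lower-cased engine name or raise ValueError."""
    key = name.strip().lower()
    if key not in SUPPORTED_ENGINES:
        supported = ", ".join(SUPPORTED_ENGINES)
        raise ValueError(f"Unsupported search engine {name!r}; expected one of: {supported}")
    return key


# Usage: validate user input before handing it to pp.convert_search.
try:
    engine = normalize_search_engine("Comet")
except ValueError as err:
    print(f"Cannot convert search results: {err}")
```

Validating early keeps the error message close to where the engine name was entered, for example when it comes from a configuration file.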

Return type:

DataFrame

Returns:

A dataframe containing the converted results.
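Since the function both returns the dataframe and, optionally, writes it to disk, a thin wrapper can cover both uses. The sketch below assumes MaxQuant input; `convert_maxquant` is a hypothetical helper that is not part of Oktoberfest, and only the keyword arguments from the signature above are passed to `convert_search`.

```python
from pathlib import Path
from typing import Optional, Union


def convert_maxquant(
    search_dir: Union[str, Path],
    out_file: Optional[Union[str, Path]] = None,
):
    """Hypothetical wrapper: convert MaxQuant results and optionally store them."""
    search_dir = Path(search_dir)
    # Fail early with a clear message if the input location does not exist.
    if not search_dir.exists():
        raise FileNotFoundError(f"No search results found at {search_dir}")

    # Imported lazily so that the path check above works without Oktoberfest installed.
    from oktoberfest import preprocessing as pp

    # With output_file=None the converted results are returned but not stored.
    return pp.convert_search(
        input_path=search_dir,
        search_engine="maxquant",
        output_file=out_file,
    )
```

Calling `convert_maxquant("./tests/doctests/input/")` would return the dataframe only; passing a second argument would additionally write the converted results to that path.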

Example:

>>> from oktoberfest import preprocessing as pp
>>> import pandas as pd
>>> msms = pd.DataFrame({'Raw file': ['GN20170722_SK_HLA_G0103_R1_01', 'GN20170722_SK_HLA_G0103_R2_02'],
... 'Scan number': [21329, 20501],
... 'Scan index': [18847, 17998],
... 'Sequence': ['AAAAVVSGPKRGRKKP', 'AAAAVVSGPKRGRKKP'],
... 'Length': [16, 16],
... 'Missed cleavages': ['', ''],
... 'Modifications': ['Unmodified', 'Unmodified'],
... 'Modified sequence': ['_AAAAVVSGPKRGRKKP_', '_AAAAVVSGPKRGRKKP_'],
... 'Oxidation (M) Probabilities': ['', ''],
... 'Oxidation (M) Score Diffs': ['', ''],
... 'Oxidation (M)': [0, 0],
... 'Proteins': ['', ''],
... 'Charge': [3, 3],
... 'Fragmentation': ['HCD', 'HCD'],
... 'Mass analyzer': ['FTMS', 'FTMS'],
... 'Type': ['MULTI-SECPEP', 'MULTI-SECPEP'],
... 'Scan event number': [9, 5],
... 'Isotope index': [2, 2],
... 'm/z': [531.66176, 531.66176],
... 'Mass': [1591.9634, 1591.9634],
... 'Mass Error [ppm]': [-2.1109999999999998, -1.1018],
... 'Simple Mass Error [ppm]': [1259.2803, 1259.2803],
... 'Retention time': [46.272, 46.388000000000005],
... 'PEP': [0.57389, 0.57389],
... 'Score': [7.9138, 4.7582],
... 'Delta score': [3.5652, 1.4401],
... 'Score diff': ['', ''],
... 'Localization prob': [1, 1],
... 'Combinatorics': [0, 0],
... 'PIF': [0, 0],
... 'Fraction of total spectrum': [0, 0],
... 'Base peak fraction': [0, 0],
... 'Precursor Full ScanNumber': [-1, -1],
... 'Precursor Intensity': [0, 0],
... 'Precursor Apex Fraction': [0, 0],
... 'Precursor Apex Offset': [0, 0],
... 'Precursor Apex Offset Time': [0, 0],
... 'Matches Intensities': ['y5;y10;y5-NH3;a2;b12(2+)', 'y5;y5-NH3;b12(2+)'],
... 'Mass Deviations [Da]': ['34666.4;2191.7;88570.6;2148.7;89073.6', '10544.1;36224.8;73327.7'],
... 'Mass Deviations [ppm]': ['0.008335659;-0.01799215;-0.002397317;-0.0004952438;-0.004926575',
...                             '0.009286639;-0.004650567;-0.002822918'],
... 'Masses': ['14.23987;-16.19888;-4.217963;-4.303209;-9.237617', '15.86446;-8.182414;-5.293158'],
... 'Number of Matches': [5, 3],
... 'Intensity coverage': [0.1016966, 0.1564349],
... 'Peak coverage': [0.04166667, 0.04477612],
... 'Neutral loss level': ['None', 'None'],
... 'ETD identification type': ['Unknown', 'Unknown'],
... 'Reverse': ['Unknown +', 'Unknown +'],
... 'All scores': ['7.913836;4.348669;4.097387', '4.758178;3.318045;2.968256'],
... 'All sequences': ['AAAAVVSGPKRGRKKP;GVVAKGALTPKLSPVVG;GVVPSLKPTLAGKAVVG',
...                     'AAAAVVSGPKRGRKKP;VMKLLRHDKLVQL;QEILRKILPLGELA'],
... 'All modified sequences': ['_AAAAVVSGPKRGRKKP_;_GVVAKGALTPKLSPVVG_;_GVVPSLKPTLAGKAVVG_',
...                             '_AAAAVVSGPKRGRKKP_;_VMKLLRHDKLVQL_;_QEILRKILPLGELA_'],
... 'id': [1378, 1379],
... 'Protein group IDs': ['42625', '42625'],
... 'Peptide ID': [533, 533],
... 'Mod. peptide ID': [537, 537],
... 'Evidence ID': [1075, 1076],
... 'Oxidation (M) site IDs': ['', '']})
>>> msms.to_csv("./tests/doctests/input/msms.txt", sep='\t', index=False)
>>> converted_results = pp.convert_search(input_path="./tests/doctests/input/", search_engine="maxquant")
>>> print(converted_results)

© Copyright 2026, Wilhelmlab at Technical University of Munich.
