oktoberfest.re.generate_features
- oktoberfest.re.generate_features(library, search_type, output_file, additional_columns=None, all_features=False, xl=False, cms2=False, regression_method='spline', add_neutral_loss_features=False, remove_miss_cleavage_features=False, task='default', featured_ions=None)
Generate features for target-decoy separation with Percolator or mokapot.
The function calculates a range of metrics and features on the provided library for the chosen FDR estimation method, then writes the resulting input tab file to the chosen output file.
- Parameters:
  - library (Spectra) – the library to perform feature generation on
  - search_type (str) – one of “original” and “rescore”, which determines the generated features
  - output_file (str|Path) – the location of the generated tab file to be used for Percolator / mokapot
  - additional_columns (str|list|None) – additional columns supplied in the search results to be used as features (either a list or “all”)
  - all_features (bool) – whether to use all features or only the standard set (TODO)
  - xl (bool) – crosslinked or linear peptide
  - cms2 (bool) – cleavable or non-cleavable crosslinker
  - regression_method (str) – the regression method to use for iRT alignment
  - add_neutral_loss_features (bool) – flag to indicate whether to add neutral loss features to Percolator or not
  - remove_miss_cleavage_features (bool) – flag to indicate whether to remove missed cleavage features from Percolator or not
  - task (str) – flag to indicate whether to use multifrag features or not
  - featured_ions (Optional[list]) – the ion series to use for calculating Percolator features
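The constraints stated in the parameter descriptions can be checked before calling the function. The following is a minimal sketch, not part of oktoberfest: `check_feature_args` and `ALLOWED_SEARCH_TYPES` are hypothetical names introduced here, and the sketch assumes that the string values are matched exactly (case-sensitive), which the documentation above does not state.

```python
from pathlib import Path
from typing import List, Union

# Allowed values as listed in the parameter descriptions (assumed case-sensitive).
ALLOWED_SEARCH_TYPES = ("original", "rescore")


def check_feature_args(
    search_type: str,
    output_file: Union[str, Path],
    additional_columns: Union[str, List[str], None] = None,
) -> List[str]:
    """Hypothetical pre-flight check; returns a list of problems (empty if none)."""
    problems = []
    if search_type not in ALLOWED_SEARCH_TYPES:
        problems.append(
            f"search_type must be one of {ALLOWED_SEARCH_TYPES}, got {search_type!r}"
        )
    if not isinstance(output_file, (str, Path)):
        problems.append("output_file must be a str or Path")
    if isinstance(additional_columns, str):
        # The only documented string value is "all".
        if additional_columns != "all":
            problems.append('additional_columns given as a string must be "all"')
    elif additional_columns is not None and not isinstance(additional_columns, list):
        problems.append('additional_columns must be a list, "all", or None')
    return problems
```

An empty return value means the three arguments match the documented types and values; it says nothing about whether the library itself is valid.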
- Example:
  >>> from oktoberfest import rescore as re
  >>> from oktoberfest import predict as pr
  >>> from oktoberfest.data import Spectra, FragmentType
  >>> import pandas as pd
  >>> import numpy as np
  >>> # Required columns: RAW_FILE, MODIFIED_SEQUENCE, SEQUENCE, CALCULATED_MASS, SCAN_NUMBER,
  >>> # COLLISION_ENERGY, PRECURSOR_CHARGE, REVERSE and SCORE
  >>> meta_df = pd.DataFrame({"RAW_FILE": ["File1","File1"],
  ...                         "MODIFIED_SEQUENCE": ["AAAC[UNIMOD:4]RFVQ","RM[UNIMOD:35]PC[UNIMOD:4]HKPYL"],
  ...                         "SEQUENCE": ["AAACRFVQ","RMPCHKPYL"],
  ...                         "CALCULATED_MASS": [1000,4000],
  ...                         "SCAN_NUMBER": [1,2],
  ...                         "COLLISION_ENERGY": [30,35],
  ...                         "PRECURSOR_CHARGE": [1,2],
  ...                         "FRAGMENTATION": ["HCD","HCD"],
  ...                         "REVERSE": [False,False],
  ...                         "SCORE": [0,0]})
  >>> var = Spectra._gen_vars_df()
  >>> library = Spectra(obs=meta_df, var=var)
  >>> raw_intensities = np.random.rand(2,174)
  >>> mzs = np.random.rand(2,174)*1000
  >>> annotation = np.array([var.index,var.index])
  >>> library.add_intensities(raw_intensities, annotation, FragmentType.RAW)
  >>> library.add_mzs(mzs, FragmentType.MZ)
  >>> library.strings_to_categoricals()
  >>> intensity_predictor = pr.Predictor.from_koina(
  ...     model_name="Prosit_2020_intensity_HCD",
  ...     server_url="koina.wilhelmlab.org:443",
  ...     ssl=True)
  >>> intensity_predictor.predict_intensities(data=library)
  >>> re.generate_features(library=library,
  ...                      search_type="original",
  ...                      regression_method="spline",
  ...                      output_file="./tests/doctests/output/original.tab")
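The example above names nine columns that the metadata must contain. As a minimal sketch, a check for missing columns could look like the following; `missing_required_columns` and `REQUIRED_COLUMNS` are hypothetical names, not part of oktoberfest, and the column list is copied from the comment in the example rather than from the library's own validation.

```python
from typing import Iterable, List

# Column names taken from the "Required columns" comment in the example.
REQUIRED_COLUMNS = (
    "RAW_FILE",
    "MODIFIED_SEQUENCE",
    "SEQUENCE",
    "CALCULATED_MASS",
    "SCAN_NUMBER",
    "COLLISION_ENERGY",
    "PRECURSOR_CHARGE",
    "REVERSE",
    "SCORE",
)


def missing_required_columns(columns: Iterable[str]) -> List[str]:
    """Return the required column names that are absent, in documented order."""
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS if name not in present]
```

With the DataFrame from the example, `missing_required_columns(meta_df.columns)` would be called before constructing the `Spectra` object; extra columns such as `FRAGMENTATION` are ignored by the check.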