oktoberfest.data.Spectra.preprocess_for_machine_learning
- Spectra.preprocess_for_machine_learning(include_intensities=True, include_additional_columns=None, ion_type_order=None, remove_decoys=False, search_engine_score_threshold=None, num_duplicates=None)
Filter and preprocess for machine learning applications and transform into a Parquet-serializable dataframe.
- Parameters:
include_intensities (bool) – Whether to include the intensity (label) column.
include_additional_columns (Optional[list[str]]) – Additional column names that are not required by DLomix to include in the output. Capitalization does not matter: internal column names are all uppercase, whereas returned column names are all lowercase.
ion_type_order (Optional[list[str]]) – Ion type order in which to save output intensity values.
remove_decoys (bool) – Whether to remove decoys.
search_engine_score_threshold (Optional[float]) – Search engine score cutoff for peptides included in the output.
num_duplicates (Optional[int]) – Number of (sequence, charge, collision energy) duplicates to keep in the output.
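To make the filtering parameters more concrete, the following is a minimal pandas sketch of how decoy removal, a score cutoff, and a duplicate limit could interact. It is not the Oktoberfest implementation. The function `filter_sketch`, the uppercase column names (`REVERSE`, `SCORE`, and so on), the "keep scores at or above the cutoff" direction, and the "keep the best-scoring duplicates" rule are all assumptions made for illustration.

```python
import pandas as pd


def filter_sketch(df, remove_decoys=False,
                  search_engine_score_threshold=None, num_duplicates=None):
    """Hypothetical illustration of the filtering parameters (not library code)."""
    out = df
    if remove_decoys:
        # Assumption: decoys are flagged in a boolean REVERSE column.
        out = out[~out["REVERSE"]]
    if search_engine_score_threshold is not None:
        # Assumption: rows scoring at or above the cutoff are kept.
        out = out[out["SCORE"] >= search_engine_score_threshold]
    if num_duplicates is not None:
        # Assumption: the highest-scoring duplicates per
        # (sequence, charge, collision energy) group are kept.
        keys = ["MODIFIED_SEQUENCE", "PRECURSOR_CHARGE", "COLLISION_ENERGY"]
        out = (out.sort_values("SCORE", ascending=False)
                  .groupby(keys, sort=False)
                  .head(num_duplicates))
    # Internal names are uppercase; returned names are lowercase.
    return out.rename(columns=str.lower).reset_index(drop=True)


df = pd.DataFrame({
    "MODIFIED_SEQUENCE": ["AAK", "AAK", "AAK", "CCR", "DDK"],
    "PRECURSOR_CHARGE": [2, 2, 2, 3, 2],
    "COLLISION_ENERGY": [30, 30, 30, 30, 30],
    "SCORE": [90.0, 80.0, 70.0, 60.0, 95.0],
    "REVERSE": [False, False, False, False, True],
})

result = filter_sketch(df, remove_decoys=True,
                       search_engine_score_threshold=65.0, num_duplicates=2)
print(result)
```

In this toy input, the decoy row and the row below the cutoff are dropped, and only two of the three identical (sequence, charge, collision energy) entries remain.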
- Return type:
- Returns:
Pandas DataFrame with column names and dtypes corresponding to those required by DLomix:
  - modified_sequence (str)
  - precursor_charge_onehot (list[int])
  - collision_energy_aligned_normal (int)
  - method_nbr (int)
  - intensities_raw (list[float]), only if include_intensities == True
  - additional columns, if specified via include_additional_columns
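As an illustration of the `precursor_charge_onehot (list[int])` column, here is a small sketch of a one-hot encoding for a precursor charge. The helper `charge_to_onehot` and the maximum charge of 6 are hypothetical; the vector length actually used by Oktoberfest or DLomix is not stated here and may differ.

```python
def charge_to_onehot(charge: int, max_charge: int = 6) -> list[int]:
    """Hypothetical helper: encode a precursor charge as a one-hot list."""
    if not 1 <= charge <= max_charge:
        raise ValueError(f"charge must be between 1 and {max_charge}")
    # Position i (1-based) is set to 1 only for the matching charge state.
    return [1 if i == charge else 0 for i in range(1, max_charge + 1)]


onehot = charge_to_onehot(2)
print(onehot)
```

Storing the encoding as a plain list of integers keeps each cell in a form that Parquet can serialize as a list column.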