oktoberfest.pr.create_dlomix_dataset
- oktoberfest.pr.create_dlomix_dataset(libraries, output_dir, include_additional_columns=None, remove_decoys=False, search_engine_score_threshold=None, num_duplicates=None)
Transform one or multiple spectra into a Parquet file that can be used by DLomix.
- Processes spectral libraries into a DLomix-compatible format and detects the fragment ion types and peptide modifications present in the dataset. The dataset is then written to output_dir as processed_dataset.parquet, and the lists of ion types and modifications are written as ion_types.txt and modifications.txt.
- Parameters:
output_dir (Path) – Directory to save the processed dataset to
include_additional_columns (Optional[list[str]]) – Additional columns to keep in the dataset
remove_decoys (bool) – Whether to remove decoys from the dataset
search_engine_score_threshold (Optional[float]) – Search engine score cutoff for peptides included in the output
num_duplicates (Optional[int]) – Number of (sequence, charge, collision energy) duplicates to keep in the output
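The effect of num_duplicates can be pictured with a small stand-alone sketch. It is not Oktoberfest's implementation: the row layout, the column names (sequence, charge, collision_energy) and the rule of keeping the first rows encountered are all assumptions made for illustration, since the documentation does not say which duplicates are retained.

```python
from collections import defaultdict


def cap_duplicates(rows, num_duplicates):
    """Keep at most num_duplicates rows per (sequence, charge, collision energy).

    Hypothetical helper: keeps the first rows seen for each key, which is an
    assumption and not necessarily what Oktoberfest does.
    """
    if num_duplicates is None:
        # No cap requested: return every row unchanged.
        return list(rows)
    seen = defaultdict(int)
    kept = []
    for row in rows:
        key = (row["sequence"], row["charge"], row["collision_energy"])
        if seen[key] < num_duplicates:
            seen[key] += 1
            kept.append(row)
    return kept


# Made-up example rows; the first two share the same key.
rows = [
    {"sequence": "PEPTIDEK", "charge": 2, "collision_energy": 30},
    {"sequence": "PEPTIDEK", "charge": 2, "collision_energy": 30},
    {"sequence": "PEPTIDEK", "charge": 3, "collision_energy": 30},
]
capped = cap_duplicates(rows, 1)
```

With a cap of 1, only one row per key remains, so the charge 2 and charge 3 entries each appear once.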
- Return type:
- Returns:
path of the saved Parquet file
a list of the ion types present in the dataset
a list of the modifications present in the dataset (in the form of modstring tokens from spectrum_fundamentals)
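A minimal usage sketch follows. The keyword arguments mirror the signature above and the output file names are the ones stated in the description. Assumptions: the three return values can be unpacked as a tuple in the order listed under Returns, the values passed for remove_decoys and num_duplicates are arbitrary illustrations, and expected_outputs and build_dataset are hypothetical helper names, not part of Oktoberfest.

```python
from pathlib import Path


def expected_outputs(output_dir):
    """Paths of the three files the description says are written to output_dir."""
    out = Path(output_dir)
    return {
        "dataset": out / "processed_dataset.parquet",
        "ion_types": out / "ion_types.txt",
        "modifications": out / "modifications.txt",
    }


def build_dataset(libraries, output_dir):
    """Hypothetical wrapper around oktoberfest.pr.create_dlomix_dataset."""
    # Imported lazily so the path helper above works without Oktoberfest installed.
    from oktoberfest import pr

    # Assumption: the return value unpacks into (path, ion types, modifications).
    parquet_path, ion_types, modifications = pr.create_dlomix_dataset(
        libraries,
        Path(output_dir),
        include_additional_columns=None,
        remove_decoys=True,  # illustrative choice
        search_engine_score_threshold=None,
        num_duplicates=100,  # illustrative choice
    )
    return parquet_path, ion_types, modifications


outputs = expected_outputs("dlomix_data")
```

After a call such as build_dataset(libraries, "dlomix_data"), the files listed in outputs should exist in the output directory if the function behaves as described.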