oktoberfest.pr.create_dlomix_dataset

oktoberfest.pr.create_dlomix_dataset(libraries, output_dir, include_additional_columns=None, remove_decoys=False, search_engine_score_threshold=None, num_duplicates=None)

Transform one or more spectral libraries (Spectra objects) into a Parquet file that can be used by DLomix.

Processes spectral libraries into a DLomix-compatible format and detects the fragment ion types and peptide modifications present in the dataset. The dataset is then written to output_dir as processed_dataset.parquet, and the lists of ion types and modifications are written as ion_types.txt and modifications.txt.
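The output layout described above can be sketched with the standard library alone. The three file names come from the description; the helper names `expected_outputs` and `read_token_list` are hypothetical, and the one-token-per-line layout of the text files is an assumption that is not confirmed here.

```python
from pathlib import Path


def expected_outputs(output_dir):
    """Return the three output paths named in the description above."""
    output_dir = Path(output_dir)
    return {
        "dataset": output_dir / "processed_dataset.parquet",
        "ion_types": output_dir / "ion_types.txt",
        "modifications": output_dir / "modifications.txt",
    }


def read_token_list(path):
    """Read a token list from a text file.

    ASSUMPTION: one token per line; blank lines are ignored.
    """
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]
```

If the text files turn out to use a different separator, only `read_token_list` needs to change.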

Parameters:
  • libraries (list[Spectra]) – Spectral libraries to include

  • output_dir (Path) – Directory to save processed dataset to

  • include_additional_columns (Optional[list[str]]) – Additional columns to keep in the dataset

  • remove_decoys (bool) – Whether to remove decoys from the dataset

  • search_engine_score_threshold (Optional[float]) – Search engine score cutoff for peptides included in output

  • num_duplicates (Optional[int]) – Number of (sequence, charge, collision energy) duplicates to keep in output
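The three filtering parameters can be pictured with a small, library-independent sketch over plain dictionaries. It is not Oktoberfest's implementation. The row field names (`is_decoy`, `score`, `sequence`, `charge`, `collision_energy`), the assumption that higher scores are better, and the choice to keep the first rows seen in each duplicate group are all assumptions made for illustration.

```python
from collections import defaultdict


def filter_rows(rows, remove_decoys=False, score_threshold=None, num_duplicates=None):
    """Illustrative filter; field names and keep-first policy are assumptions."""
    kept = []
    seen = defaultdict(int)  # rows kept so far per (sequence, charge, collision energy)
    for row in rows:
        # Optionally drop decoy entries.
        if remove_decoys and row["is_decoy"]:
            continue
        # ASSUMPTION: rows scoring below the cutoff are discarded.
        if score_threshold is not None and row["score"] < score_threshold:
            continue
        # Cap the number of rows sharing the same grouping key.
        key = (row["sequence"], row["charge"], row["collision_energy"])
        if num_duplicates is not None and seen[key] >= num_duplicates:
            continue
        seen[key] += 1
        kept.append(row)
    return kept
```

With every option left at its default, the sketch returns all rows unchanged, mirroring the defaults in the signature (`False`, `None`, `None`).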

Return type:

tuple[Path, list[str], list[str]]

Returns:

  • Path of the saved Parquet file

  • List of ion types present in the dataset

  • List of modifications present in the dataset (as modstring tokens from spectrum_fundamentals)
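A minimal usage sketch, based only on the signature and return type documented above. The wrapper name `build_dataset` and the argument values (`remove_decoys=True`, `num_duplicates=100`) are hypothetical; how the Spectra objects in `libraries` are obtained is outside the scope of this entry.

```python
from pathlib import Path


def build_dataset(libraries, output_dir):
    """Call create_dlomix_dataset and unpack its three return values."""
    # Imported lazily so this sketch can be defined without Oktoberfest installed.
    from oktoberfest import pr

    parquet_path, ion_types, modifications = pr.create_dlomix_dataset(
        libraries,           # list[Spectra], prepared elsewhere
        Path(output_dir),    # directory that receives the output files
        remove_decoys=True,  # hypothetical choice for this example
        num_duplicates=100,  # hypothetical cap per (sequence, charge, collision energy)
    )
    return parquet_path, ion_types, modifications
```

The returned tuple follows the documented order: the Parquet path first, then the ion types, then the modifications.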