Oktoberfest: Rescoring and Spectral Library Generation for Proteomics.

API

Import Oktoberfest using:

import oktoberfest as ok

Data: data

The data submodule provides access to PSMs, predictions, and metadata.

data.Spectra(*args, **kwargs)

Main class to initialize spectra data.

Preprocessing: pp

Generating libraries

pp.digest(fasta, digestion, ...)

Digest a given FASTA file with specific settings.
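The digestion settings of pp.digest are elided above, so the sketch below does not reproduce its signature. It is a standalone illustration of trypsin-style in-silico digestion of a single protein sequence. The function name and the missed_cleavages, min_length and max_length parameters are hypothetical, and this is not Oktoberfest's implementation.

```python
def tryptic_digest(sequence, missed_cleavages=1, min_length=7, max_length=30):
    """Hypothetical sketch: cleave C-terminal to K or R unless the next residue is P,
    allow up to `missed_cleavages` skipped sites, and keep peptides within the length range."""
    # Cleavage sites are the indices at which a new peptide starts.
    sites = [0]
    for i in range(len(sequence) - 1):
        if sequence[i] in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(sequence))

    peptides = set()
    # Join 1 .. missed_cleavages + 1 consecutive fragments into one peptide.
    for start in range(len(sites) - 1):
        for end in range(start + 1, min(start + missed_cleavages + 2, len(sites))):
            peptide = sequence[sites[start]:sites[end]]
            if min_length <= len(peptide) <= max_length:
                peptides.add(peptide)
    return sorted(peptides)
```

Real digestion tools also handle other proteases, semi-specific cleavage and protein N-terminal methionine removal, which this sketch leaves out.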

pp.generate_metadata(peptides, ...[, ...])

Create metadata about peptides for a spectral library.

pp.gen_lib(input_file)

Generate a spectral library from a given input.

pp.merge_spectra_and_peptides(spectra, search)

Merge peptides with spectra.
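Conceptually, merging peptides with spectra is a join of PSM records with spectrum records on a shared key. The sketch below assumes plain dictionaries keyed on hypothetical "raw_file" and "scan_number" fields; Oktoberfest's actual data structures and column names are not given here.

```python
def merge_psms_with_spectra(psms, spectra):
    """Hypothetical sketch: inner join of PSMs and spectra on (raw_file, scan_number).
    PSMs without a matching spectrum are dropped."""
    # Index spectra by their join key for O(1) lookups.
    index = {(s["raw_file"], s["scan_number"]): s for s in spectra}
    merged = []
    for psm in psms:
        spectrum = index.get((psm["raw_file"], psm["scan_number"]))
        if spectrum is not None:
            # Combine both records; spectrum fields win on key collisions.
            merged.append({**psm, **spectrum})
    return merged
```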

pp.annotate_spectral_library(psms[, ...])

Annotate all specified ion peaks of the given PSMs (default: b and y ions).
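Annotation matches observed peaks against theoretical fragment m/z values. The sketch below computes singly charged b and y ions for an unmodified peptide from standard monoisotopic residue masses; it ignores modifications, higher charge states and mass tolerances, and is not Oktoberfest's annotation code.

```python
PROTON = 1.007276   # proton mass (Da)
WATER = 18.010565   # monoisotopic H2O (Da)
RESIDUE_MASS = {    # monoisotopic residue masses (Da), unmodified
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def fragment_ions(peptide):
    """Return singly charged b and y ion m/z values; b_i spans the first i residues,
    y_i spans the last i residues (index 0 holds b1 and y1)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return {"b": b_ions, "y": y_ions}
```

A useful sanity check: for a peptide of length n, b_i + y_(n-i) equals the neutral precursor mass plus two protons.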

Spectra preprocessing

pp.list_spectra(input_dir, input_format)

Return a list of all spectra files of a given format.
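Listing spectra files of one format amounts to filtering a directory by suffix. The sketch below is a hypothetical stand-in using pathlib, not the real pp.list_spectra; it also matches directories, so it covers timsTOF .d folders as well as files such as .raw or .mzML.

```python
from pathlib import Path

def list_spectra_files(input_dir, extension):
    """Hypothetical sketch: return sorted entries of input_dir whose suffix
    matches `extension` (case-insensitive, with or without a leading dot)."""
    ext = extension.lower()
    suffix = ext if ext.startswith(".") else "." + ext
    return sorted(p for p in Path(input_dir).iterdir() if p.suffix.lower() == suffix)
```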

pp.convert_raw_to_mzml(raw_file, output_file)

Convert a RAW file to mzML format.

pp.convert_d_to_hdf(d_dir, output_file)

Convert a .d directory to HDF format.

pp.load_spectra(filenames[, parser, ...])

Read spectra from the given file(s).

Peptide preprocessing

pp.convert_search(input_path, search_engine)

Convert search results to Oktoberfest format.

pp.load_search(input_file)

Load search results.

pp.split_search(search_results, output_dir)

Split search results by spectrum file.
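Splitting search results by spectrum file is a group-by on the file identifier. In the sketch below, PSMs are plain dictionaries and the "raw_file" key is a hypothetical column name; pp.split_search itself writes its output to output_dir, which the sketch does not do.

```python
from collections import defaultdict

def split_by_spectrum_file(psms, key="raw_file"):
    """Hypothetical sketch: group PSM records by their spectrum file, preserving input order."""
    groups = defaultdict(list)
    for psm in psms:
        groups[psm[key]].append(psm)
    return dict(groups)
```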

pp.convert_timstof_metadata(input_path, ...)

Convert timsTOF metadata to Oktoberfest format.

pp.split_timstof_metadata(timstof_metadata, ...)

Split timsTOF metadata by spectrum file.

pp.filter_peptides(peptides, min_length, ...)

Filter search results using given constraints.
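Only min_length is visible in the signature above; the remaining constraints are elided. The sketch below illustrates the idea with min_length plus two hypothetical constraints (max_length and max_charge) on dictionary-based PSMs, and is not the real pp.filter_peptides.

```python
def filter_psms(psms, min_length, max_length, max_charge):
    """Hypothetical sketch: keep PSMs whose peptide length lies in
    [min_length, max_length] and whose precursor charge is at most max_charge."""
    return [
        p for p in psms
        if min_length <= len(p["sequence"]) <= max_length and p["charge"] <= max_charge
    ]
```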

pp.filter_peptides_for_model(peptides, model)

Filter search results to support a given peptide prediction model.

Predicting: pr

Provides functions that interface either with a Koina server, to retrieve predictions from various prediction models, or with DLomix, to serve and refine pre-trained models locally.

High-level prediction runner

pr.Predictor(predictor, model_name)

Abstracts common prediction operations away from their actual implementation via the DLomix or Koina interface.

Koina interface

pr.Koina(*args, **kwargs)

Extension of the Koina gRPC class in koinapy that adds the logic required by Oktoberfest.

DLomix interface

pr.DLomix(model_type, model_path, ...[, ...])

A class for interacting with DLomix models locally for inference.

pr.create_dlomix_dataset(libraries, output_dir)

Transform one or more spectra into a Parquet file that can be used by DLomix.

pr.refine_intensity_predictor(...[, ...])

Perform refinement/transfer learning on a baseline intensity predictor.

Rescoring: re

re.generate_features(library, search_type, ...)

Generate features for target-decoy separation with Percolator or mokapot.
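A widely used feature in Prosit-style rescoring is the normalized spectral contrast angle between observed and predicted fragment intensities. The sketch below assumes both intensity vectors are already aligned to the same fragment ions; whether re.generate_features computes exactly this value is not stated here.

```python
import math

def spectral_angle(observed, predicted):
    """Normalized spectral contrast angle: 1 - 2 * arccos(cosine similarity) / pi.
    Returns 1.0 for proportional spectra and 0.0 for orthogonal ones."""
    dot = sum(o * p for o, p in zip(observed, predicted))
    norm = math.sqrt(sum(o * o for o in observed)) * math.sqrt(sum(p * p for p in predicted))
    if norm == 0:
        return 0.0  # an all-zero spectrum carries no similarity information
    cosine = max(-1.0, min(1.0, dot / norm))  # clamp rounding error before acos
    return 1 - 2 * math.acos(cosine) / math.pi
```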

re.merge_input(tab_files, output_file)

Merge the tab files produced per spectra file identifier into one large file for combined rescoring with Percolator.
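Merging per-file tab files means concatenating their rows while writing the shared header only once. The sketch below is a hypothetical stand-in for re.merge_input that assumes every input has an identical single header line; it raises an error if a header differs.

```python
def merge_tab_files(tab_files, output_file):
    """Hypothetical sketch: concatenate tab-separated files with a common header,
    writing the header once and ensuring every row ends with a newline."""
    header = None
    with open(output_file, "w", encoding="utf-8") as out:
        for path in tab_files:
            with open(path, encoding="utf-8") as handle:
                file_header = handle.readline().rstrip("\r\n")
                if header is None:
                    header = file_header
                    out.write(header + "\n")
                elif file_header != header:
                    raise ValueError(f"Header mismatch in {path}")
                for line in handle:
                    out.write(line if line.endswith("\n") else line + "\n")
```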

re.rescore_with_mokapot(input_file[, ...])

Rescore using mokapot.

re.rescore_with_percolator(input_file[, ...])

Rescore using Percolator.
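Both mokapot and Percolator learn a combined score from the features and then estimate q-values from targets and decoys. The sketch below shows only that last step, a simple target-decoy q-value calculation on given scores; it does not reproduce either tool's learning procedure, and tied scores are not treated specially.

```python
def q_values(scores, is_decoy):
    """Simple target-decoy q-values (higher score = better).
    FDR at each score threshold is estimated as #decoys / #targets at or above it;
    the q-value is the minimum FDR over all thresholds that still accept the PSM."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdrs = [0.0] * len(scores)
    targets = decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdrs[i] = decoys / max(targets, 1)
    # Walk from the worst score to the best, carrying the running minimum FDR.
    q = [0.0] * len(scores)
    running_min = float("inf")
    for i in reversed(order):
        running_min = min(running_min, fdrs[i])
        q[i] = running_min
    return q
```

Targets with a q-value below 0.01 are those accepted at 1% FDR.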

Plotting: pl

pl.plot_score_distribution(target, decoy, ...)

Generate a histogram of the score distribution for targets and decoys.

pl.joint_plot(prosit_target, prosit_decoy, ...)

Generate a joint plot comparing rescoring with and without peptide property predictions.

pl.plot_gain_loss(prosit_target, ...)

Generate Venn bar plots showing lost, common and gained targets below 1% FDR attributed to peptide property predictions.
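The counts behind such a plot come from comparing the identifications accepted below 1% FDR with and without peptide property predictions. The sketch below assumes both runs are given as collections of identifiers (for example peptide sequences); the function name is hypothetical.

```python
def gain_loss(baseline_ids, rescored_ids):
    """Hypothetical sketch: split identifications from two runs into
    those lost by rescoring, those common to both, and those newly gained."""
    baseline, rescored = set(baseline_ids), set(rescored_ids)
    return {
        "lost": baseline - rescored,
        "common": baseline & rescored,
        "gained": rescored - baseline,
    }
```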

pl.plot_mean_sa_ce(sa_ce_df, filename)

Generate a dot plot of the spectral angle distribution over the range of collision energies used for fragment intensity prediction.
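The aggregation behind a mean-per-collision-energy plot can be sketched as a group-by average. The sketch also returns the collision energy with the highest mean spectral angle, which is one plausible way to pick a calibrated CE; that selection rule is an assumption here, not a description of Oktoberfest's calibration. Input records are hypothetical (collision_energy, spectral_angle) pairs.

```python
from collections import defaultdict

def mean_sa_per_ce(records):
    """Hypothetical sketch: average spectral angle per collision energy.
    Returns (means, best_ce), where best_ce has the highest mean spectral angle."""
    sums, counts = defaultdict(float), defaultdict(int)
    for ce, sa in records:
        sums[ce] += sa
        counts[ce] += 1
    means = {ce: sums[ce] / counts[ce] for ce in sums}
    best_ce = max(means, key=means.get)  # raises ValueError on empty input
    return means, best_ce
```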

pl.plot_violin_sa_ce(sa_ce_df, filename)

Generate a violin plot of the spectral angle distribution over the range of collision energies used for fragment intensity prediction.

pl.plot_pred_rt_vs_irt(prosit_df, ...)

Generate a scatter plot comparing predicted indexed retention time (iRT) against (aligned) experimentally observed retention time.
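Aligning observed retention times to the predicted iRT scale is often done by regression on confidently identified peptides. The sketch below assumes a purely linear relationship fitted by ordinary least squares; Oktoberfest may use a different (for example nonlinear) alignment, and the function name is hypothetical.

```python
def fit_linear(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept.
    Requires at least two distinct x values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

With observed retention times as x and predicted iRT as y, `[slope * rt + intercept for rt in observed_rt]` maps every observed retention time onto the iRT scale for plotting.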

pl.plot_sa_distribution(prosit_df, ...)

Plot the spectral angle distribution for targets and decoys.

pl.plot_mirror_spectrum(spec_pred, mzml, ...)

Generate a mirror plot comparing an experimental and predicted MS/MS spectrum.

pl.plot_all(data_dir, config)

Generate all plots after a rescoring run.