oktoberfest.pp.generate_metadata

oktoberfest.pp.generate_metadata(peptides, collision_energy, precursor_charge, fragmentation, nr_ox, instrument_type=None, proteins=None)

Create metadata about peptides for a spectral library.

This function generates a pandas DataFrame containing metadata for peptides in a spectral library. Each row in the DataFrame represents a unique combination of a peptide, collision energy, precursor charge, and fragmentation method. If multiple collision energies, precursor charges, or fragmentation methods are provided, the function creates all possible combinations for each peptide. An optional list of proteins can also be provided; it must have the same length as the list of peptides.
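The combination behaviour described above can be illustrated with a minimal sketch using only the standard library. This is not the Oktoberfest implementation: the helper name `expand_combinations` is hypothetical, it returns plain dictionaries instead of a DataFrame, and it ignores `nr_ox`, `instrument_type` and `proteins`. The row order it produces is an assumption as well.

```python
from itertools import product


def expand_combinations(peptides, collision_energy, precursor_charge, fragmentation):
    """Hypothetical helper: one row per (peptide, CE, charge, fragmentation) combination."""

    def as_list(value):
        # Accept either a single value or a list, as the documented Union types suggest.
        return list(value) if isinstance(value, (list, tuple)) else [value]

    rows = []
    for pep, ce, charge, frag in product(
        peptides,
        as_list(collision_energy),
        as_list(precursor_charge),
        as_list(fragmentation),
    ):
        rows.append(
            {
                "modified_peptide": pep,
                "collision_energy": ce,
                "precursor_charge": charge,
                "fragmentation": frag,
            }
        )
    return rows


rows = expand_combinations(["AAACRFVQ", "RMPCHKPYL"], [30, 35], [1, 2], "HCD")
# 2 peptides x 2 collision energies x 2 charges x 1 fragmentation method = 8 rows
print(len(rows))
```

Under these assumptions, two peptides with two collision energies, two precursor charges and one fragmentation method yield eight rows.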

Parameters:
  • peptides (list[str]) – A list of peptides for which metadata is generated.

  • collision_energy (Union[int, list[int]]) – A single collision energy or a list of collision energies to combine with each peptide.

  • precursor_charge (Union[int, list[int]]) – A single precursor charge or a list of precursor charges to combine with each peptide.

  • fragmentation (Union[str, list[str]]) – A single fragmentation method or a list of fragmentation methods to combine with each peptide.

  • nr_ox (int) – Maximal number of allowed oxidations.

  • instrument_type (Optional[str]) – The type of mass spectrometer. Only required when predicting intensities with AlphaPept. Choose one of ["QE", "LUMOS", "TIMSTOF", "SCIEXTOF"].

  • proteins (Optional[list[list[str]]]) – An optional list of proteins associated with each peptide. If provided, it must have the same length as the number of peptides.

Raises:

AssertionError – If the lengths of peptides and proteins are not the same.
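As a rough illustration of the documented check, the sketch below raises an AssertionError when the two lists differ in length. The helper name `check_proteins` and the error message are hypothetical and are not taken from the Oktoberfest source.

```python
def check_proteins(peptides, proteins=None):
    """Hypothetical helper mirroring the documented length requirement."""
    if proteins is not None:
        # One entry (a list of protein IDs) is expected per peptide.
        assert len(proteins) == len(peptides), (
            f"Got {len(peptides)} peptides but {len(proteins)} protein entries."
        )


# Passes: one protein list per peptide.
check_proteins(["AAACRFVQ", "RMPCHKPYL"], [["P1"], ["P2", "P3"]])

# Fails: two peptides but only one protein list.
try:
    check_proteins(["AAACRFVQ", "RMPCHKPYL"], [["P1"]])
except AssertionError as err:
    print(f"AssertionError: {err}")
```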

Return type:

DataFrame

Returns:

A DataFrame containing metadata with the columns "modified_peptide", "collision_energy", "precursor_charge", "fragmentation" and an optional "proteins" column.

Example:

>>> from oktoberfest import preprocessing as pp
>>> metadata = pp.generate_metadata(peptides=["AAACRFVQ", "RMPCHKPYL"],
...                                 collision_energy=[30, 35],
...                                 precursor_charge=[1, 2],
...                                 fragmentation=["HCD", "HCD"],
...                                 nr_ox=1)
>>> print(metadata)