oktoberfest.pr.Predictor.predict_in_chunks_xl

Predictor.predict_in_chunks_xl(data, chunk_idx, xl=False, **kwargs)

Retrieve and return predictions in chunks.

This function takes a Spectra object containing information about PSMs and predicts peptide properties. Koina is configured through the kwargs. See the Koina function for details.

Parameters:
  • data (Spectra) – Spectra object containing the data for the prediction.

  • chunk_idx (list[Index]) – The chunked indices of the provided dataframe. This is required in some cases, e.g. if padding should be avoided when predicting peptides of different lengths. For alphapept, this is required because padding is only performed within one batch, so array sizes differ between individual prediction batches and the arrays cannot be concatenated.

  • xl (bool) – Whether the peptides are crosslinked (True) or linear (False). Defaults to False.

  • kwargs – Additional parameters that are forwarded to Koina.
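
The role of chunk_idx can be illustrated without Oktoberfest or Koina. The following is a minimal sketch, not the library's own implementation: chunk_by_length is a hypothetical helper, the sequences are made up, and plain lists of row positions stand in for the Index objects. It groups rows by peptide length so that every chunk has a single width and needs no padding, while widths still differ from chunk to chunk.

```python
from collections import defaultdict


def chunk_by_length(sequences):
    """Group row positions by sequence length, one chunk per distinct length.

    Hypothetical helper for illustration only; it is not part of Oktoberfest.
    """
    groups = defaultdict(list)
    for row, seq in enumerate(sequences):
        groups[len(seq)].append(row)
    # Sort by length so the chunk order is deterministic.
    return [groups[length] for length in sorted(groups)]


# Made-up unmodified sequences standing in for the PSM table.
sequences = ["AAACRFVQ", "RMPCHKPYL", "PEPTIDEK", "ELVISLIVESK"]

chunks = chunk_by_length(sequences)

# Every row in a chunk shares one length, so a chunk needs no padding.
widths = [len(sequences[chunk[0]]) for chunk in chunks]

for chunk, width in zip(chunks, widths):
    print(f"rows {chunk} -> width {width}")
```

Since the widths differ between chunks, per-chunk results of this kind cannot be stacked into one rectangular array, which is consistent with the predictions being returned as one list entry per chunk.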

Return type:

tuple[dict[str, list[ndarray]], dict[str, list[ndarray]]]

Returns:

a tuple of two dictionaries, each with targets (keys) and a list of predictions (values) whose length equals the number of chunks provided.
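
Because each target holds one list entry per chunk, results usually have to be mapped back to the original rows before further use. The sketch below shows one way to do this for a single dictionary, under stated assumptions: entry i of each list corresponds to chunk_idx[i], rows within a chunk keep their order, and row positions run from 0 to n-1. restore_row_order is a hypothetical helper, the "intensities" key and all values are made up, and nested lists stand in for ndarrays.

```python
def restore_row_order(chunk_idx, chunked_values):
    """Map per-chunk results back to one value per original row.

    Hypothetical helper for illustration only; it is not part of Oktoberfest.
    """
    n_rows = sum(len(rows) for rows in chunk_idx)
    ordered = [None] * n_rows
    for rows, values in zip(chunk_idx, chunked_values):
        # Each chunk holds one result per row it covers, in the same order.
        for row, value in zip(rows, values):
            ordered[row] = value
    return ordered


# Two chunks: rows 0 and 2 share one length, row 1 has another.
chunk_idx = [[0, 2], [1]]

# Made-up predictions: one list entry per chunk, one inner list per row.
predictions = {"intensities": [[[0.1, 0.9], [0.4, 0.6]], [[0.2, 0.3, 0.5]]]}

by_row = {
    target: restore_row_order(chunk_idx, values)
    for target, values in predictions.items()
}
print(by_row["intensities"])
```

Keeping the results as a per-row list rather than a single array sidesteps the differing widths between chunks.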

Example:

>>> import pandas as pd
>>> from oktoberfest import predict as pr
>>> from oktoberfest.data.spectra import Spectra
>>> from oktoberfest.utils import group_iterator
>>> # Required columns: MODIFIED_SEQUENCE, COLLISION_ENERGY, PRECURSOR_CHARGE, FRAGMENTATION and PEPTIDE_LENGTH
>>> meta_df = pd.DataFrame({"MODIFIED_SEQUENCE": ["AAAC[UNIMOD:4]RFVQ","RM[UNIMOD:35]PC[UNIMOD:4]HKPYL"],
...                         "COLLISION_ENERGY": [30,35],
...                         "PRECURSOR_CHARGE": [1,2],
...                         "FRAGMENTATION": ["HCD","HCD"],
...                         "PEPTIDE_LENGTH": [8,9]})
>>> var = Spectra._gen_vars_df()
>>> library = Spectra(obs=meta_df, var=var)
>>> idx = list(group_iterator(df=library.obs, group_by_column="PEPTIDE_LENGTH"))
>>> intensity_predictor = pr.Predictor.from_koina(
...                         model_name="Prosit_2020_intensity_HCD",
...                         server_url="koina.wilhelmlab.org:443",
...                         ssl=True)
>>> predictions = intensity_predictor.predict_in_chunks(data=library, chunk_idx=idx)
>>> print(predictions)