oktoberfest.pr.Predictor.predict_intensities

Predictor.predict_intensities(data, xl=False, chunk_idx=None, **kwargs)

Generate intensity predictions and add them to the provided data object.

This function takes a Spectra object containing information about PSMs and predicts intensities. Koina/DLomix is configured through the kwargs. The function either predicts everything at once and concatenates all prediction results into single numpy arrays, or, when chunked indices of the dataframe are provided, produces a list of individual numpy arrays, one per chunk.
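
The difference between the two modes can be illustrated with plain numpy. This is a minimal sketch and not Oktoberfest's own code; the batch arrays, their shapes, and all variable names are hypothetical stand-ins for prediction results.

```python
import numpy as np

# Hypothetical per-batch results: 2 and 3 peptides, 4 predicted values each.
batch_a = np.ones((2, 4))
batch_b = np.zeros((3, 4))

# Equal widths: the batches can be concatenated into one array.
combined = np.concatenate([batch_a, batch_b], axis=0)

# Hypothetical batch with a different width, e.g. because padding
# was applied per batch rather than across all batches.
batch_c = np.zeros((3, 6))
try:
    np.concatenate([batch_a, batch_c], axis=0)
    concatenated = True
except ValueError:
    # Widths differ along axis 1, so numpy refuses to concatenate.
    concatenated = False

# Unequal widths: keep the results as a list of individual arrays.
per_chunk = [batch_a, batch_c]
```

When the widths differ, a list of arrays is the only lossless representation, which is the case the chunked mode covers.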

Parameters:
  • data (Spectra) – Spectra object containing the data required for prediction; the predictions are stored in it after retrieval from the server.

  • xl (bool) – Whether the peptides are crosslinked (True) or linear (False).

  • chunk_idx (Optional[list[Index]]) – The chunked indices of the provided dataframe. This is required in some cases, e.g. if padding should be avoided when predicting peptides of different lengths. For alphapept, this is required because padding is only performed within one batch, so array sizes differ between individual prediction batches and the arrays cannot be concatenated.

  • kwargs – Additional keyword arguments forwarded to Koina/DLomix::predict
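
One possible way to build a chunk_idx list is to group the dataframe rows by peptide length, so that each chunk contains only sequences of equal length. This is a sketch under assumptions: the grouping criterion, the regular expression used to strip UNIMOD annotations, and the third example sequence are illustrative and are not taken from Oktoberfest.

```python
import pandas as pd

# Small example frame; the third sequence is a made-up addition.
meta_df = pd.DataFrame({
    "MODIFIED_SEQUENCE": [
        "AAAC[UNIMOD:4]RFVQ",
        "RM[UNIMOD:35]PC[UNIMOD:4]HKPYL",
        "PEPTIDEK",
    ],
})

# Strip the modification annotations (assumed pattern) and count residues.
lengths = (
    meta_df["MODIFIED_SEQUENCE"]
    .str.replace(r"\[UNIMOD:\d+\]", "", regex=True)
    .str.len()
)

# One pandas Index per peptide length, ordered by length.
chunk_idx = [group.index for _, group in meta_df.groupby(lengths)]
```

Assuming the indices refer to the rows of the dataframe held by the Spectra object, such a list could then be passed as predict_intensities(data=library, chunk_idx=chunk_idx).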

Example:

>>> import pandas as pd
>>> from oktoberfest.data.spectra import Spectra
>>> from oktoberfest import predict as pr
>>> # Required columns: MODIFIED_SEQUENCE, COLLISION_ENERGY, PRECURSOR_CHARGE and FRAGMENTATION
>>> meta_df = pd.DataFrame({"MODIFIED_SEQUENCE": ["AAAC[UNIMOD:4]RFVQ", "RM[UNIMOD:35]PC[UNIMOD:4]HKPYL"],
...                         "COLLISION_ENERGY": [30, 35],
...                         "PRECURSOR_CHARGE": [1, 2],
...                         "FRAGMENTATION": ["HCD", "HCD"]})
>>> var = Spectra._gen_vars_df()
>>> library = Spectra(obs=meta_df, var=var)
>>> library.strings_to_categoricals()
>>> intensity_predictor = pr.Predictor.from_koina(
...                         model_name="Prosit_2020_intensity_HCD",
...                         server_url="koina.wilhelmlab.org:443",
...                         ssl=True,
...                         targets=["intensities", "annotation"])
>>> intensity_predictor.predict_intensities(data=library)
>>> print(library.layers["pred_int"])