oktoberfest.pp.split_timstof_metadata

oktoberfest.pp.split_timstof_metadata(timstof_metadata, output_dir, filenames=None)

Split timstof metadata by spectrum file.

Given timstof metadata that originate from one or more spectrum files, this function splits the metadata by spectrum file and writes one csv file per spectrum file to the provided output directory. The provided file names must correspond to the spectrum file identifiers in the “RAW_FILE” column of timstof_metadata, which must be in the internal format. If no list of file names is provided, all spectrum file identifiers are considered; otherwise, only the identifiers found in the list are written. Output files follow the naming convention <filename>.timsmeta. A file name that is not found in the timstof metadata is ignored and a warning is printed. The function returns the list of file names for which timstof metadata are available; if a list of file names was provided, the ignored names are removed from it.

Parameters:
  • timstof_metadata (DataFrame) – timstof metadata in internal format

  • output_dir (Union[str, Path]) – directory in which to store individual csv files containing the timstof metadata for individual filenames

  • filenames (Optional[list[str]]) – optional list of spectrum filenames that should be considered. If not provided, all spectrum file identifiers in the timstof metadata are considered.

Return type:

list[str]

Returns:

list of file names for which timstof metadata could be found

Example:

>>> from oktoberfest import preprocessing as pp
>>> import pandas as pd
>>> timstof_meta = pd.DataFrame({"RAW_FILE": ["220331_NHG_malignant_CLL_02_Tue39L243_17%_DDA_Rep3",
...                                           "220331_NHG_malignant_CLL_02_Tue39L243_17%_DDA_Rep4"],
...                              "FRAME": [2733, 2824],
...                              "PRECURSOR": [2195, 2299],
...                              "SCAN_NUM_BEGIN": [1416, 1488],
...                              "SCAN_NUM_END": [1439, 1511],
...                              "COLLISION_ENERGY": [26.25, 24.57],
...                              "SCAN_NUMBER": [8646, 4879]})
>>> pp.split_timstof_metadata(timstof_metadata=timstof_meta, output_dir="./tests/doctests/output/")
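The behaviour described above (group by “RAW_FILE”, write one <filename>.timsmeta file per identifier, skip and warn on unknown names) can be illustrated with a minimal standalone sketch. This is not Oktoberfest's actual implementation: the function name split_by_raw_file is hypothetical, the csv files are assumed to be comma-separated without an index column, and the warning is assumed to be raised through Python's warnings module.

```python
import warnings
from pathlib import Path
from typing import Optional, Union

import pandas as pd


def split_by_raw_file(
    metadata: pd.DataFrame,
    output_dir: Union[str, Path],
    filenames: Optional[list[str]] = None,
) -> list[str]:
    """Hypothetical sketch of the splitting logic; not the library's code."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    # Collect the rows belonging to each spectrum file identifier.
    groups = {str(name): group for name, group in metadata.groupby("RAW_FILE")}

    # Without an explicit list, consider every identifier in the metadata.
    requested = list(groups) if filenames is None else filenames

    written = []
    for name in requested:
        if name not in groups:
            # Assumption: unknown names are skipped with a warning.
            warnings.warn(f"No timstof metadata found for {name}; skipping.")
            continue
        # Assumption: comma-separated output without the DataFrame index.
        groups[name].to_csv(output_dir / f"{name}.timsmeta", index=False)
        written.append(name)
    return written
```

Calling split_by_raw_file(metadata, "out", filenames=["a", "missing"]) on metadata that only contains the identifier "a" would write out/a.timsmeta, warn about "missing", and return ["a"].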