oktoberfest.pp.load_spectra

oktoberfest.pp.load_spectra(filenames, parser='pyteomics', tims_meta_file=None)

Read spectra from a given file.

This function reads MS2 spectra from a given mzML or hdf file using a specified parser. The file ending is used to determine the correct parsing method.

Parameters:
  • filenames (Union[str, Path, list[Union[str, Path]]]) – Path(s) to the files containing MS2 spectra. mzML filenames must end in “.mzML” (case-insensitive). For timsTOF data, provide a single hdf5 path ending in “.hdf” (case-insensitive); multiple paths are not yet supported for timsTOF.

  • parser (str) – Name of the package used to parse mzML files; can be “pyteomics” or “pymzml”. Ignored for all other file types.

  • tims_meta_file (Union[str, Path, None]) – Path to the timsTOF metadata file in internal format. Required only when loading timsTOF spectra, where it is used for summation of spectra.
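Because filenames accepts a list of paths and the suffix check is case-insensitive, it can be convenient to gather the input files before calling load_spectra. The following is a minimal sketch that uses only the standard library; collect_mzml_files is a hypothetical helper for illustration and is not part of oktoberfest.

```python
from pathlib import Path
from typing import List, Union


def collect_mzml_files(directory: Union[str, Path]) -> List[Path]:
    """Return all files in `directory` whose suffix is ".mzML", ignoring case.

    Hypothetical helper for illustration; not part of oktoberfest.
    """
    directory = Path(directory)
    # Compare lower-cased suffixes so "a.mzML", "b.mzml" and "c.MZML" all match.
    return sorted(
        path
        for path in directory.iterdir()
        if path.is_file() and path.suffix.lower() == ".mzml"
    )
```

The returned list could then be passed as the filenames argument, together with the parser of choice.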

Raises:
  • TypeError – if not all filenames are provided as str or Path objects.

  • ValueError – if the filename does not end in either “.hdf” or “.mzML” (case-insensitive).

  • AssertionError – if no tims_meta_file was provided when loading timsTOF hdf data.
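The rules above can be mirrored in a small pre-check that fails early, before any file is opened. This is a sketch under stated assumptions, not oktoberfest's implementation: check_load_spectra_args is a hypothetical name, and rejecting mixed suffixes or multiple “.hdf” paths with a ValueError is an assumption made for the example.

```python
from pathlib import Path
from typing import List, Optional, Union

PathLike = Union[str, Path]


def check_load_spectra_args(
    filenames: Union[PathLike, List[PathLike]],
    tims_meta_file: Optional[PathLike] = None,
) -> str:
    """Pre-check arguments against the documented rules; return "mzml" or "hdf".

    Hypothetical helper for illustration; not oktoberfest's implementation.
    """
    # Accept a single path as well as a list of paths.
    if isinstance(filenames, (str, Path)):
        filenames = [filenames]
    # Documented TypeError: every entry must be a str or Path.
    if not all(isinstance(name, (str, Path)) for name in filenames):
        raise TypeError("all filenames must be str or Path objects")

    # Lower-case the suffixes so the comparison is case-insensitive.
    suffixes = {Path(name).suffix.lower() for name in filenames}
    if suffixes == {".mzml"}:
        return "mzml"
    if suffixes == {".hdf"}:
        # Assumption: report the "single hdf path" rule as a ValueError.
        if len(filenames) != 1:
            raise ValueError("only a single .hdf path is supported for timsTOF data")
        # Documented AssertionError: timsTOF data needs the metadata file.
        if tims_meta_file is None:
            raise AssertionError("tims_meta_file is required for timsTOF hdf data")
        return "hdf"
    # Documented ValueError: any other suffix (or, by assumption, a mix) is rejected.
    raise ValueError("filenames must end in '.hdf' or '.mzML' (case-insensitive)")
```

Returning the detected file type keeps the check reusable, for example to decide whether the parser argument is relevant at all.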

Return type:

DataFrame

Returns:

measured spectra with metadata.

Example:

>>> from oktoberfest import preprocessing as pp
>>> filecontent = '''<?xml version="1.0" encoding="UTF-8"?>
... <mzML xmlns="http://example" version="1.1.0">
...   <cvList count="2">
...     <cv id="MS" fullName="Mass Spectrometry Ontology" version="4.1.0" URI="https://example"/>
...     <cv id="UO" fullName="Unit Ontology" version="1.23" URI="http://example"/>
...   </cvList>
...   <fileDescription>
...     <fileContent>
...       <cvParam cvRef="MS" accession="MS:1000579" name="MS1 spectrum"/>
...     </fileContent>
...   </fileDescription>
...   <referenceableParamGroupList count="1">
...     <referenceableParamGroup id="commonInstrumentParams">
...       <cvParam cvRef="MS" accession="MS:1000031" name="instrument model" value="Example Instrument"/>
...     </referenceableParamGroup>
...   </referenceableParamGroupList>
...   <run id="run1" defaultInstrumentConfigurationRef="IC1">
...     <spectrumList count="1">
...       <spectrum index="0" id="scan=1" defaultArrayLength="5">
...         <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1"/>
...         <binaryDataArrayList count="2">
...           <binaryDataArray encodedLength="20">
...             <cvParam cvRef="MS" accession="4" name="m/z" unitCvRef="MS" unitAccession="0" unitName="m/z"/>
...             <binary>...</binary>
...           </binaryDataArray>
...           <binaryDataArray encodedLength="20">
...             <cvParam cvRef="MS" accession="5" name="i" unitCvRef="MS" unitAccession="1" unitName="c"/>
...             <binary>...</binary>
...           </binaryDataArray>
...         </binaryDataArrayList>
...       </spectrum>
...     </spectrumList>
...   </run>
... </mzML>'''
>>> with open("./tests/doctests/input/File1.mzml", "w") as f:
...     f.write(filecontent)
>>> spectra = pp.load_spectra(filenames=["./tests/doctests/input/File1.mzml"], parser="pyteomics")
>>> print(spectra)