oktoberfest.pp.load_spectra
- oktoberfest.pp.load_spectra(filenames, parser='pyteomics', tims_meta_file=None)
Read spectra from a given file.
This function reads MS2 spectra from a given mzML or hdf file using a specified parser. The file ending is used to determine the correct parsing method.
- Parameters:
filenames (Union[str, Path, list[Union[str, Path]]]) – Path(s) to files containing MS2 spectra. Filenames need to end in “.mzML” (case-insensitive). For timsTOF data, a single hdf5 path ending in “.hdf” (case-insensitive) needs to be provided. Multiple paths are not yet supported for timsTOF.
parser (str) – Name of the package to use for parsing the mzML file, can be “pyteomics” or “pymzml”. Only used for parsing of mzML files.
tims_meta_file (Union[str, Path, None]) – Optional path to a timsTOF metadata file in internal format. This is only required when loading timsTOF spectra and is used for summation of spectra.
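Because filenames accepts either a single path or a list of paths, a caller may want to normalise the argument before inspecting it. The following is a minimal sketch of one way to do that; normalize_filenames is a hypothetical helper written for illustration and is not part of oktoberfest.

```python
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def normalize_filenames(filenames: Union[PathLike, list[PathLike]]) -> list[Path]:
    """Hypothetical helper: return the given path(s) as a list of Path objects."""
    # A single str or Path is wrapped in a list so both input shapes are handled alike.
    if isinstance(filenames, (str, Path)):
        filenames = [filenames]
    return [Path(name) for name in filenames]
```

A single string and a one-element list then yield the same result, which keeps later checks free of special cases.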
- Raises:
TypeError – if not all filenames are provided as str or Path objects.
ValueError – if the filename does not end in either “.hdf” or “.mzML” (case-insensitive)
AssertionError – if no tims_meta_file was provided when loading timsTOF hdf data
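The three error conditions above can be illustrated with a small stand-alone check. This is a sketch under stated assumptions, not the library's implementation: spectra_format is a hypothetical function, and the order of the checks is a guess.

```python
from pathlib import Path


def spectra_format(path, tims_meta_file=None):
    """Hypothetical check mirroring the documented errors; returns "mzml" or "hdf"."""
    # TypeError: the filename must be a str or Path object.
    if not isinstance(path, (str, Path)):
        raise TypeError(f"expected str or Path, got {type(path).__name__}")

    # ValueError: the suffix is compared case-insensitively.
    suffix = Path(path).suffix.lower()
    if suffix not in (".mzml", ".hdf"):
        raise ValueError(f"unsupported file ending: {suffix!r}")

    # AssertionError: timsTOF hdf data needs the metadata file (assumed check order).
    if suffix == ".hdf" and tims_meta_file is None:
        raise AssertionError("tims_meta_file is required for timsTOF hdf data")

    return suffix.lstrip(".")
```

Calling spectra_format("run.raw") raises ValueError, while spectra_format("run.hdf") without a metadata path raises AssertionError.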
- Return type:
- Returns:
measured spectra with metadata.
- Example:
>>> from oktoberfest import preprocessing as pp
>>> filecontent = '''<?xml version="1.0" encoding="UTF-8"?>
... <mzML xmlns="http://example" version="1.1.0">
...   <cvList count="2">
...     <cv id="MS" fullName="Mass Spectrometry Ontology" version="4.1.0" URI="https://example"/>
...     <cv id="UO" fullName="Unit Ontology" version="1.23" URI="http://example"/>
...   </cvList>
...   <fileDescription>
...     <fileContent>
...       <cvParam cvRef="MS" accession="MS:1000579" name="MS1 spectrum"/>
...     </fileContent>
...   </fileDescription>
...   <referenceableParamGroupList count="1">
...     <referenceableParamGroup id="commonInstrumentParams">
...       <cvParam cvRef="MS" accession="MS:1000031" name="instrument model" value="Example Instrument"/>
...     </referenceableParamGroup>
...   </referenceableParamGroupList>
...   <run id="run1" defaultInstrumentConfigurationRef="IC1">
...     <spectrumList count="1">
...       <spectrum index="0" id="scan=1" defaultArrayLength="5">
...         <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1"/>
...         <binaryDataArrayList count="2">
...           <binaryDataArray encodedLength="20">
...             <cvParam cvRef="MS" accession="4" name="m/z" unitCvRef="MS" unitAccession="0" unitName="m/z"/>
...             <binary>...</binary>
...           </binaryDataArray>
...           <binaryDataArray encodedLength="20">
...             <cvParam cvRef="MS" accession="5" name="i" unitCvRef="MS" unitAccession="1" unitName="c"/>
...             <binary>...</binary>
...           </binaryDataArray>
...         </binaryDataArrayList>
...       </spectrum>
...     </spectrumList>
...   </run>
... </mzML>'''
>>> with open("./tests/doctests/input/File1.mzml", "w+") as f:
...     f.writelines(filecontent)
>>> spectra = pp.load_spectra(filenames=["./tests/doctests/input/File1.mzml"], parser="pyteomics")
>>> print(spectra)