oktoberfest.pp.list_spectra
- oktoberfest.pp.list_spectra(input_dir, input_format)
Return a list of all spectra files of a given format.
Given an input directory, the function searches for all files containing spectra and returns a list of paths pointing to them. A file is included if its extension matches the provided format (case-insensitive). If the input path is itself a file, the function checks that it matches the format and returns it wrapped in a list. If the format is “d” and the input directory ends with “.d”, the function returns the input directory itself wrapped in a list.
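The lookup rules described above can be illustrated with a small standalone sketch. This is not oktoberfest's implementation: the function name `list_spectra_sketch`, the `SUPPORTED_FORMATS` set, and the order of the checks are assumptions made for illustration, based only on the behavior and exceptions documented on this page.

```python
from pathlib import Path
from typing import List, Union

# Formats named on this page; treated here as the supported set (assumption).
SUPPORTED_FORMATS = {"mzml", "raw", "hdf", "d"}


def list_spectra_sketch(input_dir: Union[str, Path], input_format: str) -> List[Path]:
    """Hypothetical re-implementation of the documented lookup rules."""
    path = Path(input_dir)
    fmt = input_format.lower()
    suffix = "." + fmt

    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"{input_format} is not a supported spectra format.")
    if not path.exists():
        raise NotADirectoryError(f"{path} does not exist.")

    # A single file, or a ".d" directory when the format is "d",
    # is checked against the format and returned wrapped in a list.
    if path.is_file() or (fmt == "d" and path.suffix.lower() == ".d"):
        if path.suffix.lower() != suffix:
            raise AssertionError(f"{path} does not match format {fmt}.")
        return [path]

    # Otherwise collect every entry whose extension matches, ignoring case.
    matches = sorted(p for p in path.iterdir() if p.suffix.lower() == suffix)
    if not matches:
        raise AssertionError(f"No {fmt} files found in {path}.")
    return matches
```

Lower-casing both the requested format and each file suffix is one simple way to get the case-insensitive matching mentioned above, so `File1.mzML` and `File1.mzml` are treated alike.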
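- Parameters:
input_dir – path to the directory to search for spectra files; a single file or a “.d” directory is also accepted
input_format – format of the spectra files (mzml, raw, hdf or d), matched case-insensitively against the file extension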
- Raises:
NotADirectoryError – if the specified input directory does not exist
ValueError – if the specified file format is not supported
AssertionError – if the provided input directory (d) does not match the provided format or if none of the files within the provided input directory (mzml, raw, hdf) match the provided format
- Return type:
- Returns:
A list of paths to all spectra files found in the given directory
- Example:
>>> from oktoberfest import preprocessing as pp
>>> import os
>>> # creating minimum viable example .mzml file
>>> filecontent = '''<?xml version="1.0" encoding="UTF-8"?>
... <mzML xmlns="http://example" version="1.1.0">
...   <cvList count="2">
...     <cv id="MS" fullName="Mass Spectrometry Ontology" version="4.1.0" URI="https://example"/>
...     <cv id="UO" fullName="Unit Ontology" version="1.23" URI="http://example"/>
...   </cvList>
...   <fileDescription>
...     <fileContent>
...       <cvParam cvRef="MS" accession="MS:1000579" name="MS1 spectrum"/>
...     </fileContent>
...   </fileDescription>
...   <referenceableParamGroupList count="1">
...     <referenceableParamGroup id="commonInstrumentParams">
...       <cvParam cvRef="MS" accession="MS:1000031" name="instrument model" value="Example Instrument"/>
...     </referenceableParamGroup>
...   </referenceableParamGroupList>
...   <run id="run1" defaultInstrumentConfigurationRef="IC1">
...     <spectrumList count="1">
...       <spectrum index="0" id="scan=1" defaultArrayLength="5">
...         <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1"/>
...         <binaryDataArrayList count="2">
...           <binaryDataArray encodedLength="20">
...             <cvParam cvRef="MS" accession="4" name="m/z" unitCvRef="MS" unitAccession="0" unitName="m/z"/>
...             <binary>...</binary>
...           </binaryDataArray>
...           <binaryDataArray encodedLength="20">
...             <cvParam cvRef="MS" accession="5" name="i" unitCvRef="MS" unitAccession="1" unitName="c"/>
...             <binary>...</binary>
...           </binaryDataArray>
...         </binaryDataArrayList>
...       </spectrum>
...     </spectrumList>
...   </run>
... </mzML>'''
>>> os.makedirs("./tests/doctests/input/spectra", exist_ok=True)
>>> with open("./tests/doctests/input/spectra/File1.mzml", "w") as f:
...     f.write(filecontent)
>>> paths = pp.list_spectra(input_dir="./tests/doctests/input/spectra/", input_format="mzml")
>>> print(paths)
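Callers that scan several directories may want to treat "nothing found" differently from a misconfigured format. The sketch below shows one way to handle the exceptions listed under Raises. The wrapper `list_spectra_or_empty` and its `lister` argument are hypothetical names introduced for illustration and are not part of oktoberfest; the function to call is passed in so that the wrapper does not depend on how the package is imported.

```python
from pathlib import Path
from typing import Callable, List, Union

PathLike = Union[str, Path]


def list_spectra_or_empty(
    lister: Callable[..., List[Path]],
    input_dir: PathLike,
    input_format: str,
) -> List[Path]:
    """Call `lister` and turn 'nothing usable found' errors into an empty list."""
    try:
        return list(lister(input_dir=input_dir, input_format=input_format))
    except (NotADirectoryError, AssertionError) as err:
        # Missing directory or no matching files: report and continue.
        print(f"No spectra listed for {input_dir!r}: {err}")
        return []
    # ValueError (unsupported format) is deliberately not caught, since it
    # usually points to a configuration mistake rather than missing data.
```

Assuming the import from the example above, it could be called as `list_spectra_or_empty(pp.list_spectra, "./tests/doctests/input/spectra/", "mzml")`.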