oktoberfest.re.merge_input

oktoberfest.re.merge_input(tab_files, output_file)

Merge the tab files generated for individual spectra files into one large file for combined percolation.

The function takes a list of tab files and concatenates them before writing a combined tab file back to the chosen output file location.

The concatenation follows the approach reported as fastest in: https://stackoverflow.com/questions/44211461/what-is-the-fastest-way-to-combine-100-csv-files-with-headers-into-one
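
The idea behind that kind of concatenation can be illustrated with a minimal sketch. It is not the oktoberfest implementation: the helper name concat_tab_files is hypothetical, and the sketch assumes that every input file starts with the same single header line and ends with a newline. The first file is copied in full, and for every later file the header line is skipped before the rest is streamed to the output.

```python
import shutil
import tempfile
from pathlib import Path


def concat_tab_files(tab_files, output_file):
    """Hypothetical helper: concatenate tab files that share one header line.

    Assumes each file ends with a newline; otherwise the last row of one
    file and the first row of the next would end up on the same line.
    """
    with open(output_file, "wb") as out:
        for index, tab_file in enumerate(tab_files):
            with open(tab_file, "rb") as src:
                if index > 0:
                    # Skip the header of every file except the first one.
                    src.readline()
                # Stream the remaining bytes without parsing the rows.
                shutil.copyfileobj(src, out)


# Small demonstration with two throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "rescore1.tab"
    second = Path(tmp) / "rescore2.tab"
    first.write_text("SpecId\tLabel\nF1-81-AAAAAAALQAK-2-5\t1\n")
    second.write_text("SpecId\tLabel\nF2-13-AEAEQEKDQLR-1-11\t1\n")
    merged = Path(tmp) / "merged_rescore.tab"
    concat_tab_files([first, second], merged)
    merged_lines = merged.read_text().splitlines()
```

Copying raw bytes avoids parsing each table into memory, which is why this style of merge tends to scale well to many large files. The trade-off is that the column order and header of the inputs are not validated.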

Parameters:
  • tab_files (list[Path]) – list of paths pointing to the individual tab files to be concatenated

  • output_file (str | Path) – path to the generated output tab file

Example:

>>> from oktoberfest import rescore as re
>>> from pathlib import Path
>>> import pandas as pd
>>> rescore_df1 = pd.DataFrame({'SpecId': ["F1-81-AAAAAAALQAK-2-5","F1-15-VGVFQHGK-3-2"],
...                           'Label': [1,0],
...                           'ScanNr': [81,15],
...                           'filename': ["F1","F1"],
...                           'CID': [0,0],
...                           'Charge1': [0,0],
...                           'Charge2': [1,0],
...                           'Charge3': [0,1],
...                           'Charge4': [0,0],
...                           'Charge5': [0,0],
...                           'Charge6': [0,0],
...                           'HCD': [1,1],
...                           'KR': [1,1],
...                           'Mass': [1402.18,1103.54],
...                           'spectral_angle': [0.71,0.23],
...                           'sequence_length': [11,8],
...                           'Peptide': ["_.AAAAAAALQAK._","_.VGVFQHGK._"],
...                           'Proteins': ["AAAAAAALQAK","VGVFQHGK"],
...                           'RT': [64.79,57.84],
...                           'iRT': [65.99,56.22],
...                           'pred_RT': [58.86,55.34]})
>>> rescore_df2 = pd.DataFrame({'SpecId': ["F2-13-AEAEQEKDQLR-1-11","F2-27-TGFLEQLK-2-7"],
...                           'Label': [1,0],
...                           'ScanNr': [13,27],
...                           'filename': ["F2","F2"],
...                           'CID': [0,0],
...                           'Charge1': [1,0],
...                           'Charge2': [0,1],
...                           'Charge3': [0,0],
...                           'Charge4': [0,0],
...                           'Charge5': [0,0],
...                           'Charge6': [0,0],
...                           'HCD': [1,1],
...                           'KR': [2,1],
...                           'Mass': [1202.43,1009.14],
...                           'spectral_angle': [0.55,0.12],
...                           'sequence_length': [11,8],
...                           'Peptide': ["_.AEAEQEKDQLR._","_.TGFLEQLK._"],
...                           'Proteins': ["AEAEQEKDQLR","TGFLEQLK"],
...                           'RT': [62.33,51.23],
...                           'iRT': [63.98,53.24],
...                           'pred_RT': [59.16,50.76]})
>>> rescore_df1.to_csv("./tests/doctests/input/rescore1.tab",sep='\t',index=False)
>>> rescore_df2.to_csv("./tests/doctests/input/rescore2.tab",sep='\t',index=False)
>>> tabfile1 = Path("./tests/doctests/input/rescore1.tab")
>>> tabfile2 = Path("./tests/doctests/input/rescore2.tab")
>>> filelist = [tabfile1,tabfile2]
>>> re.merge_input(tab_files=filelist, output_file="./tests/doctests/output/merged_rescore.tab")