oktoberfest.re.merge_input
- oktoberfest.re.merge_input(tab_files, output_file)
Merge tab files specific to each spectra file identifier into one large file for combined percolation.
The function takes a list of tab files, concatenates them, and writes the combined tab file to the chosen output file location.
This approach is the fastest according to: https://stackoverflow.com/questions/44211461/what-is-the-fastest-way-to-combine-100-csv-files-with-headers-into-one
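The concatenation described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not oktoberfest's actual implementation. The helper name `merge_tab_files` is hypothetical. It assumes that all input files share an identical header line and that each file ends with a newline, as files written by `pandas.DataFrame.to_csv` do. The first file is copied whole. For every later file, the header line is skipped and the remaining bytes are streamed with `shutil.copyfileobj`, a byte-level approach of the kind discussed in the linked thread.

```python
import shutil
from pathlib import Path
from typing import Iterable, Union

PathLike = Union[str, Path]


def merge_tab_files(tab_files: Iterable[PathLike], output_file: PathLike) -> None:
    """Concatenate tab files that share a header, keeping only the first header.

    Hypothetical helper illustrating the technique. It assumes every input file
    has the same header line and ends with a newline.
    """
    with open(output_file, "wb") as fout:
        for i, tab_file in enumerate(tab_files):
            with open(tab_file, "rb") as fin:
                if i > 0:
                    # Skip the header line of every file except the first.
                    fin.readline()
                # Stream the remaining bytes straight into the output file
                # without parsing them into a DataFrame.
                shutil.copyfileobj(fin, fout)
```

Called as `merge_tab_files([tabfile1, tabfile2], "merged_rescore.tab")`, the sketch writes a single header followed by the data rows of every input file, in list order. Copying raw bytes avoids parsing each file, which is why this style of merge tends to outperform reading and re-writing with pandas.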
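- Parameters:
  - tab_files: list of paths to the spectra-file-specific tab files to merge (the example below passes pathlib.Path objects)
  - output_file: location of the merged tab file to write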
- Example:
>>> from oktoberfest import rescore as re
>>> from pathlib import Path
>>> import pandas as pd
>>> rescore_df1 = pd.DataFrame({'SpecId': ["F1-81-AAAAAAALQAK-2-5","F1-15-VGVFQHGK-3-2"],
...     'Label': [1,0],
...     'ScanNr': [81,15],
...     'filename': ["F1","F1"],
...     'CID': [0,0],
...     'Charge1': [0,0],
...     'Charge2': [1,0],
...     'Charge3': [0,1],
...     'Charge4': [0,0],
...     'Charge5': [0,0],
...     'Charge6': [0,0],
...     'HCD': [1,1],
...     'KR': [1,1],
...     'Mass': [1402.18,1103.54],
...     'spectral_angle': [0.71,0.23],
...     'sequence_length': [11,8],
...     'Peptide': ["_.AAAAAAALQAK._","_.VGVFQHGK._"],
...     'Proteins': ["AAAAAAALQAK","VGVFQHGK"],
...     'RT': [64.79,57.84],
...     'iRT': [65.99,56.22],
...     'pred_RT': [58.86,55.34]})
>>> rescore_df2 = pd.DataFrame({'SpecId': ["F2-13-AEAEQEKDQLR-1-11","F2-27-TGFLEQLK-2-7"],
...     'Label': [1,0],
...     'ScanNr': [13,27],
...     'filename': ["F2","F2"],
...     'CID': [0,0],
...     'Charge1': [1,0],
...     'Charge2': [0,1],
...     'Charge3': [0,0],
...     'Charge4': [0,0],
...     'Charge5': [0,0],
...     'Charge6': [0,0],
...     'HCD': [1,1],
...     'KR': [2,1],
...     'Mass': [1202.43,1009.14],
...     'spectral_angle': [0.55,0.12],
...     'sequence_length': [11,8],
...     'Peptide': ["_.AEAEQEKDQLR._","_.TGFLEQLK._"],
...     'Proteins': ["AEAEQEKDQLR","TGFLEQLK"],
...     'RT': [62.33,51.23],
...     'iRT': [63.98,53.24],
...     'pred_RT': [59.16,50.76]})
>>> rescore_df1.to_csv("./tests/doctests/input/rescore1.tab",sep='\t',index=False)
>>> rescore_df2.to_csv("./tests/doctests/input/rescore2.tab",sep='\t',index=False)
>>> tabfile1 = Path("./tests/doctests/input/rescore1.tab")
>>> tabfile2 = Path("./tests/doctests/input/rescore2.tab")
>>> filelist = [tabfile1,tabfile2]
>>> re.merge_input(tab_files=filelist, output_file="./tests/doctests/output/merged_rescore.tab")