oktoberfest.pp.digest
- oktoberfest.pp.digest(fasta, digestion, missed_cleavages, db, enzyme, special_aas, min_length, max_length)
Digest a given fasta file with specific settings.
This function performs an in-silico digestion of a fasta file based on the provided settings. It returns a dictionary that maps peptides to the list of associated protein IDs.
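The core of a full (specific) in-silico digestion can be sketched in plain Python. This is not the oktoberfest implementation. The helpers `digest_sequence` and `toy_digest` are hypothetical, only trypsin is modelled (assumed rule: cleave after K or R unless the next residue is P), and semi-specific digestion and decoy generation are left out. The sketch shows how missed cleavages and the length bounds shape the peptide-to-protein-IDs mapping described above.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Hypothetical trypsin rule (assumption): cut after K or R unless the next residue is P.
TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def digest_sequence(sequence: str, missed_cleavages: int,
                    min_length: int, max_length: int) -> List[str]:
    """Return fully tryptic peptides of one protein, allowing missed cleavages."""
    # Zero-width split at every cleavage site; drop the empty tail after a C-terminal K/R.
    fragments = [f for f in TRYPSIN_SITE.split(sequence) if f]
    peptides = []
    for start in range(len(fragments)):
        # Join 1 .. missed_cleavages + 1 consecutive fragments.
        stop = min(start + missed_cleavages + 1, len(fragments))
        for end in range(start + 1, stop + 1):
            peptide = "".join(fragments[start:end])
            if min_length <= len(peptide) <= max_length:
                peptides.append(peptide)
    return peptides


def toy_digest(proteins: Dict[str, str], missed_cleavages: int,
               min_length: int, max_length: int) -> Dict[str, List[str]]:
    """Map each peptide to the IDs of all proteins that produce it."""
    peptide_to_proteins: Dict[str, List[str]] = defaultdict(list)
    for protein_id, sequence in proteins.items():
        for peptide in digest_sequence(sequence, missed_cleavages, min_length, max_length):
            # A peptide can occur twice in one protein; list each protein only once.
            if protein_id not in peptide_to_proteins[peptide]:
                peptide_to_proteins[peptide].append(protein_id)
    return dict(peptide_to_proteins)


if __name__ == "__main__":
    proteins = {
        "Peptide1": "MKTIIALSYIFCLVFAD",
        "Peptide2": "GILGFVFRTLTVPS",
        "Peptide3": "LLGATCMFV",
    }
    print(toy_digest(proteins, missed_cleavages=2, min_length=7, max_length=60))
    # {'MKTIIALSYIFCLVFAD': ['Peptide1'], 'TIIALSYIFCLVFAD': ['Peptide1'],
    #  'GILGFVFR': ['Peptide2'], 'GILGFVFRTLTVPS': ['Peptide2'], 'LLGATCMFV': ['Peptide3']}
```

Peptides shorter than `min_length` (here "MK" and "TLTVPS") are dropped even though they are valid cleavage products. A peptide shared by several proteins collects all of their IDs, which is why the values are lists.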
- Parameters:
fasta (Union[str, Path]) – Path to the fasta file containing the sequences to digest
digestion (str) – The type of digestion, one of “full”, “semi”, or “none”
missed_cleavages (int) – The number of allowed missed cleavages
db (str) – The database to produce, one of “target”, “decoy”, or “concat” (target and decoy combined)
enzyme (str) – The protease to use for digestion, e.g. “trypsin”
special_aas (str) – Amino acids, given as a string such as “KR”, to be swapped with the preceding amino acid in reversed (decoy) sequences. This mimics the behaviour of MaxQuant when creating decoys.
min_length (int) – Minimal length of digested peptides
max_length (int) – Maximal length of digested peptides
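The `special_aas` parameter refers to a MaxQuant-like decoy step. One plausible reading is sketched here under stated assumptions: decoys are formed by reversing each protein sequence, and every residue listed in `special_aas` is then swapped with the residue immediately before it. The helper names `swap_special_aas` and `make_decoy` are hypothetical and do not describe the library's internals.

```python
def swap_special_aas(sequence: str, special_aas: str) -> str:
    """Swap each special residue with the residue immediately before it."""
    residues = list(sequence)
    # Scan left to right and swap in place. A residue displaced by one swap can be
    # displaced again by the next one, e.g. "AKK" becomes "KKA" for special_aas="K".
    for i in range(1, len(residues)):
        if residues[i] in special_aas:
            residues[i - 1], residues[i] = residues[i], residues[i - 1]
    return "".join(residues)


def make_decoy(sequence: str, special_aas: str) -> str:
    """Assumed decoy scheme: reverse the protein, then swap the special residues."""
    return swap_special_aas(sequence[::-1], special_aas)


if __name__ == "__main__":
    print(swap_special_aas("ABCKDEFRK", "KR"))      # ABKCDERKF
    print(make_decoy("MKTIIALSYIFCLVFAD", "KR"))    # DAFVLCFIYSLAIIKTM
```

Passing `special_aas` as a string works because `in` tests single characters against it, so “KR” and a list such as `["K", "R"]` behave the same here.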
- Returns:
A dictionary that maps peptides (keys) to a list of protein IDs (values).
- Return type:
Dict[str, List[str]]
- Example:
>>> from oktoberfest import preprocessing as pp
>>> peptides = [
...     (">Peptide1 Example peptide 1", "MKTIIALSYIFCLVFAD"),
...     (">Peptide2 Example peptide 2", "GILGFVFRTLTVPS"),
...     (">Peptide3 Example peptide 3", "LLGATCMFV")
... ]
>>> with open("./tests/doctests/input/peptides.fasta", "w") as file:
...     for header, sequence in peptides:
...         file.write(f"{header}\n")
...         file.write(f"{sequence}\n")
>>> digest_dict = pp.digest(fasta="./tests/doctests/input/peptides.fasta",
...                         digestion="full",
...                         missed_cleavages=2,
...                         db="concat",
...                         enzyme="trypsin",
...                         special_aas="KR",
...                         min_length=7,
...                         max_length=60)
>>> print(digest_dict)
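The example writes a small fasta file in which every record is a header line followed by a sequence line. A minimal reader for such files is sketched below. It is a hypothetical helper (`parse_fasta_lines`), not part of oktoberfest, and it assumes that the record ID is the first whitespace-separated token of the header, so ">Peptide1 Example peptide 1" yields "Peptide1". How `pp.digest` itself derives protein IDs is not stated here.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Map each record ID (first header token, an assumption) to its sequence."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw_line in lines:
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record.
            if current_id is not None:
                records[current_id] = "".join(chunks)
            fields = line[1:].split()
            current_id = fields[0] if fields else ""
            chunks = []
        else:
            # Sequences may be wrapped over several lines.
            chunks.append(line)
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records
```

Because a file handle iterates over its lines, the file from the example can be read with `with open("./tests/doctests/input/peptides.fasta") as handle: proteins = parse_fasta_lines(handle)`. Taking an iterable instead of a path keeps the function easy to test without touching the file system.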