Peptides
library to an AnnotatedDataFrame
R/add_physicochemical_properties_to_HMMER_tbl.R
add_physicochemical_properties_to_HMMER_tbl.Rd
Add EMBOSS-inspired theoretical physicochemical properties using the Peptides library to an AnnotatedDataFrame
add_physicochemical_properties_to_HMMER_tbl(data, colname = "hits.fullfasta")
data: A data frame containing the indicated column with protein sequences.
colname: A single character string with the name of the column that holds the protein sequences.
This documentation is based on the following documentation: https://github.com/dosorio/Peptides/
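Individual values can be cross-checked by calling Peptides directly on one sequence. The following is a minimal sketch, assuming the Peptides package is installed. The example sequence is made up, and the arguments shown (for instance the default pH of 7 for the charge) are assumptions about how the columns are produced, not something this page confirms.

```r
# Assumption: the Peptides package is installed; skip quietly otherwise.
if (requireNamespace("Peptides", quietly = TRUE)) {
  # Made-up example sequence, not taken from phmmer_2abl.
  sequence <- "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"

  # Each call returns a single numeric value for a single sequence.
  properties <- c(
    molecular.weight = Peptides::mw(sequence),
    charge           = Peptides::charge(sequence, pKscale = "Lehninger"),
    pI               = Peptides::pI(sequence, pKscale = "EMBOSS"),
    aIndex           = Peptides::aIndex(sequence),
    boman            = Peptides::boman(sequence),
    hydrophobicity   = Peptides::hydrophobicity(sequence, scale = "KyteDoolittle"),
    instaIndex       = Peptides::instaIndex(sequence)
  )
  print(properties)
}
```

Comparing these numbers with the matching columns for the same sequence is a quick way to check which scales and defaults are in effect.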
A data frame with new columns holding the theoretical physicochemical properties:
molecular.weight: sum of the masses of each atom constituting a molecule (Da) using the same formulas and weights as ExPASy's.
charge: net charge of a protein sequence based on the Henderson-Hasselbalch equation using the Lehninger pKa scale.
pI: isoelectric point calculated as in EMBOSS PEPSTATS.
mz: mass over charge ratio (m/z) for peptides, as measured in mass spectrometry.
aIndex: aliphatic index of a protein. The aliphatic index is defined as the relative volume occupied by aliphatic side chains (alanine, valine, isoleucine, and leucine). It may be regarded as a positive factor for increased thermostability of globular proteins.
boman: potential protein interaction index. The index is the sum of the solubility values of all residues in a sequence, divided by the number of residues to normalize it. It may give an overall estimate of the potential of a peptide to bind to membranes or to other proteins, such as receptors. A protein has high binding potential if the index value is higher than 2.48.
hydrophobicity: GRAVY hydrophobicity index of an amino acid sequence using the Kyte-Doolittle hydrophobicity scale.
instaIndex: Guruprasad's instability index. This index predicts the stability of a protein based on its amino acid composition. A protein whose instability index is smaller than 40 is predicted to be stable; a value above 40 predicts that the protein may be unstable.
STYNQW: Percentage of amino acids (S + T + Y + N + Q + W)
Tiny: Percentage of amino acids (A + C + G + S + T)
Small: Percentage of amino acids (A + B + C + D + G + N + P + S + T + V)
Aliphatic: Percentage of amino acids (A + I + L + V)
Aromatic: Percentage of amino acids (F + H + W + Y)
Nonpolar: Percentage of amino acids (A + C + F + G + I + L + M + P + V + W + Y)
Polar: Percentage of amino acids (D + E + H + K + N + Q + R + S + T + Z)
Charged: Percentage of amino acids (B + D + E + H + K + R + Z)
Basic: Percentage of amino acids (H + K + R)
Acidic: Percentage of amino acids (B + D + E + Z)
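The composition columns above are simple percentages of residues falling into a class. The following base-R sketch illustrates that arithmetic for a few of the classes. `aa_classes` and `class_percentages` are hypothetical names introduced for illustration; this is not the package's implementation.

```r
# A few of the residue classes listed above. B and Z are the ambiguity
# codes for Asx and Glx, kept here only because the list includes them.
aa_classes <- list(
  Tiny      = c("A", "C", "G", "S", "T"),
  Aliphatic = c("A", "I", "L", "V"),
  Aromatic  = c("F", "H", "W", "Y"),
  Basic     = c("H", "K", "R"),
  Acidic    = c("B", "D", "E", "Z")
)

# Hypothetical helper: percentage of residues of one sequence that fall
# into each class (classes overlap, so the values need not sum to 100).
class_percentages <- function(sequence, classes = aa_classes) {
  residues <- strsplit(toupper(sequence), "")[[1]]
  vapply(
    classes,
    function(members) 100 * mean(residues %in% members),
    numeric(1)
  )
}

# Made-up ten-residue sequence.
class_percentages("ACDEFGHIKL")
```

For this ten-residue sequence, three residues (A, C, G) are in the Tiny class, so Tiny is 30 percent; the two aromatic residues (F, H) give 20 percent.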
data(phmmer_2abl)
add_physicochemical_properties_to_HMMER_tbl(
data = phmmer_2abl,
colname = "hits.fullfasta"
)
#> Warning: Sequence 1 has unrecognized amino acid types. Output value might be wrong calculated
#> Warning: Sequence 1 has unrecognized amino acid types. Output value might be wrong calculated
#> Warning: Sequence 1 has unrecognized amino acid types. Output value might be wrong calculated
#> Warning: Sequence 1 has unrecognized amino acid types. Output value might be wrong calculated
#> Warning: Sequence 1 has unrecognized amino acid types. Output value might be wrong calculated
#> Warning: Sequence 1 has unrecognized amino acid types. Output value might be wrong calculated
#> Warning: Sequence 1 has unrecognized amino acid types. Output value might be wrong calculated
#> Warning: Sequence 1 has unrecognized amino acid types. Output value might be wrong calculated
#> Warning: Sequence 1 has unrecognized amino acid types. Output value might be wrong calculated
#> # A tibble: 25 × 66
#> algor…¹ uuid stats…² stats…³ stats…⁴ stats.Z stats…⁵ stats…⁶ stats…⁷ stats…⁸
#> <chr> <chr> <dbl> <int> <chr> <dbl> <int> <int> <int> <int>
#> 1 phmmer DCF1… 1 545 0.11 565928 0 14388 545 565928
#> 2 phmmer DCF1… 1 545 0.11 565928 0 14388 545 565928
#> 3 phmmer DCF1… 1 545 0.11 565928 0 14388 545 565928
#> 4 phmmer DCF1… 1 545 0.11 565928 0 14388 545 565928
#> 5 phmmer DCF1… 1 545 0.11 565928 0 14388 545 565928
#> 6 phmmer DCF1… 1 545 0.11 565928 0 14388 545 565928
#> 7 phmmer DCF1… 1 545 0.11 565928 0 14388 545 565928
#> 8 phmmer DCF1… 1 545 0.11 565928 0 14388 545 565928
#> 9 phmmer DCF1… 1 545 0.11 565928 0 14388 545 565928
#> 10 phmmer DCF1… 1 545 0.11 565928 0 14388 545 565928
#> # … with 15 more rows, 56 more variables: stats.user <dbl>,
#> # stats.domZ_setby <int>, stats.n_past_bias <int>, stats.sys <dbl>,
#> # stats.n_past_fwd <int>, stats.total <dbl>, stats.nmodels <int>,
#> # stats.nincluded <int>, stats.n_past_vit <int>, stats.nreported <int>,
#> # stats.domZ <dbl>, hits.archScore <chr>, hits.ph <chr>, hits.arch <chr>,
#> # hits.kg <chr>, hits.ndom <int>, hits.extlink <chr>, hits.acc2 <chr>,
#> # hits.taxid <chr>, hits.acc <chr>, hits.taxlink <chr>, hits.desc <chr>, …
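The repeated warning in the output above reports that a sequence contains characters that are not recognized as amino acids. A minimal base-R sketch for flagging such sequences before computing properties follows. It assumes the recognized set is the 20 standard one-letter codes, which may differ from what Peptides actually accepts, and `find_unrecognized` is a hypothetical helper.

```r
# Assumption: the 20 standard one-letter amino acid codes.
standard_aa <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]

# Hypothetical helper: for each sequence, return the distinct characters
# that are not standard residues (empty when the sequence is clean).
find_unrecognized <- function(sequences) {
  lapply(toupper(sequences), function(s) {
    residues <- strsplit(s, "")[[1]]
    unique(residues[!residues %in% standard_aa])
  })
}

# Made-up sequences: the second contains X, B, and a stop symbol.
find_unrecognized(c("MKTAYIAK", "MKXTAB*"))
```

Applied to the sequence column (for example `phmmer_2abl$hits.fullfasta`), the non-empty entries point to the rows whose computed values should be treated with caution.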