Add EMBOSS-inspired theoretical physicochemical properties, computed with the Peptides package, to a data frame of protein sequences

add_physicochemical_properties_to_HMMER_tbl(data, colname = "hits.fullfasta")

Arguments

data

A data frame containing the column named by colname, which holds the protein sequences.

colname

A character string (a length-one character vector) giving the name of the column that contains the protein sequences. Defaults to "hits.fullfasta".

The property descriptions on this page are adapted from the Peptides documentation: https://github.com/dosorio/Peptides/

Value

The input data frame with additional columns holding the theoretical physicochemical properties listed below.

Theoretical properties

  • molecular.weight: molecular weight (Da), the sum of the masses of all atoms in the molecule, computed with the same formulas and atomic weights as ExPASy.

  • charge: net charge of the protein sequence, based on the Henderson-Hasselbalch equation and the Lehninger pKa scale.

  • pI: isoelectric point (the pH at which the net charge is zero), calculated as in EMBOSS PEPSTATS.

  • mz: mass-to-charge ratio (m/z) of the peptide, as measured in mass spectrometry.

  • aIndex: aliphatic index, defined as the relative volume occupied by aliphatic side chains (alanine, valine, isoleucine and leucine). It may be regarded as a positive factor for the increased thermostability of globular proteins.

  • boman: Boman index (potential protein interaction index), equal to the sum of the solubility values of all residues in the sequence divided by the number of residues. It gives an overall estimate of a peptide's potential to bind to membranes or to other proteins such as receptors; a value above 2.48 indicates high binding potential.

  • hydrophobicity: GRAVY (grand average of hydropathy) index of the amino acid sequence, computed with the Kyte-Doolittle hydrophobicity scale.

  • instaIndex: Guruprasad's instability index, which predicts the stability of a protein from its dipeptide composition. A protein whose instability index is below 40 is predicted to be stable; a value above 40 suggests that the protein may be unstable.

  • STYNQW: Percentage of amino acids (S + T + Y + N + Q + W)

  • Tiny: Percentage of amino acids (A + C + G + S + T)

  • Small: Percentage of amino acids (A + B + C + D + G + N + P + S + T + V)

  • Aliphatic: Percentage of amino acids (A + I + L + V)

  • Aromatic: Percentage of amino acids (F + H + W + Y)

  • Nonpolar: Percentage of amino acids (A + C + F + G + I + L + M + P + V + W + Y)

  • Polar: Percentage of amino acids (D + E + H + K + N + Q + R + S + T + Z)

  • Charged: Percentage of amino acids (B + D + E + H + K + R + Z)

  • Basic: Percentage of amino acids (H + K + R)

  • Acidic: Percentage of amino acids (B + D + E + Z)
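To make the composition-based descriptors concrete, here is a minimal base-R sketch of the residue-class percentages listed above, together with Ikai's aliphatic index and the Kyte-Doolittle GRAVY score. The helpers (residues, aa_classes, class_percentages, aliphatic_index, gravy) are hypothetical and are not part of this package or of Peptides; the package's own values come from Peptides and may differ in edge cases such as non-standard residues.

```r
# Hypothetical helpers, NOT the package's or Peptides' implementation.

# Split a sequence into upper-case single-letter residues.
residues <- function(seq) strsplit(toupper(seq), "")[[1]]

# Residue classes as listed on this page (B and Z are the ambiguity
# codes for D/N and E/Q respectively).
aa_classes <- list(
  STYNQW    = c("S", "T", "Y", "N", "Q", "W"),
  Tiny      = c("A", "C", "G", "S", "T"),
  Small     = c("A", "B", "C", "D", "G", "N", "P", "S", "T", "V"),
  Aliphatic = c("A", "I", "L", "V"),
  Aromatic  = c("F", "H", "W", "Y"),
  Nonpolar  = c("A", "C", "F", "G", "I", "L", "M", "P", "V", "W", "Y"),
  Polar     = c("D", "E", "H", "K", "N", "Q", "R", "S", "T", "Z"),
  Charged   = c("B", "D", "E", "H", "K", "R", "Z"),
  Basic     = c("H", "K", "R"),
  Acidic    = c("B", "D", "E", "Z")
)

# Percentage of residues that fall into each class.
class_percentages <- function(seq) {
  r <- residues(seq)
  vapply(aa_classes, function(cls) 100 * sum(r %in% cls) / length(r),
         numeric(1))
}

# Ikai's aliphatic index: X(A) + 2.9 * X(V) + 3.9 * (X(I) + X(L)),
# where X() is the mole percent of each residue.
aliphatic_index <- function(seq) {
  r <- residues(seq)
  mole_pct <- function(aa) 100 * sum(r == aa) / length(r)
  mole_pct("A") + 2.9 * mole_pct("V") + 3.9 * (mole_pct("I") + mole_pct("L"))
}

# Kyte-Doolittle hydropathy values; GRAVY is their mean over the sequence.
# Residues missing from the scale (e.g. "X") make the result NA.
kd <- c(A = 1.8, R = -4.5, N = -3.5, D = -3.5, C = 2.5, Q = -3.5, E = -3.5,
        G = -0.4, H = -3.2, I = 4.5, L = 3.8, K = -3.9, M = 1.9, F = 2.8,
        P = -1.6, S = -0.8, T = -0.7, W = -0.9, Y = -1.3, V = 4.2)
gravy <- function(seq) mean(kd[residues(seq)])
```

For example, the toy sequence "AVIL" is 100% aliphatic and 25% tiny (only A), its aliphatic index is 25 + 2.9 × 25 + 3.9 × 50 = 292.5, and its GRAVY is (1.8 + 4.2 + 4.5 + 3.8) / 4 = 3.575.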

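For reference, the usual published definitions behind several of the indices above can be written as below. This is a summary of the standard formulas, not a transcription of the Peptides source; the exact pKa values, residue masses, Boman solubility scale and instability weight table (DIWV) are whatever Peptides uses. Here L is the sequence length, x_i the i-th residue, X_A, X_V, X_I, X_L mole percentages, h the Kyte-Doolittle value, s the Boman solubility value, N_j the number of ionizable groups of type j (termini included), M the peptide mass, z the charge state and m_p ≈ 1.00728 Da the proton mass.

```latex
\begin{aligned}
\text{aIndex} &= X_{A} + 2.9\,X_{V} + 3.9\,(X_{I} + X_{L}) \\
\text{hydrophobicity (GRAVY)} &= \frac{1}{L}\sum_{i=1}^{L} h(x_i) \\
\text{instaIndex} &= \frac{10}{L}\sum_{i=1}^{L-1} \mathrm{DIWV}(x_i, x_{i+1}) \\
\text{boman} &= \frac{1}{L}\sum_{i=1}^{L} s(x_i) \\
Z(\mathrm{pH}) &= \sum_{j \in \text{pos}} \frac{N_j}{1 + 10^{\,\mathrm{pH} - \mathrm{p}K_j}}
               - \sum_{k \in \text{neg}} \frac{N_k}{1 + 10^{\,\mathrm{p}K_k - \mathrm{pH}}} \\
Z(\mathrm{pI}) &= 0 \\
m/z &= \frac{M + z\,m_{p}}{z}
\end{aligned}
```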
Examples

data(phmmer_2abl)
add_physicochemical_properties_to_HMMER_tbl(
    data = phmmer_2abl,
    colname = "hits.fullfasta"
)
#> Warning: Sequence 1 has unrecognized amino acid types. Output value might be wrong calculated
#> Warning: Sequence 1 has unrecognized amino acid types. Output value might be wrong calculated
#> Warning: Sequence 1 has unrecognized amino acid types. Output value might be wrong calculated
#> Warning: Sequence 1 has unrecognized amino acid types. Output value might be wrong calculated
#> Warning: Sequence 1 has unrecognized amino acid types. Output value might be wrong calculated
#> Warning: Sequence 1 has unrecognized amino acid types. Output value might be wrong calculated
#> Warning: Sequence 1 has unrecognized amino acid types. Output value might be wrong calculated
#> Warning: Sequence 1 has unrecognized amino acid types. Output value might be wrong calculated
#> Warning: Sequence 1 has unrecognized amino acid types. Output value might be wrong calculated
#> # A tibble: 25 × 66
#>    algor…¹ uuid  stats…² stats…³ stats…⁴ stats.Z stats…⁵ stats…⁶ stats…⁷ stats…⁸
#>    <chr>   <chr>   <dbl>   <int> <chr>     <dbl>   <int>   <int>   <int>   <int>
#>  1 phmmer  DCF1…       1     545 0.11     565928       0   14388     545  565928
#>  2 phmmer  DCF1…       1     545 0.11     565928       0   14388     545  565928
#>  3 phmmer  DCF1…       1     545 0.11     565928       0   14388     545  565928
#>  4 phmmer  DCF1…       1     545 0.11     565928       0   14388     545  565928
#>  5 phmmer  DCF1…       1     545 0.11     565928       0   14388     545  565928
#>  6 phmmer  DCF1…       1     545 0.11     565928       0   14388     545  565928
#>  7 phmmer  DCF1…       1     545 0.11     565928       0   14388     545  565928
#>  8 phmmer  DCF1…       1     545 0.11     565928       0   14388     545  565928
#>  9 phmmer  DCF1…       1     545 0.11     565928       0   14388     545  565928
#> 10 phmmer  DCF1…       1     545 0.11     565928       0   14388     545  565928
#> # … with 15 more rows, 56 more variables: stats.user <dbl>,
#> #   stats.domZ_setby <int>, stats.n_past_bias <int>, stats.sys <dbl>,
#> #   stats.n_past_fwd <int>, stats.total <dbl>, stats.nmodels <int>,
#> #   stats.nincluded <int>, stats.n_past_vit <int>, stats.nreported <int>,
#> #   stats.domZ <dbl>, hits.archScore <chr>, hits.ph <chr>, hits.arch <chr>,
#> #   hits.kg <chr>, hits.ndom <int>, hits.extlink <chr>, hits.acc2 <chr>,
#> #   hits.taxid <chr>, hits.acc <chr>, hits.taxlink <chr>, hits.desc <chr>, …
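The repeated warnings in the example output appear to come from Peptides, which flags sequences containing letters it does not recognize as amino acids. Before computing properties, one way to see which sequences are affected is a minimal base-R check like the one below. The helper nonstandard_residues() is hypothetical, and the 20-letter alphabet it uses is an assumption: the set of letters Peptides actually accepts may differ.

```r
# Hypothetical helper (not part of this package or Peptides): for each
# sequence, list the characters outside the 20 standard one-letter codes.
standard_aa <- c("A", "R", "N", "D", "C", "Q", "E", "G", "H", "I",
                 "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")

nonstandard_residues <- function(seqs) {
  lapply(seqs, function(s) {
    r <- strsplit(toupper(s), "")[[1]]
    # Keep each offending character once, in sorted order.
    sort(unique(r[!r %in% standard_aa]))
  })
}
```

Applied to the example data, nonstandard_residues(phmmer_2abl$hits.fullfasta) would return one character vector per row, empty for sequences that use only standard residues.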