R/pairwise_alignment.R
pairwise_alignment_sequence_identity.Rd
Calculate the percentage of pairwise sequence identity
pairwise_alignment_sequence_identity(
  seqs,
  aln_type = "global",
  pid_type = "PID1"
)
seqs
: A named character vector to be converted into a Biostrings::AAStringSet, or a Biostrings::AAStringSet, containing the sequences of interest. Unnamed sequences are given arbitrary names.
aln_type
: A character vector of length one giving the alignment type. Possible options are "global" (Needleman-Wunsch), "local" (Smith-Waterman) and "overlap".
pid_type
: A character vector of length one giving the definition of percent sequence identity. Possible options are "PID1", "PID2", "PID3" and "PID4".
A long-format data frame (tibble) with one row per ordered pair of distinct sequences and the columns from, to and PID.
global
: align whole strings with end gap penalties (Needleman-Wunsch).
local
: align string fragments (Smith-Waterman).
overlap
: align whole strings without end gap penalties.
PID1
: 100 * (identical positions) / (aligned positions + internal gap positions).
PID2
: 100 * (identical positions) / (aligned positions).
PID3
: 100 * (identical positions) / (length shorter sequence).
PID4
: 100 * (identical positions) / (average length of the two sequences).
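The four definitions differ only in their denominator, which is easiest to see with numbers. The sketch below is a minimal base R illustration, not the package's implementation: `pid_from_counts()` and its argument names are hypothetical, and the counts are made up for the example.

```r
# Hypothetical helper (not part of the package): evaluates the four PID
# definitions from summary counts of a single pairwise alignment.
pid_from_counts <- function(n_identical, n_aligned, n_internal_gaps,
                            len_a, len_b) {
  c(
    PID1 = 100 * n_identical / (n_aligned + n_internal_gaps),
    PID2 = 100 * n_identical / n_aligned,
    PID3 = 100 * n_identical / min(len_a, len_b),
    PID4 = 100 * n_identical / mean(c(len_a, len_b))
  )
}

# Made-up counts: 33 identical residues over 55 aligned positions,
# 5 internal gap positions, and sequence lengths of 60 and 72.
pids <- pid_from_counts(
  n_identical = 33,
  n_aligned = 55,
  n_internal_gaps = 5,
  len_a = 60,
  len_b = 72
)
print(pids)
```

With these counts the four definitions give 55, 60, 55 and 50 respectively, so the choice of pid_type can shift the reported identity by several points for the same alignment. In R, 0 / 0 evaluates to NaN, so any of these ratios is NaN rather than 0 when both its numerator and denominator are zero.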
data(phmmer_2abl)
pairwise_alignment_sequence_identity(
  seqs = phmmer_2abl$hits.fullfasta[6:10],
  aln_type = "overlap",
  pid_type = "PID2"
)
#> # A tibble: 20 × 3
#>    from       to           PID
#>    <chr>      <chr>      <dbl>
#>  1 .          .1           NaN
#>  2 .          MYO1_NEUCR   NaN
#>  3 .          SRC_HUMAN    NaN
#>  4 .          SRK2_SPOLA   NaN
#>  5 .1         MYO1_NEUCR   NaN
#>  6 .1         SRC_HUMAN    NaN
#>  7 .1         SRK2_SPOLA   NaN
#>  8 MYO1_NEUCR SRC_HUMAN   17.9
#>  9 MYO1_NEUCR SRK2_SPOLA  15.1
#> 10 SRC_HUMAN  SRK2_SPOLA  62.2
#> 11 .1         .            NaN
#> 12 MYO1_NEUCR .            NaN
#> 13 SRC_HUMAN  .            NaN
#> 14 SRK2_SPOLA .            NaN
#> 15 MYO1_NEUCR .1           NaN
#> 16 SRC_HUMAN  .1           NaN
#> 17 SRK2_SPOLA .1           NaN
#> 18 SRC_HUMAN  MYO1_NEUCR  17.9
#> 19 SRK2_SPOLA MYO1_NEUCR  15.1
#> 20 SRK2_SPOLA SRC_HUMAN   62.2
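Because the result is in long format, with each pair listed in both directions, it can be convenient to reshape it into a square identity matrix. The following is a minimal base R sketch under stated assumptions: `res` is a hand-built data frame that copies the six non-NaN rows of the output above rather than calling the function, and `pid_matrix` is a name chosen for illustration.

```r
# Toy long-format result with the same from/to/PID columns; the values
# are copied from rows 8-10 and 18-20 of the example output.
res <- data.frame(
  from = c("MYO1_NEUCR", "MYO1_NEUCR", "SRC_HUMAN",
           "SRC_HUMAN", "SRK2_SPOLA", "SRK2_SPOLA"),
  to   = c("SRC_HUMAN", "SRK2_SPOLA", "SRK2_SPOLA",
           "MYO1_NEUCR", "MYO1_NEUCR", "SRC_HUMAN"),
  PID  = c(17.9, 15.1, 62.2, 17.9, 15.1, 62.2),
  stringsAsFactors = FALSE
)

# Cross-tabulate PID by (from, to). Each cell holds exactly one value,
# so mean() simply returns it; pairs that are absent, such as the
# diagonal, are left as NA.
pid_matrix <- tapply(res$PID, list(res$from, res$to), FUN = mean)
print(pid_matrix)
```

The diagonal stays NA here because self-comparisons do not appear in the long table.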