Calculate the percentage of pairwise sequence identity

pairwise_alignment_sequence_identity(
  seqs,
  aln_type = "global",
  pid_type = "PID1"
)

Arguments

seqs

Either a named character vector, which is converted into a Biostrings::AAStringSet, or a Biostrings::AAStringSet holding the sequences of interest. Unnamed sequences are given arbitrary names.

aln_type

A character vector of length one giving the alignment type. Possible options are "global" (Needleman-Wunsch), "local" (Smith-Waterman) and "overlap".

pid_type

A character vector of length one giving the definition of percent sequence identity to use. Possible options are "PID1", "PID2", "PID3" and "PID4".

Value

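A long-format tibble with one row per ordered pair of sequences and three columns: from, to and PID (the percent sequence identity for that pair).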

Alignment types

  • global: align whole strings with end gap penalties (Needleman-Wunsch).

  • local: align string fragments (Smith-Waterman).

  • overlap: align whole strings without end gap penalties.

Percent sequence identity

  • PID1: 100 * (identical positions) / (aligned positions + internal gap positions).

  • PID2: 100 * (identical positions) / (aligned positions).

  • PID3: 100 * (identical positions) / (length of the shorter sequence).

  • PID4: 100 * (identical positions) / (average length of the two sequences).
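
The four definitions differ only in their denominator, which a small hand-made alignment makes visible. The following is a minimal base R sketch, not the package's implementation: `pid_from_alignment()` is a hypothetical helper that takes two already-aligned strings (with "-" marking gaps), applies the formulas exactly as listed above, and assumes each sequence contains at least one residue.

```r
# Hypothetical helper (not part of the package): compute the four PID
# variants from two already-aligned strings that use "-" for gaps.
pid_from_alignment <- function(aligned_a, aligned_b) {
  a <- strsplit(aligned_a, "", fixed = TRUE)[[1]]
  b <- strsplit(aligned_b, "", fixed = TRUE)[[1]]
  stopifnot(length(a) == length(b))

  gap_a <- a == "-"
  gap_b <- b == "-"

  # Aligned positions: columns where both sequences carry a residue.
  aligned <- !gap_a & !gap_b
  identical_pos <- sum(aligned & a == b)

  # A gap is internal when it lies between the first and the last residue
  # of the sequence that contains it; leading and trailing gaps are end gaps.
  is_internal <- function(gap) {
    residues <- which(!gap)
    idx <- seq_along(gap)
    gap & idx > min(residues) & idx < max(residues)
  }
  internal_gaps <- sum(is_internal(gap_a) | is_internal(gap_b))

  # Ungapped sequence lengths.
  len_a <- sum(!gap_a)
  len_b <- sum(!gap_b)

  c(
    PID1 = 100 * identical_pos / (sum(aligned) + internal_gaps),
    PID2 = 100 * identical_pos / sum(aligned),
    PID3 = 100 * identical_pos / min(len_a, len_b),
    PID4 = 100 * identical_pos / mean(c(len_a, len_b))
  )
}

# Toy alignment: 6 identical positions, 7 aligned positions,
# 2 internal gaps, 1 end gap; ungapped lengths are 9 and 8.
pid_values <- pid_from_alignment("ACDEFG-HIK", "ACD-FGWHL-")
print(round(pid_values, 2))
# PID1 = 66.67, PID2 = 85.71, PID3 = 75.00, PID4 = 70.59
```

On this toy input PID2 is the most permissive, because gap columns are left out of its denominator, while PID1 is the strictest, because internal gaps count against identity.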

Examples

data(phmmer_2abl)
pairwise_alignment_sequence_identity(
    seqs = phmmer_2abl$hits.fullfasta[6:10],
    aln_type = "overlap",
    pid_type = "PID2"
)
#> # A tibble: 20 × 3
#>    from       to           PID
#>    <chr>      <chr>      <dbl>
#>  1 .          .1         NaN  
#>  2 .          MYO1_NEUCR NaN  
#>  3 .          SRC_HUMAN  NaN  
#>  4 .          SRK2_SPOLA NaN  
#>  5 .1         MYO1_NEUCR NaN  
#>  6 .1         SRC_HUMAN  NaN  
#>  7 .1         SRK2_SPOLA NaN  
#>  8 MYO1_NEUCR SRC_HUMAN   17.9
#>  9 MYO1_NEUCR SRK2_SPOLA  15.1
#> 10 SRC_HUMAN  SRK2_SPOLA  62.2
#> 11 .1         .          NaN  
#> 12 MYO1_NEUCR .          NaN  
#> 13 SRC_HUMAN  .          NaN  
#> 14 SRK2_SPOLA .          NaN  
#> 15 MYO1_NEUCR .1         NaN  
#> 16 SRC_HUMAN  .1         NaN  
#> 17 SRK2_SPOLA .1         NaN  
#> 18 SRC_HUMAN  MYO1_NEUCR  17.9
#> 19 SRK2_SPOLA MYO1_NEUCR  15.1
#> 20 SRK2_SPOLA SRC_HUMAN   62.2