Introduction

The aaSEA package provides an unique and comprehensive tool kit for analyzing amino acid substitutions and consequences on physico chemical property changes. The amino acid properties included are extensively used in protein engineering.

An example workflow

The typical work flow begins with “fasta”" formatted multiple sequence alignment (MSA). For convenience an example alignment file is provided and it can be read into R environment as follows.

Assessment of single vs Multiple substitutions and associated property changes

myMSA <- system.file("extdata", "linB_Prot_ali.fasta", package = "aaSEA") 
myMSA
## [1] "C:/Users/sai/AppData/Local/Temp/RtmpMLF9b7/Rinst3120559011ad/aaSEA/extdata/linB_Prot_ali.fasta"

The alignment file location should be passed to “getAASub” function. This function will compute sites with single substitutions and multiple substitutions and returns a list of three data frames namely “SingleSub”, “multiSub” and “All”

sub <- getAASub(myMSA)
str(sub)
## List of 3
##  $ singleSub:'data.frame':   96 obs. of  4 variables:
##   ..$ substitution: Factor w/ 115 levels "-297G","-298A",..: 101 50 69 87 51 66 67 46 55 68 ...
##   ..$ wt          : chr [1:96] "S" "G" "K" "P" ...
##   ..$ mu          : chr [1:96] "I" "S" "N" "A" ...
##   ..$ site        : chr [1:96] "2" "4" "6" "7" ...
##  $ multiSub :'data.frame':   19 obs. of  4 variables:
##   ..$ substitution: Factor w/ 115 levels "-297G","-298A",..: 21 20 110 111 102 103 70 71 25 27 ...
##   ..$ wt          : chr [1:19] "A" "A" "V" "V" ...
##   ..$ mu          : chr [1:19] "T" "N" "I" "L" ...
##   ..$ site        : chr [1:19] "81" "81" "134" "134" ...
##  $ All      :'data.frame':   115 obs. of  4 variables:
##   ..$ substitution: Factor w/ 115 levels "-297G","-298A",..: 101 50 69 87 51 66 67 46 55 68 ...
##   ..$ wt          : chr [1:115] "S" "G" "K" "P" ...
##   ..$ mu          : chr [1:115] "I" "S" "N" "A" ...
##   ..$ site        : chr [1:115] "2" "4" "6" "7" ...

“singleSub” file is a data frame with sites where single amino acid substitutions occurred. This file has four columns namely “substitution”, “wt”, “mu” and “site”

head(sub$singleSub)
##   substitution wt mu site
## 1          S2I  S  I    2
## 2          G4S  G  S    4
## 3          K6N  K  N    6
## 4          P7A  P  A    7
## 5          G9M  G  M    9
## 6         K11L  K  L   11

“multiSub” file is a data frame with sites where more than one amino acid substitutions occurred. This file has four columns namely “substitution”, “wt”, “mu” and “site”

head(sub$multiSub)
##    substitution wt mu site
## 23         A81T  A  T   81
## 24         A81N  A  N   81
## 37        V134I  V  I  134
## 38        V134L  V  L  134
## 39        T135A  T  A  135
## 40        T135V  T  V  135

Now calculate desired property changes. To compute the changes in desired properties, aaSEA offers three groups of physico checmial properties namely “Cruciani”, “Fasgai”, “Kidera” or all the properties listed in “AAindex”. Once the property is decided we use “getPropChange”" function. this function takes one of the two (single or multi) substitution files obtained, property data frame i.e. Cruciani/Fasgai/Kidera/AAindex and the row number of the property in the data frame.

  • Cruciani data frame has :
      1. Polarity
      1. Hydrophobicity
      1. H-bonding
  • Fasgai data frame has :
      1. Hydrophobicity index
      1. Alpha and turn propensities
      1. Bulky properties
      1. Compositional characteristic index
      1. Local flexibility
      1. Electronic properties
  • Kidera data frame has :
      1. Helix/bend preference
      1. Side-chain size
      1. Extended structure preference
      1. Hydrophobicity
      1. Double-bend preference
      1. Partial specific volume
      1. Flat extended preference
      1. Occurrence in alpha region
      1. pK-C
      1. Surrounding hydrophobicity
  • AAIndex data frame has standard 544 AAindex entries.
propChanges <- getPropChange(subFile = sub$singleSub, propertyDF = "Cruciani", propertyIndex = 1)
head(propChanges)
##   substitution wt mu site wt.Prop mu.Prop Delta.Prop
## 1          S2I  S  I    2    0.41   -0.94      -1.35
## 2          G4S  G  S    4   -0.88    0.41       1.29
## 3          K6N  K  N    6    0.60    0.82       0.22
## 4          P7A  P  A    7   -0.81   -0.96      -0.15
## 5          G9M  G  M    9   -0.88   -0.82       0.06
## 6         K11L  K  L   11    0.60   -0.90      -1.50

These property changes could be visualized as sorted bar graphs using “plotSingleSubChange”

plotSingleSubChange(singleSubChangeDF = propChanges)

Similarly we can compute property changes for each of individual substitutions where there are multiple substitutions are reported. a Heat map will be of great tool to visualize these results.

multiPropChange <- getPropChange(subFile = sub$multiSub, propertyDF = "cruciani", propertyIndex = 2)
plotMultiSubChange(multiSubChange = multiPropChange)

Assessment of correlated mutations and associated property changes

in aaSEA Correlated mutations could be computed by any of the four methods viz. MIP, McBASC, ELSC and OMES. Default method is McBASC.

corSites <- getCorSites(fileLoc = myMSA, corMethod = "mcbasc") # can use any of the "omes", "elsc", "mip"
## pos_i :  1 
## pos_i :  2 
## pos_i :  3 
## pos_i :  4 
## pos_i :  5 
## pos_i :  6 
## pos_i :  7 
## pos_i :  8 
## pos_i :  9 
## pos_i :  10 
## pos_i :  11 
## pos_i :  12 
## pos_i :  13 
## pos_i :  14 
## pos_i :  15 
## pos_i :  16 
## pos_i :  17 
## pos_i :  18 
## pos_i :  19 
## pos_i :  20 
## pos_i :  21 
## pos_i :  22 
## pos_i :  23 
## pos_i :  24 
## pos_i :  25 
## pos_i :  26 
## pos_i :  27 
## pos_i :  28 
## pos_i :  29 
## pos_i :  30 
## pos_i :  31 
## pos_i :  32 
## pos_i :  33 
## pos_i :  34 
## pos_i :  35 
## pos_i :  36 
## pos_i :  37 
## pos_i :  38 
## pos_i :  39 
## pos_i :  40 
## pos_i :  41 
## pos_i :  42 
## pos_i :  43 
## pos_i :  44 
## pos_i :  45 
## pos_i :  46 
## pos_i :  47 
## pos_i :  48 
## pos_i :  49 
## pos_i :  50 
## pos_i :  51 
## pos_i :  52 
## pos_i :  53 
## pos_i :  54 
## pos_i :  55 
## pos_i :  56 
## pos_i :  57 
## pos_i :  58 
## pos_i :  59 
## pos_i :  60 
## pos_i :  61 
## pos_i :  62 
## pos_i :  63 
## pos_i :  64 
## pos_i :  65 
## pos_i :  66 
## pos_i :  67 
## pos_i :  68 
## pos_i :  69 
## pos_i :  70 
## pos_i :  71 
## pos_i :  72 
## pos_i :  73 
## pos_i :  74 
## pos_i :  75 
## pos_i :  76 
## pos_i :  77 
## pos_i :  78 
## pos_i :  79 
## pos_i :  80 
## pos_i :  81 
## pos_i :  82 
## pos_i :  83 
## pos_i :  84 
## pos_i :  85 
## pos_i :  86 
## pos_i :  87 
## pos_i :  88 
## pos_i :  89 
## pos_i :  90 
## pos_i :  91 
## pos_i :  92 
## pos_i :  93 
## pos_i :  94 
## pos_i :  95 
## pos_i :  96 
## pos_i :  97 
## pos_i :  98 
## pos_i :  99 
## pos_i :  100 
## pos_i :  101 
## pos_i :  102 
## pos_i :  103 
## pos_i :  104 
## pos_i :  105 
## pos_i :  106 
## pos_i :  107 
## pos_i :  108 
## pos_i :  109 
## pos_i :  110 
## pos_i :  111 
## pos_i :  112 
## pos_i :  113 
## pos_i :  114 
## pos_i :  115 
## pos_i :  116 
## pos_i :  117 
## pos_i :  118 
## pos_i :  119 
## pos_i :  120 
## pos_i :  121 
## pos_i :  122 
## pos_i :  123 
## pos_i :  124 
## pos_i :  125 
## pos_i :  126 
## pos_i :  127 
## pos_i :  128 
## pos_i :  129 
## pos_i :  130 
## pos_i :  131 
## pos_i :  132 
## pos_i :  133 
## pos_i :  134 
## pos_i :  135 
## pos_i :  136 
## pos_i :  137 
## pos_i :  138 
## pos_i :  139 
## pos_i :  140 
## pos_i :  141 
## pos_i :  142 
## pos_i :  143 
## pos_i :  144 
## pos_i :  145 
## pos_i :  146 
## pos_i :  147 
## pos_i :  148 
## pos_i :  149 
## pos_i :  150 
## pos_i :  151 
## pos_i :  152 
## pos_i :  153 
## pos_i :  154 
## pos_i :  155 
## pos_i :  156 
## pos_i :  157 
## pos_i :  158 
## pos_i :  159 
## pos_i :  160 
## pos_i :  161 
## pos_i :  162 
## pos_i :  163 
## pos_i :  164 
## pos_i :  165 
## pos_i :  166 
## pos_i :  167 
## pos_i :  168 
## pos_i :  169 
## pos_i :  170 
## pos_i :  171 
## pos_i :  172 
## pos_i :  173 
## pos_i :  174 
## pos_i :  175 
## pos_i :  176 
## pos_i :  177 
## pos_i :  178 
## pos_i :  179 
## pos_i :  180 
## pos_i :  181 
## pos_i :  182 
## pos_i :  183 
## pos_i :  184 
## pos_i :  185 
## pos_i :  186 
## pos_i :  187 
## pos_i :  188 
## pos_i :  189 
## pos_i :  190 
## pos_i :  191 
## pos_i :  192 
## pos_i :  193 
## pos_i :  194 
## pos_i :  195 
## pos_i :  196 
## pos_i :  197 
## pos_i :  198 
## pos_i :  199 
## pos_i :  200 
## pos_i :  201 
## pos_i :  202 
## pos_i :  203 
## pos_i :  204 
## pos_i :  205 
## pos_i :  206 
## pos_i :  207 
## pos_i :  208 
## pos_i :  209 
## pos_i :  210 
## pos_i :  211 
## pos_i :  212 
## pos_i :  213 
## pos_i :  214 
## pos_i :  215 
## pos_i :  216 
## pos_i :  217 
## pos_i :  218 
## pos_i :  219 
## pos_i :  220 
## pos_i :  221 
## pos_i :  222 
## pos_i :  223 
## pos_i :  224 
## pos_i :  225 
## pos_i :  226 
## pos_i :  227 
## pos_i :  228 
## pos_i :  229 
## pos_i :  230 
## pos_i :  231 
## pos_i :  232 
## pos_i :  233 
## pos_i :  234 
## pos_i :  235 
## pos_i :  236 
## pos_i :  237 
## pos_i :  238 
## pos_i :  239 
## pos_i :  240 
## pos_i :  241 
## pos_i :  242 
## pos_i :  243 
## pos_i :  244 
## pos_i :  245 
## pos_i :  246 
## pos_i :  247 
## pos_i :  248 
## pos_i :  249 
## pos_i :  250 
## pos_i :  251 
## pos_i :  252 
## pos_i :  253 
## pos_i :  254 
## pos_i :  255 
## pos_i :  256 
## pos_i :  257 
## pos_i :  258 
## pos_i :  259 
## pos_i :  260 
## pos_i :  261 
## pos_i :  262 
## pos_i :  263 
## pos_i :  264 
## pos_i :  265 
## pos_i :  266 
## pos_i :  267 
## pos_i :  268 
## pos_i :  269 
## pos_i :  270 
## pos_i :  271 
## pos_i :  272 
## pos_i :  273 
## pos_i :  274 
## pos_i :  275 
## pos_i :  276 
## pos_i :  277 
## pos_i :  278 
## pos_i :  279 
## pos_i :  280 
## pos_i :  281 
## pos_i :  282 
## pos_i :  283 
## pos_i :  284 
## pos_i :  285 
## pos_i :  286 
## pos_i :  287 
## pos_i :  288 
## pos_i :  289 
## pos_i :  290 
## pos_i :  291 
## pos_i :  292 
## pos_i :  293 
## pos_i :  294 
## pos_i :  295 
## pos_i :  296
dim(corSites)
## [1]  38 121

At this step we have scanned through our multiple sequence alignment matrix and picked the sites that are having co-evolving sites as per the method of choice. Now from this selected columns we must get the wild type residue, residue position and substitution. this can be achieved by using “getTopSub”" function as shown below

corPairs <- getTopSub(corSites)
head(corPairs)
##    Pos1  Pos2
## 1 V112A V134I
## 2 V112A T135A
## 3 V112A L138I
## 4 V112A H247A
## 5 V112A I253M
## 6 V134I T135A

Now these co-evolving Pairs can be visualized by using simple network graph using “plotCorNet” function

  plotCorNet(corPairs)

Using these pairs, we could compute the property change at individual site of co-evolving pairs upon substitution. This will help us to find the magnitude and direction of change between co-evolving sites.

coPropChanges = getCorPropChange( corSubFile = corPairs, propertyDF = "Cruciani", propertyIndex = 1)
head(coPropChanges)
##    Pos1  Pos2 P1.wt P2.wt P1.mu P2.mu P1.wt.Prop P1.mu.Prop P2.wt.Prop
## 1 V112A V134I     V     V     A     I         -1      -0.96      -1.00
## 2 V112A T135A     V     T     A     A         -1      -0.96       0.40
## 3 V112A L138I     V     L     A     I         -1      -0.96      -0.90
## 4 V112A H247A     V     H     A     A         -1      -0.96       0.67
## 5 V112A I253M     V     I     A     M         -1      -0.96      -0.94
## 6 V134I T135A     V     T     I     A         -1      -0.94       0.40
##   P2.mu.Prop P1.Delta.Prop P2.Delta.Prop
## 1      -0.94          0.04          0.06
## 2      -0.96          0.04         -1.36
## 3      -0.94          0.04         -0.04
## 4      -0.96          0.04         -1.63
## 5      -0.82          0.04          0.12
## 6      -0.96          0.06         -1.36

Alternatively, we could find correlated substitutions pairs for a given property.

propCor <- getPropCorr(selMat = corSites, propertyDF = "curciani",propertyIndex = 1)
## Warning in sqrt(1 - h * h): NaNs produced
head(propCor)
##   Pos1 Pos2       cor p
## 1    1    2 0.9856103 0
## 2    1    3 1.0000000 0
## 3    2    3 0.9856103 0
## 4    1    4 0.9833572 0
## 5    2    4 0.9709619 0
## 6    3    4 0.9833572 0
plotCorSubChanges(propCor)