Several by-locus summary functions are available for gtypes objects. Given some sample microsatellite data, one can calculate the following summaries:
The number of alleles at each locus:
## locus num.alleles
## 1 D11t 12
## 2 EV37 22
## 3 EV94 15
## 4 Ttr11 9
The number of samples with missing data at each locus:
## locus num.missing
## 1 D11t 1
## 2 EV37 7
## 3 EV94 1
## 4 Ttr11 1
This can also be expressed as a proportion (note that the output column keeps the name num.missing):
## locus num.missing
## 1 D11t 0.004
## 2 EV37 0.028
## 3 EV94 0.004
## 4 Ttr11 0.004
The allelic richness, calculated here as the number of alleles divided by the number of samples genotyped at the locus:
## locus allelic.richness
## 1 D11t 0.096
## 2 EV37 0.185
## 3 EV94 0.120
## 4 Ttr11 0.072
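The first three summaries are simple tallies over the genotypes at a locus. The following is a minimal Python sketch, not strataG code, run on an invented genotype table in which `None` marks a missing genotype. The function name `summarize_locus` is hypothetical, and the richness definition (distinct alleles divided by genotyped samples) is an assumption that happens to be consistent with the D11t values in this section (12 alleles, 125 genotyped samples, 0.096).

```python
# Toy diploid genotypes: one (allele1, allele2) tuple per sample, None = missing.
# All values are invented for illustration.
genotypes = {
    "locA": [(133, 135), (135, 135), None, (133, 137)],
    "locB": [(214, 216), (214, 214), (216, 222), (212, 214)],
}

def summarize_locus(calls):
    """Return (num_alleles, num_missing, allelic_richness) for one locus."""
    genotyped = [g for g in calls if g is not None]
    alleles = {a for g in genotyped for a in g}
    num_missing = len(calls) - len(genotyped)
    # Assumed definition: distinct alleles divided by genotyped samples.
    richness = len(alleles) / len(genotyped) if genotyped else float("nan")
    return len(alleles), num_missing, richness

summary = {locus: summarize_locus(calls) for locus, calls in genotypes.items()}
for locus, (n_alleles, n_missing, richness) in summary.items():
    print(locus, n_alleles, n_missing, round(richness, 3))
```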
The observed and expected heterozygosity:
## locus obsvd.het
## 1 D11t 0.70
## 2 EV37 0.66
## 3 EV94 0.77
## 4 Ttr11 0.70
## locus exptd.het
## 1 D11t 0.75
## 2 EV37 0.83
## 3 EV94 0.83
## 4 Ttr11 0.80
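To make the two heterozygosity measures concrete, here is a short Python sketch, not strataG code. Expected heterozygosity is computed with Nei's unbiased gene diversity estimator, which is an assumption about the estimator; the allele counts are copied from the overall allele-frequency output later in this section. With these counts the estimator gives values that round to 0.75 and 0.80, in line with the exptd.het column, although that agreement alone does not confirm what the package does internally. The genotypes used for observed heterozygosity are invented.

```python
# Allele counts copied from the overall allele-frequency output in this section.
allele_counts = {
    "D11t": [1, 1, 4, 1, 3, 16, 75, 96, 20, 20, 7, 6],
    "Ttr11": [1, 10, 53, 17, 35, 80, 46, 7, 1],
}

def expected_het(counts):
    """Nei's unbiased gene diversity: n/(n-1) * (1 - sum(p_i^2)), n = allele copies."""
    n = sum(counts)
    sum_sq = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1 - sum_sq)

def observed_het(genotypes):
    """Fraction of genotyped samples whose two alleles differ."""
    called = [g for g in genotypes if g is not None]
    return sum(a != b for a, b in called) / len(called)

exp_het = {locus: expected_het(c) for locus, c in allele_counts.items()}
print({locus: round(h, 2) for locus, h in exp_het.items()})

# Observed heterozygosity needs genotypes; these four are invented.
toy = [(133, 135), (135, 135), None, (133, 137)]
print(observed_het(toy))
```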
The proportion of alleles that are unique (present in only one sample):
## locus prop.unique.alleles
## 1 D11t 0.032
## 2 EV37 0.025
## 3 EV94 0.016
## 4 Ttr11 0.024
The value of theta based on heterozygosity:
## locus theta
## 1 D11t 0.56
## 2 EV37 0.62
## 3 EV94 0.62
## 4 Ttr11 0.59
With the exception of theta, these measures are all calculated by the summarizeLoci function and returned in a single table with one row per locus. The function can also calculate the measures for each stratum separately, in which case each row of the output corresponds to one locus in one stratum:
## locus num.genotyped num.missing prop.genotyped num.alleles allelic.richness
## 1 D11t 125 1 0.99 12 0.096
## 2 EV37 119 7 0.94 22 0.185
## 3 EV94 125 1 0.99 15 0.120
## 4 Ttr11 125 1 0.99 9 0.072
## prop.unique.alleles exptd.het obsvd.het
## 1 0.032 0.75 0.70
## 2 0.025 0.83 0.66
## 3 0.016 0.83 0.77
## 4 0.024 0.80 0.70
## locus stratum num.genotyped.x num.missing prop.genotyped num.alleles
## 1 D11t Offshore 58 0 1.00 12
## 2 EV37 Offshore 56 2 0.97 22
## 3 EV94 Offshore 57 1 0.98 15
## 4 Ttr11 Offshore 57 1 0.98 9
## 5 D11t Coastal 67 1 0.99 3
## 6 EV37 Coastal 63 5 0.93 7
## 7 EV94 Coastal 68 0 1.00 5
## 8 Ttr11 Coastal 68 0 1.00 4
## allelic.richness num.unique num.genotyped.y exptd.het obsvd.het
## 1 0.207 3 58 0.86 0.91
## 2 0.393 3 56 0.94 0.76
## 3 0.263 2 57 0.86 0.81
## 4 0.158 3 57 0.82 0.78
## 5 0.045 1 67 0.49 0.51
## 6 0.111 3 63 0.61 0.57
## 7 0.074 0 68 0.77 0.74
## 8 0.059 0 68 0.66 0.63
Allele frequencies for each locus can also be obtained, both overall and by stratum:
## $D11t
##
## 117 119 121 127 129 131 133 135 137 139 141 143
## 1 1 4 1 3 16 75 96 20 20 7 6
##
## $EV37
##
## 190 200 202 204 206 208 210 212 214 216 218 220 222 224 226 228 230 232 234 236
## 3 4 5 2 7 8 3 13 86 39 8 6 20 11 3 8 5 2 1 2
## 240 254
## 1 1
##
## $EV94
##
## 229 239 243 245 247 249 251 253 255 259 261 263 265 269 271
## 1 2 15 18 3 83 41 7 6 27 27 7 8 3 2
##
## $Ttr11
##
## 193 197 207 209 211 213 215 217 219
## 1 10 53 17 35 80 46 7 1
## $D11t
##
## Coastal Offshore
## 117 0 1
## 119 0 1
## 121 0 4
## 127 0 1
## 129 0 3
## 131 0 16
## 133 48 27
## 135 83 13
## 137 3 17
## 139 0 20
## 141 0 7
## 143 0 6
##
## $EV37
##
## Coastal Offshore
## 190 0 3
## 200 0 4
## 202 0 5
## 204 0 2
## 206 0 7
## 208 0 8
## 210 0 3
## 212 1 12
## 214 71 15
## 216 33 6
## 218 2 6
## 220 0 6
## 222 11 9
## 224 7 4
## 226 1 2
## 228 0 8
## 230 0 5
## 232 0 2
## 234 0 1
## 236 0 2
## 240 0 1
## 254 0 1
##
## $EV94
##
## Coastal Offshore
## 229 0 1
## 239 0 2
## 243 0 15
## 245 12 6
## 247 0 3
## 249 47 36
## 251 30 11
## 253 0 7
## 255 0 6
## 259 25 2
## 261 22 5
## 263 0 7
## 265 0 8
## 269 0 3
## 271 0 2
##
## $Ttr11
##
## Coastal Offshore
## 193 0 1
## 197 0 10
## 207 42 11
## 209 0 17
## 211 0 35
## 213 59 21
## 215 33 13
## 217 2 5
## 219 0 1
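The tables above report allele counts. As a rough illustration of how they might be used downstream, the Python sketch below (not strataG code) converts the Ttr11 counts, copied from the output above, into within-stratum proportions and lists the alleles seen in only one stratum. The helper names `to_proportions` and `private_alleles` are hypothetical.

```python
# Ttr11 allele counts by stratum, copied from the output above.
alleles = [193, 197, 207, 209, 211, 213, 215, 217, 219]
counts = {
    "Coastal": [0, 0, 42, 0, 0, 59, 33, 2, 0],
    "Offshore": [1, 10, 11, 17, 35, 21, 13, 5, 1],
}

def to_proportions(stratum_counts):
    """Divide each allele count by the stratum's total allele count."""
    total = sum(stratum_counts)
    return [c / total for c in stratum_counts]

def private_alleles(stratum):
    """Alleles observed in `stratum` and in no other stratum."""
    others = [s for s in counts if s != stratum]
    return [
        allele
        for i, allele in enumerate(alleles)
        if counts[stratum][i] > 0 and all(counts[o][i] == 0 for o in others)
    ]

freqs = {s: dict(zip(alleles, to_proportions(c))) for s, c in counts.items()}
private = {s: private_alleles(s) for s in counts}

print(round(freqs["Coastal"][213], 3), round(freqs["Offshore"][213], 3))
print(private)
```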
The dupGenotypes function identifies samples that have the same or nearly the same genotypes. The number (or proportion) of loci that two samples must share in order to be considered duplicates is set by the num.shared argument. The returned data.frame lists the loci at which each pair of samples mismatches so that they can be reviewed.
## ids.1 ids.2 strata.1 strata.2 mismatch.loci num.loci.genotyped
## 1 78045 78058 Coastal Coastal <NA> 4
## 2 25509 41822 Coastal Coastal <NA> 4
## 3 42193 78035 Coastal Coastal <NA> 3
## 4 41579 45237 Coastal Coastal <NA> 3
## 5 40916 78038 Coastal Coastal <NA> 3
## 6 78063 78069 Coastal Coastal EV94 4
## 7 78053 78061 Coastal Coastal EV94 4
## 8 78051 78065 Coastal Coastal EV94 4
## 9 78049 78057 Coastal Coastal EV94 4
## 10 78048 78054 Coastal Coastal D11t 4
## 11 78044 78067 Coastal Coastal D11t 4
## 12 78044 78063 Coastal Coastal EV37 4
## 13 78043 78046 Coastal Coastal EV94 4
## 14 78041 78066 Coastal Coastal EV37 4
## 15 78038 78051 Coastal Coastal D11t 4
## 16 78038 78046 Coastal Coastal EV94 4
## 17 78038 78043 Coastal Coastal EV94 4
## 18 78036 78067 Coastal Coastal EV37 4
## 19 78035 78053 Coastal Coastal D11t 4
## 20 78034 78043 Coastal Coastal EV37 4
## 21 78034 78040 Coastal Coastal EV94 4
## 22 45236 78065 Coastal Coastal Ttr11 4
## 23 45231 78041 Coastal Coastal EV94 4
## 24 45230 78040 Coastal Coastal Ttr11 4
## 25 45230 78035 Coastal Coastal EV94 4
## 26 44721 78059 Coastal Coastal EV37 4
## 27 44720 78058 Coastal Coastal EV94 4
## 28 44720 78045 Coastal Coastal EV94 4
## 29 44719 78044 Coastal Coastal Ttr11 4
## 30 44719 45233 Coastal Coastal EV94 4
## 31 44718 78037 Coastal Coastal EV94 4
## 32 41822 78065 Coastal Coastal EV94 4
## 33 41822 78051 Coastal Coastal EV94 4
## 34 41821 78060 Coastal Coastal EV94 4
## 35 41820 78035 Coastal Coastal Ttr11 4
## 36 41820 45229 Coastal Coastal EV94 4
## 37 41819 78040 Coastal Coastal Ttr11 4
## 38 41819 45230 Coastal Coastal Ttr11 4
## 39 41578 45233 Coastal Coastal EV94 4
## 40 41578 44719 Coastal Coastal EV94 4
## 41 41540 78040 Coastal Coastal D11t 4
## 42 41538 45231 Coastal Coastal D11t 4
## 43 40915 78047 Coastal Coastal Ttr11 4
## 44 25509 78065 Coastal Coastal EV94 4
## 45 25509 78051 Coastal Coastal EV94 4
## 46 25503 78053 Coastal Coastal Ttr11 4
## 47 25503 41539 Coastal Coastal EV94 4
## 48 23945 78065 Coastal Coastal EV37 4
## 49 23945 78050 Coastal Coastal Ttr11 4
## 50 51981 78069 Coastal Coastal EV37 3
## 51 45237 78069 Coastal Coastal Ttr11 3
## 52 45237 78068 Coastal Coastal EV94 3
## 53 45237 78048 Coastal Coastal EV94 3
## 54 45237 78038 Coastal Coastal Ttr11 3
## 55 45237 78033 Coastal Coastal EV94 3
## 56 42193 78058 Coastal Coastal EV94 3
## 57 42193 78053 Coastal Coastal D11t 3
## 58 42193 78045 Coastal Coastal EV94 3
## 59 42193 51982 Coastal Coastal Ttr11 3
## 60 42193 45233 Coastal Coastal EV94 3
## 61 42193 45230 Coastal Coastal EV94 3
## 62 42193 44720 Coastal Coastal EV94 3
## 63 42193 44719 Coastal Coastal EV94 3
## 64 42192 78066 Coastal Coastal EV94 3
## 65 42192 78051 Coastal Coastal D11t 3
## 66 42192 78041 Coastal Coastal EV94 3
## 67 42192 78038 Coastal Coastal D11t 3
## 68 42192 45231 Coastal Coastal EV94 3
## 69 41820 42193 Coastal Coastal Ttr11 3
## 70 41819 45237 Coastal Coastal EV94 3
## 71 41579 78069 Coastal Coastal Ttr11 3
## 72 41579 78068 Coastal Coastal EV94 3
## 73 41579 78048 Coastal Coastal EV94 3
## 74 41579 78038 Coastal Coastal Ttr11 3
## 75 41579 78033 Coastal Coastal EV94 3
## 76 41579 41819 Coastal Coastal EV94 3
## 77 41578 45237 Coastal Coastal Ttr11 3
## 78 41578 42193 Coastal Coastal EV94 3
## 79 41578 41579 Coastal Coastal Ttr11 3
## 80 40916 78069 Coastal Coastal Ttr11 3
## 81 40916 78051 Coastal Coastal D11t 3
## 82 40916 78046 Coastal Coastal EV94 3
## 83 40916 78043 Coastal Coastal EV94 3
## 84 40916 78040 Coastal Coastal EV94 3
## 85 40916 78034 Coastal Coastal EV94 3
## 86 40916 45237 Coastal Coastal Ttr11 3
## 87 40916 42192 Coastal Coastal D11t 3
## 88 40916 41579 Coastal Coastal Ttr11 3
## 89 40916 41578 Coastal Coastal Ttr11 3
## num.loci.shared prop.loci.shared
## 1 4 1.00
## 2 4 1.00
## 3 3 1.00
## 4 3 1.00
## 5 3 1.00
## 6 3 0.75
## 7 3 0.75
## 8 3 0.75
## 9 3 0.75
## 10 3 0.75
## 11 3 0.75
## 12 3 0.75
## 13 3 0.75
## 14 3 0.75
## 15 3 0.75
## 16 3 0.75
## 17 3 0.75
## 18 3 0.75
## 19 3 0.75
## 20 3 0.75
## 21 3 0.75
## 22 3 0.75
## 23 3 0.75
## 24 3 0.75
## 25 3 0.75
## 26 3 0.75
## 27 3 0.75
## 28 3 0.75
## 29 3 0.75
## 30 3 0.75
## 31 3 0.75
## 32 3 0.75
## 33 3 0.75
## 34 3 0.75
## 35 3 0.75
## 36 3 0.75
## 37 3 0.75
## 38 3 0.75
## 39 3 0.75
## 40 3 0.75
## 41 3 0.75
## 42 3 0.75
## 43 3 0.75
## 44 3 0.75
## 45 3 0.75
## 46 3 0.75
## 47 3 0.75
## 48 3 0.75
## 49 3 0.75
## 50 2 0.67
## 51 2 0.67
## 52 2 0.67
## 53 2 0.67
## 54 2 0.67
## 55 2 0.67
## 56 2 0.67
## 57 2 0.67
## 58 2 0.67
## 59 2 0.67
## 60 2 0.67
## 61 2 0.67
## 62 2 0.67
## 63 2 0.67
## 64 2 0.67
## 65 2 0.67
## 66 2 0.67
## 67 2 0.67
## 68 2 0.67
## 69 2 0.67
## 70 2 0.67
## 71 2 0.67
## 72 2 0.67
## 73 2 0.67
## 74 2 0.67
## 75 2 0.67
## 76 2 0.67
## 77 2 0.67
## 78 2 0.67
## 79 2 0.67
## 80 2 0.67
## 81 2 0.67
## 82 2 0.67
## 83 2 0.67
## 84 2 0.67
## 85 2 0.67
## 86 2 0.67
## 87 2 0.67
## 88 2 0.67
## 89 2 0.67
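The idea behind a duplicate search of this kind can be sketched as a pairwise comparison restricted to loci genotyped in both samples. The Python code below is an illustration under stated assumptions, not the dupGenotypes implementation: the sample data are invented, and `find_duplicates` and its `min_prop_shared` threshold are hypothetical names.

```python
from itertools import combinations

# Invented genotypes: sample id -> {locus: (allele1, allele2) or None if missing}.
samples = {
    "s1": {"L1": (133, 135), "L2": (214, 216), "L3": (249, 251)},
    "s2": {"L1": (135, 133), "L2": (214, 216), "L3": (249, 251)},
    "s3": {"L1": (133, 135), "L2": None, "L3": (249, 259)},
    "s4": {"L1": (137, 139), "L2": (212, 220), "L3": (243, 245)},
}

def find_duplicates(samples, min_prop_shared=0.66):
    """Report sample pairs whose genotypes match at a high proportion of loci.

    `min_prop_shared` is a hypothetical threshold chosen for this sketch.
    """
    rows = []
    for (id1, g1), (id2, g2) in combinations(samples.items(), 2):
        # Compare only loci genotyped in both samples.
        both = [loc for loc in g1 if g1[loc] is not None and g2.get(loc) is not None]
        if not both:
            continue
        # Allele order within a genotype is ignored.
        mismatches = [loc for loc in both if sorted(g1[loc]) != sorted(g2[loc])]
        prop_shared = (len(both) - len(mismatches)) / len(both)
        if prop_shared >= min_prop_shared:
            rows.append((id1, id2, mismatches, len(both), prop_shared))
    # Closest matches first; ties keep their original pair order.
    return sorted(rows, key=lambda r: -r[4])

dups = find_duplicates(samples)
loose = find_duplicates(samples, min_prop_shared=0.5)
for row in loose:
    print(row)
```

Lowering the threshold admits near matches such as s1 and s3, which agree at one of the two loci genotyped in both, and the mismatching locus is reported so that the genotype call can be checked.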
The start and end positions and number of N’s and indels can be generated with the summarizeSeqs function:
## start end length num.ns num.indels
## 4495 1 402 402 0 2
## 4496 1 402 402 0 2
## 4498 1 402 402 0 1
## 5814 1 402 402 0 2
## 5815 1 402 402 0 2
## 5816 1 402 402 0 2
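A per-sequence summary like this one can be sketched in a few lines of Python. This is not the summarizeSeqs implementation. It assumes that start and end are the first and last positions holding an unambiguous base (a, c, g, t), that positions are 1-based, and that indels are counted as individual gap characters; all three are assumptions, and `summarize_seq` is a hypothetical name.

```python
def summarize_seq(seq):
    """Return start, end (1-based), length, number of n's and number of gap characters.

    Assumes start/end are the first and last a/c/g/t positions.
    """
    seq = seq.lower()
    called = [i for i, base in enumerate(seq) if base in "acgt"]
    if not called:
        return None  # nothing but ambiguity codes or gaps
    start, end = called[0] + 1, called[-1] + 1
    core = seq[start - 1:end]
    return {
        "start": start,
        "end": end,
        "length": end - start + 1,
        "num_ns": core.count("n"),
        "num_indels": core.count("-"),
    }

result = summarize_seq("nnacg-tnacnn")
print(result)
```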
Base frequencies can be generated with baseFreqs:
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
## a 0 126 126 126 126 126 5 0 0 0 0 126 0 0 0
## c 0 0 0 0 0 0 0 0 126 0 0 0 0 0 0
## g 126 0 0 0 0 0 0 126 0 0 0 0 0 0 126
## t 0 0 0 0 0 0 0 0 0 126 126 0 126 126 0
## u 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## r 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## y 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## m 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## k 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## w 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## s 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## b 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## d 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## h 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## v 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## n 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## x 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## - 0 0 0 0 0 0 121 0 0 0 0 0 0 0 0
## . 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##
## a c g t u r y m k w s b d
## 15179 11561 6501 17166 0 0 0 0 0 0 0 0 0
## h v n x - .
## 0 0 0 0 245 0
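The two tables above are a per-site tally and an overall tally of the characters in the alignment. A minimal Python sketch of the same bookkeeping, using three invented aligned sequences rather than the data shown, could look like this:

```python
from collections import Counter

# Invented aligned sequences of equal length.
aligned = ["gaaa-ca", "gaaaaca", "gaat-ca"]

# One Counter per alignment column, plus one for the whole alignment.
site_freqs = [Counter(column) for column in zip(*aligned)]
base_freqs = Counter("".join(aligned))

for site, counts in enumerate(site_freqs, start=1):
    print(site, dict(counts))
print(dict(base_freqs))
```

Unlike the output above, this sketch only reports characters that actually occur; a fixed alphabet of IUPAC codes could be zero-filled if a rectangular table is needed.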
Sequences can be scanned for low-frequency substitutions with lowFreqSubs:
## id site base freq motif
## 1 23792 274 t 1 cctattgatcc
## 2 23794 287 g 1 cctccgttata
## 3 26304 274 a 1 cctataaatcc
## 4 26304 394 t 1 taccttgtggg
## 5 74962 57 a 1 taaaaataatt
## 6 74962 104 g 1 catacgcatgt
## 7 74962 392 t 1 catgctccgtg
## 8 74962 393 c 1 atgctccgtgg
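One way to picture this scan: at each variable site, flag every sequence carrying a base that occurs in very few sequences, and report the surrounding motif so the call can be checked against the raw data. The Python sketch below follows that idea on invented sequences. It is not the lowFreqSubs implementation, and `low_freq_subs`, `max_count` and `flank` are hypothetical names.

```python
from collections import Counter

def low_freq_subs(seqs, max_count=1, flank=2):
    """List (id, site, base, freq, motif) for bases carried by at most `max_count` sequences."""
    rows = []
    ids = list(seqs)
    for pos, column in enumerate(zip(*seqs.values())):
        # Tally unambiguous bases only; gaps and ambiguity codes are ignored.
        counts = Counter(b for b in column if b in "acgt")
        if len(counts) < 2:
            continue  # invariant site
        for seq_id, base in zip(ids, column):
            if base in counts and counts[base] <= max_count:
                seq = seqs[seq_id]
                motif = seq[max(0, pos - flank):pos + flank + 1]
                rows.append((seq_id, pos + 1, base, counts[base], motif))
    return rows

# Invented aligned sequences; only sequence "c" differs, at site 4.
seqs = {"a": "acgtacgt", "b": "acgtacgt", "c": "acgaacgt", "d": "acgtacgt"}
subs = low_freq_subs(seqs)
print(subs)
```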
Unusual sequences can be identified by plotting likelihoods based on pairwise distances:
## id mean.dist neg.log.lik delta.log.lik
## 1 Hap.32 13.7 110 26.3
## 2 Hap.22 12.7 104 20.7
## 3 Hap.06 13.1 102 19.2
## 4 Hap.02 6.3 99 16.0
## 5 Hap.15 7.4 98 14.8
## 6 Hap.29 11.8 97 13.4
## 7 Hap.10 7.8 94 11.1
## 8 Hap.30 8.9 93 9.4
## 9 Hap.23 9.7 93 9.2
## 10 Hap.03 7.1 92 8.9
## 11 Hap.04 8.1 92 8.8
## 12 Hap.33 7.9 92 8.8
## 13 Hap.31 7.2 91 8.1
## 14 Hap.14 8.2 91 7.6
## 15 Hap.09 7.3 91 7.5
## 16 Hap.12 11.6 91 7.4
## 17 Hap.18 11.7 90 7.2
## 18 Hap.19 11.3 90 7.0
## 19 Hap.07 7.2 90 6.8
## 20 Hap.21 8.5 88 4.3
## 21 Hap.13 8.6 88 4.3
## 22 Hap.20 8.1 87 3.3
## 23 Hap.26 8.7 87 3.3
## 24 Hap.27 7.2 86 3.2
## 25 Hap.16 8.3 86 2.9
## 26 Hap.05 7.9 86 2.9
## 27 Hap.24 8.8 86 2.7
## 28 Hap.17 7.2 86 2.2
## 29 Hap.25 11.2 85 1.8
## 30 Hap.01 8.0 85 1.7
## 31 Hap.08 8.9 85 1.4
## 32 Hap.28 11.0 84 1.2
## 33 Hap.11 7.7 83 0.0
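The mean.dist column can be read as the average distance from each sequence to all the others. The Python sketch below computes that quantity with a simple Hamming distance on invented haplotypes. The distance measure is an assumption, the likelihood step is deliberately left out, and `hamming` and `mean_distances` are hypothetical names rather than strataG functions.

```python
def hamming(a, b):
    """Number of aligned positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def mean_distances(seqs):
    """Mean distance from each sequence to all others, largest first."""
    means = {
        i: sum(hamming(s, t) for j, t in seqs.items() if j != i) / (len(seqs) - 1)
        for i, s in seqs.items()
    }
    return sorted(means.items(), key=lambda kv: -kv[1])

# Invented haplotypes; h4 is built to differ from the rest.
haps = {"h1": "acgtacgt", "h2": "acgtacga", "h3": "acgtacgt", "h4": "ttgaacca"}
ranked = mean_distances(haps)
for hap_id, dist in ranked:
    print(hap_id, round(dist, 2))
```

A sequence with a much larger mean distance than the rest, like h4 here, is a candidate for a closer look at its alignment or identity.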
All of the above functions can be run at once with the qaqc function. Only those functions appropriate to the type of data in the gtypes object (haploid or diploid) will be run. Each output is written to a file, labelled either by the @description slot of the gtypes object or by the optional label argument of the function.