The tcR package WILL SOON BE ORPHANED AND REMOVED FROM CRAN.
A new package is available that is designed to replace tcR: immunarch – http://immunarch.com/
We will be happy to help you to move to the new package. Feel free to contact us: http://github.com/immunomind/immunarch
Sincerely, immunarch dev team and Vadim I. Nazarov, lead developer of tcR
The tcR package designed to help researchers in the immunology field to analyse T cell receptor (TCR
) and immunoglobulin (Ig
) repertoires. In this vignette, I will cover procedures for immune receptor repertoire analysis provided with the package.
Terms:
Clonotype: a group of T / B cell clones with equal CDR3 nucleotide sequences and equal Variable genes.
Cloneset / repertoire: a set of clonotypes. Represented as a data frame in which each row corresponds to a unique clonotype.
UMI: Unique Molecular Identifier (see this paper for details)
There are two datasets provided with the package - twins data and V(D)J recombination genes data.
twa.rda
, twb.rda
- two lists with 4 data frames in each list. Every data frame is a sample downsampled to the 10000 most abundant clonotypes of twins data (alpha and beta chains). Full data is available here:
Twins TCR data at Laboratory of Comparative and Functional Genomics
Explore the data:
# Load the package and load the data.
library(tcR)
data(twa) # "twa" - list of length 4
data(twb) # "twb" - list of length 4
# Explore the data.
head(twa[[1]])
head(twb[[1]])
Gene alphabets - character vectors with names of genes for TCR and Ig.
# Run help to see available alphabets.
?genealphabets
?genesegments
data(genesegments)
For the exploratory analysis of a single repertoire, use the RMarkdown report file at
"<path to the tcR package>/inst/library.report.Rmd"
Analysis in the file include statistics and visualisation of number of clones, clonotypes, in- and out-of-frame sequences, unique amino acid CDR3 sequences, V- and J-usage, most frequent k-mers, rarefaction analysis.
For the analysis of a group of repertoires (“cross-analysis”), use the RMarkdown report file at:
"<path to the tcR package>/inst/crossanalysis.report.Rmd}"
Analysis in this file include statistics and visualisation of number of shared clones and clonotypes, V-usage for individuals and groups, J-usage for individuals, Jensen-Shannon divergence among V-usages of repertoires and top-cross.
You need the knitr package installed in order to generate reports from default pipelines. In RStudio you can run a pipeline file as follows:
Run RStudio -> load the pipeline .Rmd files -> press the knitr button
Currently in tcR there are implemented parser for the next software:
MiTCR - parse.mitcr
;
MiTCR w/ UMIs - parse.mitcrbc
;
MiGEC - parse.migec
;
VDJtools - parse.vdjtools
;
ImmunoSEQ - parse.immunoseq
;
MiXCR - parse.mixcr
;
IMSEQ - parse.imseq
.
Also a general parser parse.cloneset
for a text table files is implemented. General wrapper for parsers is parse.file
. User can also parse a list of files or the entire folder. Run ?parse.folder
to see a help on parsing input files and a list of functions for parsing a specific input format.
# Parse file in "~/mitcr/immdata1.txt" as a MiTCR file.
immdata1 <- parse.file("~/mitcr_data/immdata1.txt", 'mitcr')
# equivalent to
immdata1.eq <- parse.mitcr("~/mitcr_data/immdata1.txt")
# Parse folder with MiGEC files.
immdata <- parse.folder("~/migec_data/", 'migec')
Clonesets represented in tcR as data frames with each row corresponding to the one nucleotide clonotype and with specific column names:
Any data frame with this columns and of this class is suitable for processing with the package, hence user can generate their own table files and load them for the further analysis using read.csv
, read.table
and other base
R functions. Please note that tcR internally expects all strings to be of class “character”, not “factor”. Therefore you should use R parsing functions with parameter stringsAsFactors=FALSE.
# No D genes is available here hence "" at "D.genes" and "-1" at positions.
str(twa[[1]])
## 'data.frame': 10000 obs. of 16 variables:
## $ Umi.count : logi NA NA NA NA NA NA ...
## $ Umi.proportion : logi NA NA NA NA NA NA ...
## $ Read.count : int 147158 58018 55223 41341 31525 30682 20913 16695 14757 13745 ...
## $ Read.proportion : num 0.0857 0.0338 0.0322 0.0241 0.0184 ...
## $ CDR3.nucleotide.sequence: chr "TGTGCTGTGATGGATAGCAACTATCAGTTAATCTGG" "TGTGCAGAGAAGTCTAGCAACACAGGCAAACTAATCTTT" "TGTGTGGTGAACTTTACTGGAGGCTTCAAAACTATCTTT" "TGTGCAGCAAGTATAGGGCTTGTATCTAACTTTGGAAATGAGAAATTAACCTTT" ...
## $ CDR3.amino.acid.sequence: chr "CAVMDSNYQLIW" "CAEKSSNTGKLIF" "CVVNFTGGFKTIF" "CAASIGLVSNFGNEKLTF" ...
## $ V.gene : chr "TRAV1-2" "TRAV5" "TRAV12-1" "TRAV13-1" ...
## $ J.gene : chr "TRAJ33" "TRAJ37" "TRAJ9" "TRAJ48" ...
## $ D.gene : chr "" "" "" "" ...
## $ V.end : int 9 9 11 12 9 8 4 12 9 12 ...
## $ J.start : int 10 12 14 22 19 9 15 13 15 14 ...
## $ D5.end : int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
## $ D3.end : num -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
## $ VD.insertions : int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
## $ DJ.insertions : int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
## $ Total.insertions : int 0 2 2 9 9 0 10 0 5 1 ...
str(twb[[1]])
## 'data.frame': 10000 obs. of 16 variables:
## $ Umi.count : logi NA NA NA NA NA NA ...
## $ Umi.proportion : logi NA NA NA NA NA NA ...
## $ Read.count : int 81516 46158 32476 30356 27321 23760 22232 20968 20603 17429 ...
## $ Read.proportion : num 0.0578 0.0327 0.023 0.0215 0.0194 ...
## $ CDR3.nucleotide.sequence: chr "TGTGCCAGCAGCCAAGCTCTAGCGGGAGCAGATACGCAGTATTTT" "TGTGCCAGCAGCTTAGGCCCCAGGAACACCGGGGAGCTGTTTTTT" "TGTGCCAGCAGTTATGGAGGGGCGGCAGATACGCAGTATTTT" "TGCAGTGCTGGAGGGATTGAAACCTCCTACAATGAGCAGTTCTTC" ...
## $ CDR3.amino.acid.sequence: chr "CASSQALAGADTQYF" "CASSLGPRNTGELFF" "CASSYGGAADTQYF" "CSAGGIETSYNEQFF" ...
## $ V.gene : chr "TRBV4-2" "TRBV13" "TRBV12-4, TRBV12-3" "TRBV20-1" ...
## $ J.gene : chr "TRBJ2-3" "TRBJ2-2" "TRBJ2-3" "TRBJ2-1" ...
## $ D.gene : chr "TRBD2" "TRBD1, TRBD2" "TRBD2" "TRBD1, TRBD2" ...
## $ V.end : int 15 16 12 12 13 9 11 14 12 14 ...
## $ J.start : int 18 17 15 13 20 15 14 16 15 17 ...
## $ D5.end : int 27 20 20 15 23 21 19 18 17 21 ...
## $ D3.end : int 28 23 25 23 24 22 21 23 20 28 ...
## $ VD.insertions : int 2 0 2 0 6 5 2 1 2 2 ...
## $ DJ.insertions : int 0 2 4 7 0 0 1 4 2 6 ...
## $ Total.insertions : int 2 2 6 7 6 5 3 5 4 8 ...
For the exploratory analysis tcR provides various functions for computing descriptive statistics.
To get a general view of a subject’s repertoire (overall count of sequences, in- and out-of-frames numbers and proportions) use the cloneset.stats
function. It returns a summary
of counts of nucleotide and amino acid clonotypes, as well as summary of read counts:
cloneset.stats(twb)
## #Nucleotide clones #Aminoacid clonotypes %Aminoacid clonotypes
## Subj.A 10000 9850 0.9850
## Subj.B 10000 9838 0.9838
## Subj.C 10000 9775 0.9775
## Subj.D 10000 9872 0.9872
## #In-frames %In-frames #Out-of-frames %Out-of-frames Sum.reads Min.reads
## Subj.A 9622 0.9622 346 0.0346 1410263 22
## Subj.B 9564 0.9564 400 0.0400 2251408 20
## Subj.C 9791 0.9791 192 0.0192 969949 23
## Subj.D 9225 0.9225 712 0.0712 1419130 32
## 1st Qu.reads Median.reads Mean.reads 3rd Qu.reads Max.reads
## Subj.A 26 33 141.0263 57 81516
## Subj.B 24 31 225.1408 55 171230
## Subj.C 28 39 96.9949 68 104628
## Subj.D 37 48 141.9130 83 33588
For characterisation of a library use the repseq.stats
function:
repseq.stats(twb)
## Clones Sum.reads Reads.per.clone
## Subj.A 10000 1410263 141.03
## Subj.B 10000 2251408 225.14
## Subj.C 10000 969949 96.99
## Subj.D 10000 1419130 141.91
Function clonal.proportion
is used to get the number of most abundant by the count of reads clonotypes. E.g., compute number of clonotypes which fill up (approx.) the 25% from total repertoire’s “Read.count”:
# How many clonotypes fill up approximately
clonal.proportion(twb, 25) # the 25% of the sum of values in 'Read.count'?
## Clones Percentage Clonal.count.prop
## Subj.A 12 25.1 0.0012
## Subj.B 6 26.5 0.0006
## Subj.C 7 25.2 0.0007
## Subj.D 38 25.2 0.0038
To get a proportion of the most abundant clonotypes’ sum of reads to the overall number of reads in a repertoire, use top.proportion
, i.e. get
(\(\sum\) reads of top clonotypes)\(/\)(\(\sum\) reads for all clonotypes).
E.g., get a proportion of the top-10 clonotypes’ reads to the overall number of reads:
# What accounts a proportion of the top-10 clonotypes' reads
top.proportion(twb, 10) # to the overall number of reads?
## Subj.A Subj.B Subj.C Subj.D
## 0.2289069 0.3648699 0.2620158 0.1305398
vis.top.proportions(twb) # Plot this proportions.
Function tailbound.proportion
with two arguments .col and .bound gets subset of the given data frame with clonotypes which have column .col with value \(\leq\) .bound and computes the ratio of sums of count reads of such subset to the overall data frame. E.g., get proportion of sum of reads of sequences which has “Read.count” <= 100 to the overall number of reads:
# What is a proportion of sequences which
# have 'Read.count' <= 100 to the
tailbound.proportion(twb, 100) # overall number of reads?
## Subj.A Subj.B Subj.C Subj.D
## 0.8651 0.8641 0.8555 0.8020
Clonal space homeostasis is a useful statistics of how many space occupied by clonotypes with specific proportions.
# data(twb)
# Compute summary space of clones, that occupy
# [0, .05) and [.05, 1] proportion.
clonal.space.homeostasis(twb, c(Low = .05, High = 1))
## Low (0 < X <= 0.05) High (0.05 < X <= 1)
## Subj.A 0.9421980 0.05780198
## Subj.B 0.9239454 0.07605463
## Subj.C 0.8279270 0.17207296
## Subj.D 1.0000000 0.00000000
# Use default arguments:
clonal.space.homeostasis(twb[[1]])
## Rare (0 < X <= 1e-05) Small (1e-05 < X <= 1e-04)
## Sample 0 0.2589567
## Medium (1e-04 < X <= 0.001) Large (0.001 < X <= 0.01)
## Sample 0.2130291 0.2666893
## Hyperexpanded (0.01 < X <= 1)
## Sample 0.261325
twb.space <- clonal.space.homeostasis(twb)
vis.clonal.space(twb.space)
Functions for performing subsetting and counting number of in-frame and out-of-frame clonotypes are: count.inframes
, count.outframes
, get.inframes
, get.outframes
. Parameter .head for this functions is a parameter to the .head function, that applied to the input data frame or an input list of data frames before subsetting. Functions accept both data frames and list of data frames as parameters. E.g., get data frame with only in-frame sequences and count out-of-frame sequences in the first 5000 rows for this data frame:
imm.in <- get.inframes(twb) # Return all in-frame sequences from the 'twb'.
# Count the number of out-of-frame sequences
count.outframes(twb, 5000) # from the first 5000 sequences.
## Subj.A Subj.B Subj.C Subj.D
## 172 212 73 326
General functions with parameter stands for ‘all’ (all sequences), ‘in’ (only in-frame sequences) or ‘out’ (only out-of-frame sequences) are get.frames and count.frames:
imm.in <- get.frames(twb, 'in') # Similar to 'get.inframes(twb)'.
count.frames(twb[[1]], 'all') # Just return number of rows.
## [1] 10000
flag <- 'out'
count.frames(twb, flag, 5000) # Similar to 'count.outframes(twb, 5000)'.
## Subj.A Subj.B Subj.C Subj.D
## 172 212 73 326
For exact or fuzzy search of sequences the package employed a function find.clonotypes
. Input arguments for this function are a data frame or a list of data frames, targets (a character vector or data frame having one column with sequences and additional columns with, e.g., V genes), a value of which column or columns to return, a method to be used to compare sequences among each other (either “exact” for exact matching, “hamm” for matching sequences by Hamming distance (two sequences are matched if H \(\leq\) 1) or “lev” for matching sequences by Levenshtein distance (two sequences are matched if L \(\leq\) 1)), and column name from which sequences for matching are obtained. Sounds very complex, but in practice it’s very easy, therefore let’s go to examples.
Suppose we want to search for some CDR3 sequences in a number of repertoires:
cmv <- data.frame(CDR3.amino.acid.sequence = c('CASSSANYGYTF', 'CSVGRAQNEQFF', 'CASSLTGNTEAFF', 'CASSALGGAGTGELFF', 'CASSLIGVSSYNEQFF'),
V.genes = c('TRBV4-1', 'TRBV4-1', 'TRBV4-1', 'TRBV4-1', 'TRBV4-1'), stringsAsFactors = F)
cmv
## CDR3.amino.acid.sequence V.genes
## 1 CASSSANYGYTF TRBV4-1
## 2 CSVGRAQNEQFF TRBV4-1
## 3 CASSLTGNTEAFF TRBV4-1
## 4 CASSALGGAGTGELFF TRBV4-1
## 5 CASSLIGVSSYNEQFF TRBV4-1
We will search for them using all methods of matching (exact, hamming or levenshtein) and with and without matching by V-segment. Also, for the first case (exact matching and without V gene) we return “Total.insertions” column along with the “Read.count” column, and for the second case output will be a “Rank” - rank (generated by set.rank
) of a clone or a clonotype in a data frame.
twb <- set.rank(twb)
# Case 1.
cmv.imm.ex <-
find.clonotypes(.data = twb[1:2], .targets = cmv[,1], .method = 'exact',
.col.name = c('Read.count', 'Total.insertions'),
.verbose = F)
head(cmv.imm.ex)
## CDR3.amino.acid.sequence Read.count.Subj.A Read.count.Subj.B
## CASSALGGAGTGELFF CASSALGGAGTGELFF 153 319
## CASSALGGAGTGELFF.1 CASSALGGAGTGELFF NA 35
## CASSLTGNTEAFF CASSLTGNTEAFF 35 263
## CASSLTGNTEAFF.1 CASSLTGNTEAFF 35 35
## CASSLTGNTEAFF.2 CASSLTGNTEAFF NA 28
## CASSSANYGYTF CASSSANYGYTF NA 15320
## Total.insertions.Subj.A Total.insertions.Subj.B
## CASSALGGAGTGELFF 9 10
## CASSALGGAGTGELFF.1 NA 9
## CASSLTGNTEAFF 2 2
## CASSLTGNTEAFF.1 1 0
## CASSLTGNTEAFF.2 NA 1
## CASSSANYGYTF NA 1
# Case 2.
# Search for CDR3 sequences with hamming distance <= 1
# to the one of the cmv$CDR3.amino.acid.sequence with
# matching V genes. Return ranks of found sequences.
cmv.imm.hamm.v <-
find.clonotypes(twb[1:3], cmv, 'hamm', 'Rank',
.target.col = c('CDR3.amino.acid.sequence',
'V.gene'),
.verbose = F)
head(cmv.imm.hamm.v)
## CDR3.amino.acid.sequence V.gene Rank.Subj.A Rank.Subj.B
## CASSALGGAGTGELFF CASSALGGAGTGELFF TRBV4-1 NA NA
## CASSLIGVSSYNEQFF CASSLIGVSSYNEQFF TRBV4-1 NA NA
## CASSLTGNTEAFF CASSLTGNTEAFF TRBV4-1 NA NA
## CASSSANYGYTF CASSSANYGYTF TRBV4-1 NA NA
## CSVGRAQNEQFF CSVGRAQNEQFF TRBV4-1 NA NA
## Rank.Subj.C
## CASSALGGAGTGELFF NA
## CASSLIGVSSYNEQFF NA
## CASSLTGNTEAFF NA
## CASSSANYGYTF NA
## CSVGRAQNEQFF NA
# Case 3.
# Similar to the previous example, except
# using levenshtein distance and the "Read.count" column.
cmv.imm.lev.v <-
find.clonotypes(twb[1:3], cmv, 'lev',
.target.col = c('CDR3.amino.acid.sequence', 'V.gene'),
.verbose = F)
head(cmv.imm.lev.v)
## CDR3.amino.acid.sequence V.gene Read.count.Subj.A
## CASSALGGAGTGELFF CASSALGGAGTGELFF TRBV4-1 NA
## CASSLIGVSSYNEQFF CASSLIGVSSYNEQFF TRBV4-1 NA
## CASSLTGNTEAFF CASSLTGNTEAFF TRBV4-1 NA
## CASSSANYGYTF CASSSANYGYTF TRBV4-1 NA
## CSVGRAQNEQFF CSVGRAQNEQFF TRBV4-1 NA
## Read.count.Subj.B Read.count.Subj.C
## CASSALGGAGTGELFF NA NA
## CASSLIGVSSYNEQFF NA NA
## CASSLTGNTEAFF NA NA
## CASSSANYGYTF NA NA
## CSVGRAQNEQFF NA NA
Variable and Joining gene usage (V-usage and J-usage) are important characteristics of repertoires. To access and compare them among repertoires tcR provides a few useful functions.
To access V- and J-usage of a repertoire tcR provides functions geneUsage
. Function geneUsage
, depending on parameters, computes frequencies or counts of the given elements (e.g., V genes) of the input data frame or the input list of data frames. V and J gene names for humans for TCR and Ig are stored in the .rda file genesegments.rda
(they are identical to those form IMGT: and ). All of the mentioned functions are accept data frames as well as list of data frames. Output for those functions are data frames with the first column stands for a gene and the other for frequencies.
imm1.vs <- geneUsage(twb[[1]], HUMAN_TRBV)
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
head(imm1.vs)
## Gene Sample
## 2 TRBV10-1 40
## 3 TRBV10-2 48
## 4 TRBV10-3 308
## 5 TRBV11-1 43
## 6 TRBV11-2 186
## 7 TRBV11-3 20
imm.vs.all <- geneUsage(twb, HUMAN_TRBV)
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
imm.vs.all[1:10, 1:4]
## Gene Subj.A Subj.B Subj.C
## 2 TRBV10-1 40 35 9
## 3 TRBV10-2 48 65 22
## 4 TRBV10-3 308 307 328
## 5 TRBV11-1 43 34 33
## 6 TRBV11-2 186 230 223
## 7 TRBV11-3 20 23 27
## 8 TRBV12-3 0 0 0
## 9 TRBV12-4 0 0 0
## 10 TRBV12-5 15 22 37
## 11 TRBV13 68 39 44
You can also directly visualise gene usage with the function vis.gene.usage
(if you pass the gene alphabet as a second argument):
# Put ".dodge = F" to get distinct plot for every data frame in the given list.
vis.gene.usage(twb, HUMAN_TRBJ, .main = 'twb J-usage dodge', .dodge = T)
vis.gene.usage(twb, HUMAN_TRBJ, .main = 'twb J-usage column', .dodge = F, .ncol = 2)
vis.gene.usage(imm1.vs, NA, .main = 'twb[[1]] V-usage', .coord.flip = F)
To evaluate V- and J genes usage of repertoires, the package implements subroutines for two approaches to the analysis: measures from the information theory and PCA (Principal Component Analysis).
To assess the diversity of genes usage user can use the entropy
function. Kullback-Leibler assymetric measure (function kl.div
) and Jensen-Shannon symmetric measure (functions js.div
for computing JS-divergence between the given distributions and js.div.seg
for computing JS-divergence between genes distributions of two clonesets or a list with data frames) are provided to estimate distance among gene usage of different repertoires. To visualise distances tcR employed the vis.radarlike
function, see Section “Plots” for more detailed information.
# Transform "0:100" to distribution with Laplace correction
entropy(0:100, .laplace = 1) # (i.e., add "1" to every value before transformation).
## [1] 6.399261
entropy.seg(twb, HUMAN_TRBV) # Compute entropy of V-segment usage for each data frame.
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
## Subj.A Subj.B Subj.C Subj.D
## 4.684783 4.751303 4.591658 4.539259
js.div.seg(twb[1:2], HUMAN_TRBV, .verbose = F)
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
## [1] 0.0008587249
imm.js <- js.div.seg(twb, HUMAN_TRBV, .verbose = F)
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
vis.radarlike(imm.js, .ncol = 2)
Principal component analysis (PCA) is a statistical procedure for transforming a set of observations to a set of special values for analysis. In tcR implemented functions pca.segments
for performing PCA on V- or J-usage, and pca.segments.2D
for performing PCA on VJ-usage. For plotting the PCA results see the vis.pca
function.
pca.segments(twb, .genes = HUMAN_TRBV) # Plot PCA results of V-segment usage.
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
## Warning: In prcomp.default(t(as.matrix(.data)), ...) :
## extra argument '.genes' will be disregarded
# Return object of class "prcomp"
class(pca.segments(twb, .do.plot = F, .genes = HUMAN_TRBV))
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
## Warning: In prcomp.default(t(as.matrix(.data)), ...) :
## extra argument '.genes' will be disregarded
## [1] "prcomp"
tcR provides a number of functions for evaluating similarity of clonesets based on shared among clonesets clonotypes and working with data frames with shared clonotypes.
The general interface to all functions for computing cloneset overlap coefficients is the repOverlap
function.
Number of shared clonotypes among the most abundant clonotypes may differ signigicantly from those with lesses count. To support research tcR offers the top.cross
function, that apply tcR::intersectClonesets
, e.g., to the first 1000 clonotypes, 2000, 3000 and so on up to the first 100000 clones, if supplied .n == seq(1000, 100000, 1000)
.
twb.top <- top.cross(.data = twb, .n = seq(500, 10000, 500), .verbose = F, .norm = T)
top.cross.plot(twb.top)
tcR also provides more complex similarity measures for evaluating the similarity of sets.
Tversky index (repOverlap(your_data, 'tversky')
for clonesets or tversky.index
for vectors) is an asymmetric similarity measure on sets that compares a variant to a prototype. If using default arguments, it’s similar to Dice’s coefficient.
Overlap coefficient (repOverlap(your_data, 'overlap')
for clonesets or overlap.coef
for vectors) is a similarity measure that measures the overlap between two sets, and is defined as the size of the intersection divided by the smaller of the size of the two sets.
Jaccard index (repOverlap(your_data, 'jaccard')
for clonesets or jaccard.index
for vectors) is a statistic used for comparing the similarity and diversity of sample sets.
Morisita’s overlap index (repOverlap(your_data, 'morisita')
for clonesets or morisitas.index
for other data) is a statistical measure of dispersion of individuals in a population and is used to compare overlap among samples. The formula is based on the assumption that increasing the size of the samples will increase the diversity because it will include different habitats (i.e. different faunas) (Morisita, 1959).
To visualise similarity among repertoires the vis.heatmap
function is appropriate.
# Apply the Morisitas overlap index to the each pair of repertoires.
# Use information about V genes (i.e. one CDR3 clonotype is equal to another
# if and only if their CDR3 aa sequences are equal and their V genes are equal)
repOverlap(twb, 'morisita', 'aa', 'read.count', .vgene = T, .verbose = F)
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
## Subj.A Subj.B Subj.C Subj.D
## Subj.A NA 2.353362e-03 2.125406e-05 0.0002231617
## Subj.B 2.353362e-03 NA 9.765356e-06 0.0002413103
## Subj.C 2.125406e-05 9.765356e-06 NA 0.0004875433
## Subj.D 2.231617e-04 2.413103e-04 4.875433e-04 NA
ozScore
permutDistTest
pca2euclid
For assessing the distribution of clonotypes in the given repertoire, tcR provides functions for evaluating the diversity (functions diversity
and inverse.simpson
) and the skewness of the clonal distribution (functions gini
and gini.simpson
), and a general interface to all of this functions repDiversity
, which user should use to estimate the diversity of clonesets. Function diversity
(repDiversity(your_clonesets, "div")
) computes the ecological diversity index (with parameter .q
for penalties for clones with large count). Function inverse.simpson
(repDiversity(your_clonesets, "inv.simp")
) computes the Inverse Simpson Index (i.e., inverse probability of choosing two similar clonotypes). Function gini
(repDiversity(your_clonesets, "gini")
) computes the economical Gini index of clonal distribution. Function gini.simpson
(repDiversity(your_clonesets, "gini.simp")
) computes the Gini-Simpson index. Function chao1
(repDiversity(your_clonesets, "chao1")
) computes the Chao1 index, its SD and two 95 perc CI. Function repDiversity
accepts single clonesets as well as a list of clonesets. Parameter .quant
specifies which column to use for computing the diversity (print ?repDiversity
to see more information about input arguments).
# Evaluate the diversity of clones by the ecological diversity index.
repDiversity(twb, 'div', 'read.count')
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
sapply(twb, function (x) diversity(x$Read.count))
## Subj.A Subj.B Subj.C Subj.D
## 34.55417 23.97224 15.87257 98.03479
## Subj.A Subj.B Subj.C Subj.D
## 34.55417 23.97224 15.87257 98.03479
# Compute the diversity as the inverse probability of choosing two similar clonotypes.
repDiversity(twb, 'inv.simp', 'read.prop')
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
sapply(twb, function (x) inverse.simpson(x$Read.proportion))
## Subj.A Subj.B Subj.C Subj.D
## 117.63383 56.09537 55.31047 354.18601
## Subj.A Subj.B Subj.C Subj.D
## 117.63383 56.09537 55.31047 354.18601
# Evaluate the skewness of clonal distribution.
repDiversity(twb, 'gini.simp', 'read.prop')
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
sapply(twb, function (x) gini.simpson(x$Read.proportion))
## Subj.A Subj.B Subj.C Subj.D
## 0.9914990 0.9821732 0.9819202 0.9971766
## Subj.A Subj.B Subj.C Subj.D
## 0.9914990 0.9821732 0.9819202 0.9971766
# Compute diversity of repertoire using Chao index.
repDiversity(twb, 'chao1', 'read.count')
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
sapply(twb, function (x) chao1(x$Read.count))
## Subj.A Subj.B Subj.C Subj.D
## Estimator 1.000000e+04 1.000000e+04 1.00000e+04 1.000000e+04
## SD 5.223297e-04 1.322604e-03 2.90204e-04 2.992252e-06
## Conf.95.lo 1.000000e+04 1.000000e+04 1.00000e+04 1.000000e+04
## Conf.95.hi 1.000000e+04 1.000000e+04 1.00000e+04 1.000000e+04
## Subj.A Subj.B Subj.C Subj.D
## Estimator 1.000000e+04 1.000000e+04 1.00000e+04 1.000000e+04
## SD 5.223297e-04 1.322604e-03 2.90204e-04 2.992252e-06
## Conf.95.lo 1.000000e+04 1.000000e+04 1.00000e+04 1.000000e+04
## Conf.95.hi 1.000000e+04 1.000000e+04 1.00000e+04 1.000000e+04
Plots of the distribution of CDR3 nucleotide sequences length (function vis.count.len
) and the histogram of counts (function vis.number.count
). Input data is either a data frame or a list with data frames. Argument .col specifies column’s name with clonotype counts. Argument .ncol specifies a number of columns in a plot with multiple distribution, i.e., if the input data is a list with data frames.
vis.count.len(twb[[1]], .name = "twb[[1]] CDR3 lengths",
.col = "Read.count")
# I comment this to avoid a strange bug in ggplot2. Will uncomment later.
# vis.number.count(twb[[1]], .name = "twb[[1]] count distribution")
For the visualisation of proportions of the most abundant clonotypes in a repertoire tcR offers the vis.top.proportions
function. As input the function receives either data frame or a list with data frames (argument .data), an integer vector with number of clonotypes for computing proportions of count for this clonotypes (argument .head), and a column’s name with clonotype counts (argument .col).
vis.top.proportions(twb, c(10, 500, 3000, 10000), .col = "Read.count")
For the visualisation of how much space occupied each group of clonotypes, divided into groups by their proportions in the data, use the vis.clonal.space
function. As an input it receives the output of the clonal.space.homeostasis
function.
twb.space <- clonal.space.homeostasis(twb)
vis.clonal.space(twb.space)
Pairwise distances or similarity of repertoires can be represented as qudratic matrices, in which each row and column represented a cloneset, and each value in every cell (i, j) is a distance between repertoires with indices i and j. One way to visalise such matrices is using “heatmaps”. For plotting heatmaps in tcR implemented the vis.heatmap
function. With changing input arguments user can change names of labs, title and legend.
twb.shared <- repOverlap(twb, "exact", .norm = F, .verbose = F)
vis.heatmap(twb.shared, .title = "Twins shared nuc clonotypes",
.labs = c("Sample in x", "Sample in y"), .legend = "# clonotypes")
Another way to repsent distances among objects is “radar-like” plots (because this plots is not exactly radar plots) realised in tcR throught the vis.radarlike
function. Argument .ncol specifies a number of columns of radar-like plots in a viewport.
twb.js <- js.div.seg(twb, HUMAN_TRBV, .verbose = F)
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
vis.radarlike(twb.js, .ncol = 2)
For the visualisation of gene usage tcR employes subroutines for making classical histograms using the vis.gene.usage
function. The function accept clonesets, lists of clonesets or output from the geneUsage
function. If input is a cloneset(s), then user should specify a gene alphabet (e.g., HUMAN_TRBV
) in order to compute the gene usage. Using a parameter , user can change type of the output between an output as histograms for each cloneset in the input list (.dodge = F
) or an output as an one histogram for all data, which is very useful for comparing distribution of genes (.dodge = T
). If .dodge=F
and input are lists of clonesets or a gene usage of a few clonesets, than user with argument .ncol
can specify how many columns of histograms will be outputted. With .coord.flip
user can flip coordinates so genes will be at the left side of the plot.
vis.gene.usage(twb[[1]], HUMAN_TRBV, .main = 'Sample I V-usage')
vis.gene.usage(twb[[2]], HUMAN_TRBV, .main = 'Sample II V-usage', .coord.flip = T)
twb.jusage <- geneUsage(twb, HUMAN_TRBJ)
vis.gene.usage(twb.jusage, .main = 'Twins J-usage', .dodge = T)
vis.gene.usage(twb, HUMAN_TRBJ, .main = 'Twins J-usage', .dodge = F, .ncol = 2)
For the visualisation of results from the prcomp
function (i.e., objects of class prcomp
), tcR provides the vis.pca
function. Input arguments for the function are an object of class prcomp
and a (if needed) list with groups (vectors of indices of samples) for colouring points in the plot.
twb.pca <- pca.segments(twb, .do.plot = F)
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
vis.pca(pca.segments(twb, .do.plot = F, .genes = HUMAN_TRBV), .groups = list(GroupA = c(1,2), GroupB = c(3,4)))
##
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is
## !! designed to replace tcR:
## !! immunarch -- https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely,
## !! immunarch dev team and
## !! Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
## Warning: In prcomp.default(t(as.matrix(.data)), ...) :
## extra argument '.genes' will be disregarded
Logo-like graphs for visualisation of nucleotide or amino acid motif sequences / profiles.
km <- get.kmers(twb[[1]]$CDR3.amino.acid.sequence, .head = 100, .k = 7, .verbose = F)
d <- kmer.profile(km)
vis.logo(d)
Mutation network (or a mutation graph) is a graph with vertices representing nucleotide or in-frame amino acid sequences (out-of-frame amino acid sequences will be automatically filtered out by tcR functions for mutation network creating) and edges which connecting pairs of sequences with hamming distance (parameter .method = ‘hamm’) or edit distance (parameter .method = ‘lev’) between them no more than specified in the .max.errors function parameter of the mutation.network
function. To create a mutation network first what you need is to make a shared repertoires and then apply the mutation.network
function to this shared repertoire:
# data(twb)
twb.shared <- shared.repertoire(twb, .head = 1000, .verbose = F)
G <- mutation.network(twb.shared)
G
## IGRAPH 4cb561c U--- 3704 337 --
## + attr: label (v/c), vseg (v/c), repind (v/n), prob (v/n), people
## | (v/c), npeople (v/n)
## + edges from 4cb561c:
## [1] 1-- 25 1-- 572 1-- 577 1--2671 2-- 765 6-- 613 7-- 617
## [8] 8-- 616 12-- 787 12-- 789 12-- 798 12--1010 14--1056 14--2432
## [15] 16--1110 18-- 19 18-- 21 18--1053 18--1264 18--1273 18--1313
## [22] 19--1273 19--1305 19--1327 20-- 21 20--1056 20--1327 21--1264
## [29] 21--1313 21--1327 21--2439 23--2106 24--2207 24--2354 29--3175
## [36] 32--2997 73-- 105 127-- 417 219-- 961 235-- 236 239-- 411 289--2646
## [43] 291-- 292 297-- 299 418--2231 423-- 440 439-- 440 460-- 461 463-- 954
## + ... omitted several edges
To manipulate vertex attributes functions and are provided.
# data(twb)
# twb.shared <- shared.repertoire(twb, .head = 1000)
# G <- mutation.network(twb.shared)
G <- set.group.vector(G, "twins", list(A = c(1,2), B = c(3,4))) # <= refactor this
get.group.names(G, "twins", 1)
## [1] "A|B"
get.group.names(G, "twins", 300)
## [1] "B"
get.group.names(G, "twins", c(1,2,3), F)
## [[1]]
## [1] "A" "B"
##
## [[2]]
## [1] "A" "B"
##
## [[3]]
## [1] "B"
get.group.names(G, "twins", 300, F)
## [[1]]
## [1] "B"
# Because we have only two groups, we can assign more readable attribute.
V(G)$twin.names <- get.group.names(G, "twins")
V(G)$twin.names[1]
## [1] "A|B"
V(G)$twin.names[300]
## [1] "B"
To access neighbour vertices of vertices (“ego-network”) use the function:
# data(twb)
# twb.shared <- shared.repertoire(twb, .head = 1000)
# G <- mutation.network(twb.shared)
head(mutated.neighbours(G, 1)[[1]])
## label vseg repind prob people npeople twins twin.names
## 1 CASSDRDTGELFF TRBV6-4 1 -1 0111 3 11 A|B
## 2 CASSYRDTGELFF TRBV6-3, TRBV6-2 25 -1 1001 2 11 A|B
## 3 CASSDRETGELFF TRBV6-4 572 -1 0100 1 10 A
## 4 CASSDRGTGELFF TRBV6-4 577 -1 0100 1 10 A
## 5 CASTDRDTGELFF TRBV10-2 2671 -1 1000 1 10 A
Feel free to contact me for the package-related or immunoinformatics research-related questions.
In the package implemented functions for working with k-mers. Function get.kmers
generates k-mers from the given chatacter vector or a data frame with columns for sequences and a count for each sequence.
head(get.kmers(twb[[1]]$CDR3.amino.acid.sequence, 100, .meat = F, .verbose = F))
## Kmers Count
## 1 CASSL 20
## 2 CASSP 12
## 3 ASSLG 11
## 4 CASSY 11
## 5 NEQFF 11
## 6 YEQYF 11
head(get.kmers(twb[[1]], .meat = T, .verbose = F))
## Kmers Count
## 1 CASSL 283192
## 2 DTQYF 217783
## 3 NEQFF 179230
## 4 CASSQ 158877
## 5 ASSLG 154560
## 6 YEQYF 148602
The package also provides a several number of functions for performing classic bioinformatics tasks on strings. For more powerful subroutines see the Bioconductor’s Biostrings package.
Functions for basic nucleotide sequences manipulations: reverse-complement, translation and GC-content computation. All functions are vectorised.
revcomp(c('AAATTT', 'ACGTTTGGA'))
## [1] "AAATTT" "TCCAAACGT"
cbind(bunch.translate(twb[[1]]$CDR3.nucleotide.sequence[1:10]),
twb[[1]]$CDR3.amino.acid.sequence[1:10])
## [,1] [,2]
## [1,] "CASSQALAGADTQYF" "CASSQALAGADTQYF"
## [2,] "CASSLGPRNTGELFF" "CASSLGPRNTGELFF"
## [3,] "CASSYGGAADTQYF" "CASSYGGAADTQYF"
## [4,] "CSAGGIETSYNEQFF" "CSAGGIETSYNEQFF"
## [5,] "CASSPILGEQFF" "CASSPILGEQFF"
## [6,] "CASKKDRDYGYTF" "CASKKDRDYGYTF"
## [7,] "CASSQQGSGNTIYF" "CASSQQGSGNTIYF"
## [8,] "CASSLGLHYEQYF" "CASSLGLHYEQYF"
## [9,] "CASSRASSYNSPLHF" "CASSRASSYNSPLHF"
## [10,] "CASSYLGPDDTEAFF" "CASSYLGPDDTEAFF"
gc.content(twb[[1]]$CDR3.nucleotide.sequence[1:10])
## [1] 0.5333333 0.5777778 0.5238095 0.4888889 0.5555556 0.4871795 0.4523810
## [8] 0.4871795 0.5555556 0.5333333
Function codon.variants
returns a list of vectors of nucleotide codons for each letter for each input amino acid sequence. Function translated.nucl.sequences
returns the number of nucleotide sequences, which, when translated, will result in the given amino acid sequence(s). Function reverse.translation
return all nucleotide sequences, which is translated to the given amino acid sequences. Optional argument .nucseq
for each of this function provides restriction for nucleotides, which cannot be changed. All functions are vectorised.
codon.variants('LQ')
## [[1]]
## [[1]][[1]]
## [1] "CTA" "CTC" "CTG" "CTT" "TTA" "TTG"
##
## [[1]][[2]]
## [1] "CAA" "CAG"
translated.nucl.sequences(c('LQ', 'CASSLQ'))
## [1] 12 3456
reverse.translation('LQ')
## [1] "CTACAA" "CTCCAA" "CTGCAA" "CTTCAA" "TTACAA" "TTGCAA" "CTACAG" "CTCCAG"
## [9] "CTGCAG" "CTTCAG" "TTACAG" "TTGCAG"
translated.nucl.sequences('LQ', 'XXXXXG')
## [1] 6
codon.variants('LQ', 'XXXXXG')
## [[1]]
## [[1]][[1]]
## [1] "CTA" "CTC" "CTG" "CTT" "TTA" "TTG"
##
## [[1]][[2]]
## [1] "CAG"
reverse.translation('LQ', 'XXXXXG')
## [1] "CTACAG" "CTCCAG" "CTGCAG" "CTTCAG" "TTACAG" "TTGCAG"