IMPORTANT INFORMATION

The tcR package WILL SOON BE ORPHANED AND REMOVED FROM CRAN.

A new package is available that is designed to replace tcR: immunarch – http://immunarch.com/

We will be happy to help you to move to the new package. Feel free to contact us: http://github.com/immunomind/immunarch

Sincerely, immunarch dev team and Vadim I. Nazarov, lead developer of tcR

Introduction

The tcR package designed to help researchers in the immunology field to analyse T cell receptor (TCR) and immunoglobulin (Ig) repertoires. In this vignette, I will cover procedures for immune receptor repertoire analysis provided with the package.

Terms:

Clonotype: a group of T / B cell clones with equal CDR3 nucleotide sequences and equal Variable genes.
Cloneset / repertoire: a set of clonotypes. Represented as a data frame in which each row corresponds to a unique clonotype.
UMI: Unique Molecular Identifier (see this paper for details)

Package features

Parsers for outputs of various tools for CDR3 extraction and genes alignment (currently implemented parsers for MiTCR, MiGEC, VDJtools, ImmunoSEQ, IMSEQ and MiXCR)
Data manipulation (in-frame / out-of-frame sequences subsetting, clonotype motif search)
Descriptive statistics (number of reads, number of clonotypes, gene segment usage)
Shared clonotypes statistics (number of shared clonotypes, using V genes or not; sequential intersection among the most abundant clonotype (“top-cross”))
Repertoire comparison (Jaccard index, Morisita’s overlap index, Horn’s index, Tversky index, overlap coefficient)
V- and J genes usage and it’s analysis (PCA, Shannon Entropy, Jensen-Shannon Divergence)
Diversity evaluation (ecological diversity index, Gini index, inverse Simpson index, rarefaction analysis)
Artificial repertoire generation (beta chain only, for now)
Spectratyping
Various visualisation procedures
Mutation networks (graphs, in which vertices represent CDR3 nucleotide / amino acid sequences and edges are connecting similar sequences with low hamming or edit distance between them)

Data in the package

There are two datasets provided with the package - twins data and V(D)J recombination genes data.

Downsampled twins data

twa.rda, twb.rda - two lists with 4 data frames in each list. Every data frame is a sample downsampled to the 10000 most abundant clonotypes of twins data (alpha and beta chains). Full data is available here:

Twins TCR data at Laboratory of Comparative and Functional Genomics

Explore the data:

# Load the package and load the data.
library(tcR)
data(twa)  # "twa" - list of length 4
data(twb)  # "twb" - list of length 4

# Explore the data.
head(twa[[1]])
head(twb[[1]])

Gene alphabets

Gene alphabets - character vectors with names of genes for TCR and Ig.

# Run help to see available alphabets.
?genealphabets
?genesegments
data(genesegments)

Quick start / automatic report generation

For the exploratory analysis of a single repertoire, use the RMarkdown report file at

"<path to the tcR package>/inst/library.report.Rmd"

Analysis in the file include statistics and visualisation of number of clones, clonotypes, in- and out-of-frame sequences, unique amino acid CDR3 sequences, V- and J-usage, most frequent k-mers, rarefaction analysis.

For the analysis of a group of repertoires (“cross-analysis”), use the RMarkdown report file at:

"<path to the tcR package>/inst/crossanalysis.report.Rmd}"

Analysis in this file include statistics and visualisation of number of shared clones and clonotypes, V-usage for individuals and groups, J-usage for individuals, Jensen-Shannon divergence among V-usages of repertoires and top-cross.

You need the knitr package installed in order to generate reports from default pipelines. In RStudio you can run a pipeline file as follows:

Run RStudio -> load the pipeline .Rmd files -> press the knitr button

Input parsing

Currently in tcR there are implemented parser for the next software:

MiTCR - parse.mitcr;
MiTCR w/ UMIs - parse.mitcrbc;
MiGEC - parse.migec;
VDJtools - parse.vdjtools;
ImmunoSEQ - parse.immunoseq;
MiXCR - parse.mixcr;
IMSEQ - parse.imseq.

Also a general parser parse.cloneset for a text table files is implemented. General wrapper for parsers is parse.file. User can also parse a list of files or the entire folder. Run ?parse.folder to see a help on parsing input files and a list of functions for parsing a specific input format.

# Parse file in "~/mitcr/immdata1.txt" as a MiTCR file.
immdata1 <- parse.file("~/mitcr_data/immdata1.txt", 'mitcr')
# equivalent to
immdata1.eq <- parse.mitcr("~/mitcr_data/immdata1.txt")

# Parse folder with MiGEC files.
immdata <- parse.folder("~/migec_data/", 'migec')

Cloneset representation

Clonesets represented in tcR as data frames with each row corresponding to the one nucleotide clonotype and with specific column names:

Umi.count - number of UMIs;
Umi.proportion - proportion of UMIs;
Read.count - number of reads;
Read.proportion - proportion of reads;
CDR3.nucleotide.sequence - CDR3 nucleotide sequence;
CDR3.amino.acid.sequence - CDR3 amino acid sequence;
V.gene - names of aligned Variable genes;
J.gene - names of aligned Joining genes;
D.gene - names of aligned Diversity genes;
V.end - last positions of aligned V genes (1-based);
J.start - first positions of aligned J genes (1-based);
D5.end - positions of D’5 end of aligned D genes (1-based);
D3.end - positions of D’3 end of aligned D genes (1-based);
VD.insertions - number of inserted nucleotides (N-nucleotides) at V-D junction (-1 for receptors with VJ recombination);
DJ.insertions - number of inserted nucleotides (N-nucleotides) at D-J junction (-1 for receptors with VJ recombination);
Total.insertions - total number of inserted nucleotides (number of N-nucleotides at V-J junction for receptors with VJ recombination).

Any data frame with this columns and of this class is suitable for processing with the package, hence user can generate their own table files and load them for the further analysis using read.csv, read.table and other base R functions. Please note that tcR internally expects all strings to be of class “character”, not “factor”. Therefore you should use R parsing functions with parameter stringsAsFactors=FALSE.

# No D genes is available here hence "" at "D.genes" and "-1" at positions.
str(twa[[1]])

## 'data.frame':    10000 obs. of  16 variables:
##  $ Umi.count               : logi  NA NA NA NA NA NA ...
##  $ Umi.proportion          : logi  NA NA NA NA NA NA ...
##  $ Read.count              : int  147158 58018 55223 41341 31525 30682 20913 16695 14757 13745 ...
##  $ Read.proportion         : num  0.0857 0.0338 0.0322 0.0241 0.0184 ...
##  $ CDR3.nucleotide.sequence: chr  "TGTGCTGTGATGGATAGCAACTATCAGTTAATCTGG" "TGTGCAGAGAAGTCTAGCAACACAGGCAAACTAATCTTT" "TGTGTGGTGAACTTTACTGGAGGCTTCAAAACTATCTTT" "TGTGCAGCAAGTATAGGGCTTGTATCTAACTTTGGAAATGAGAAATTAACCTTT" ...
##  $ CDR3.amino.acid.sequence: chr  "CAVMDSNYQLIW" "CAEKSSNTGKLIF" "CVVNFTGGFKTIF" "CAASIGLVSNFGNEKLTF" ...
##  $ V.gene                  : chr  "TRAV1-2" "TRAV5" "TRAV12-1" "TRAV13-1" ...
##  $ J.gene                  : chr  "TRAJ33" "TRAJ37" "TRAJ9" "TRAJ48" ...
##  $ D.gene                  : chr  "" "" "" "" ...
##  $ V.end                   : int  9 9 11 12 9 8 4 12 9 12 ...
##  $ J.start                 : int  10 12 14 22 19 9 15 13 15 14 ...
##  $ D5.end                  : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
##  $ D3.end                  : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
##  $ VD.insertions           : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
##  $ DJ.insertions           : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
##  $ Total.insertions        : int  0 2 2 9 9 0 10 0 5 1 ...

str(twb[[1]])

## 'data.frame':    10000 obs. of  16 variables:
##  $ Umi.count               : logi  NA NA NA NA NA NA ...
##  $ Umi.proportion          : logi  NA NA NA NA NA NA ...
##  $ Read.count              : int  81516 46158 32476 30356 27321 23760 22232 20968 20603 17429 ...
##  $ Read.proportion         : num  0.0578 0.0327 0.023 0.0215 0.0194 ...
##  $ CDR3.nucleotide.sequence: chr  "TGTGCCAGCAGCCAAGCTCTAGCGGGAGCAGATACGCAGTATTTT" "TGTGCCAGCAGCTTAGGCCCCAGGAACACCGGGGAGCTGTTTTTT" "TGTGCCAGCAGTTATGGAGGGGCGGCAGATACGCAGTATTTT" "TGCAGTGCTGGAGGGATTGAAACCTCCTACAATGAGCAGTTCTTC" ...
##  $ CDR3.amino.acid.sequence: chr  "CASSQALAGADTQYF" "CASSLGPRNTGELFF" "CASSYGGAADTQYF" "CSAGGIETSYNEQFF" ...
##  $ V.gene                  : chr  "TRBV4-2" "TRBV13" "TRBV12-4, TRBV12-3" "TRBV20-1" ...
##  $ J.gene                  : chr  "TRBJ2-3" "TRBJ2-2" "TRBJ2-3" "TRBJ2-1" ...
##  $ D.gene                  : chr  "TRBD2" "TRBD1, TRBD2" "TRBD2" "TRBD1, TRBD2" ...
##  $ V.end                   : int  15 16 12 12 13 9 11 14 12 14 ...
##  $ J.start                 : int  18 17 15 13 20 15 14 16 15 17 ...
##  $ D5.end                  : int  27 20 20 15 23 21 19 18 17 21 ...
##  $ D3.end                  : int  28 23 25 23 24 22 21 23 20 28 ...
##  $ VD.insertions           : int  2 0 2 0 6 5 2 1 2 2 ...
##  $ DJ.insertions           : int  0 2 4 7 0 0 1 4 2 6 ...
##  $ Total.insertions        : int  2 2 6 7 6 5 3 5 4 8 ...

Repertoire descriptive statistics

For the exploratory analysis tcR provides various functions for computing descriptive statistics.

Cloneset summary

To get a general view of a subject’s repertoire (overall count of sequences, in- and out-of-frames numbers and proportions) use the cloneset.stats function. It returns a summary of counts of nucleotide and amino acid clonotypes, as well as summary of read counts:

cloneset.stats(twb)

##        #Nucleotide clones #Aminoacid clonotypes %Aminoacid clonotypes
## Subj.A              10000                  9850                0.9850
## Subj.B              10000                  9838                0.9838
## Subj.C              10000                  9775                0.9775
## Subj.D              10000                  9872                0.9872
##        #In-frames %In-frames #Out-of-frames %Out-of-frames Sum.reads Min.reads
## Subj.A       9622     0.9622            346         0.0346   1410263        22
## Subj.B       9564     0.9564            400         0.0400   2251408        20
## Subj.C       9791     0.9791            192         0.0192    969949        23
## Subj.D       9225     0.9225            712         0.0712   1419130        32
##        1st Qu.reads Median.reads Mean.reads 3rd Qu.reads Max.reads
## Subj.A           26           33   141.0263           57     81516
## Subj.B           24           31   225.1408           55    171230
## Subj.C           28           39    96.9949           68    104628
## Subj.D           37           48   141.9130           83     33588

For characterisation of a library use the repseq.stats function:

repseq.stats(twb)

##        Clones Sum.reads Reads.per.clone
## Subj.A  10000   1410263          141.03
## Subj.B  10000   2251408          225.14
## Subj.C  10000    969949           96.99
## Subj.D  10000   1419130          141.91

Most abundant clonotypes statistics

Function clonal.proportion is used to get the number of most abundant by the count of reads clonotypes. E.g., compute number of clonotypes which fill up (approx.) the 25% from total repertoire’s “Read.count”:

                            # How many clonotypes fill up approximately
clonal.proportion(twb, 25)  # the 25% of the sum of values in 'Read.count'?

##        Clones Percentage Clonal.count.prop
## Subj.A     12       25.1            0.0012
## Subj.B      6       26.5            0.0006
## Subj.C      7       25.2            0.0007
## Subj.D     38       25.2            0.0038

To get a proportion of the most abundant clonotypes’ sum of reads to the overall number of reads in a repertoire, use top.proportion, i.e. get

(\(\sum\) reads of top clonotypes)\(/\)(\(\sum\) reads for all clonotypes).

E.g., get a proportion of the top-10 clonotypes’ reads to the overall number of reads:

                          # What accounts a proportion of the top-10 clonotypes' reads
top.proportion(twb, 10)   # to the overall number of reads?

##    Subj.A    Subj.B    Subj.C    Subj.D 
## 0.2289069 0.3648699 0.2620158 0.1305398

vis.top.proportions(twb)  # Plot this proportions.

Function tailbound.proportion with two arguments .col and .bound gets subset of the given data frame with clonotypes which have column .col with value \(\leq\) .bound and computes the ratio of sums of count reads of such subset to the overall data frame. E.g., get proportion of sum of reads of sequences which has “Read.count” <= 100 to the overall number of reads:

                                # What is a proportion of sequences which
                                # have 'Read.count' <= 100 to the
tailbound.proportion(twb, 100)  # overall number of reads?

## Subj.A Subj.B Subj.C Subj.D 
## 0.8651 0.8641 0.8555 0.8020

Clonal space homeostasis

Clonal space homeostasis is a useful statistics of how many space occupied by clonotypes with specific proportions.

# data(twb)
# Compute summary space of clones, that occupy
# [0, .05) and [.05, 1] proportion.
clonal.space.homeostasis(twb, c(Low = .05, High = 1))

##        Low (0 < X <= 0.05) High (0.05 < X <= 1)
## Subj.A           0.9421980           0.05780198
## Subj.B           0.9239454           0.07605463
## Subj.C           0.8279270           0.17207296
## Subj.D           1.0000000           0.00000000

# Use default arguments:
clonal.space.homeostasis(twb[[1]])

##        Rare (0 < X <= 1e-05) Small (1e-05 < X <= 1e-04)
## Sample                     0                  0.2589567
##        Medium (1e-04 < X <= 0.001) Large (0.001 < X <= 0.01)
## Sample                   0.2130291                 0.2666893
##        Hyperexpanded (0.01 < X <= 1)
## Sample                      0.261325

twb.space <- clonal.space.homeostasis(twb)
vis.clonal.space(twb.space)

In-frame and out-of-frame sequences

Functions for performing subsetting and counting number of in-frame and out-of-frame clonotypes are: count.inframes, count.outframes, get.inframes, get.outframes. Parameter .head for this functions is a parameter to the .head function, that applied to the input data frame or an input list of data frames before subsetting. Functions accept both data frames and list of data frames as parameters. E.g., get data frame with only in-frame sequences and count out-of-frame sequences in the first 5000 rows for this data frame:

imm.in <- get.inframes(twb) # Return all in-frame sequences from the 'twb'.

                            # Count the number of out-of-frame sequences
count.outframes(twb, 5000)  # from the first 5000 sequences.

## Subj.A Subj.B Subj.C Subj.D 
##    172    212     73    326

General functions with parameter stands for ‘all’ (all sequences), ‘in’ (only in-frame sequences) or ‘out’ (only out-of-frame sequences) are get.frames and count.frames:

imm.in <- get.frames(twb, 'in') # Similar to 'get.inframes(twb)'.

count.frames(twb[[1]], 'all')   # Just return number of rows.

## [1] 10000

flag <- 'out'
count.frames(twb, flag, 5000)   # Similar to 'count.outframes(twb, 5000)'.

## Subj.A Subj.B Subj.C Subj.D 
##    172    212     73    326

Search for a target CDR3 sequences

For exact or fuzzy search of sequences the package employed a function find.clonotypes. Input arguments for this function are a data frame or a list of data frames, targets (a character vector or data frame having one column with sequences and additional columns with, e.g., V genes), a value of which column or columns to return, a method to be used to compare sequences among each other (either “exact” for exact matching, “hamm” for matching sequences by Hamming distance (two sequences are matched if H \(\leq\) 1) or “lev” for matching sequences by Levenshtein distance (two sequences are matched if L \(\leq\) 1)), and column name from which sequences for matching are obtained. Sounds very complex, but in practice it’s very easy, therefore let’s go to examples.

Suppose we want to search for some CDR3 sequences in a number of repertoires:

cmv <- data.frame(CDR3.amino.acid.sequence = c('CASSSANYGYTF', 'CSVGRAQNEQFF', 'CASSLTGNTEAFF', 'CASSALGGAGTGELFF', 'CASSLIGVSSYNEQFF'),
                  V.genes = c('TRBV4-1', 'TRBV4-1', 'TRBV4-1', 'TRBV4-1', 'TRBV4-1'), stringsAsFactors = F)

cmv

##   CDR3.amino.acid.sequence V.genes
## 1             CASSSANYGYTF TRBV4-1
## 2             CSVGRAQNEQFF TRBV4-1
## 3            CASSLTGNTEAFF TRBV4-1
## 4         CASSALGGAGTGELFF TRBV4-1
## 5         CASSLIGVSSYNEQFF TRBV4-1

We will search for them using all methods of matching (exact, hamming or levenshtein) and with and without matching by V-segment. Also, for the first case (exact matching and without V gene) we return “Total.insertions” column along with the “Read.count” column, and for the second case output will be a “Rank” - rank (generated by set.rank) of a clone or a clonotype in a data frame.

twb <- set.rank(twb)
# Case 1.
cmv.imm.ex <- 
  find.clonotypes(.data = twb[1:2], .targets = cmv[,1], .method = 'exact',
                  .col.name = c('Read.count', 'Total.insertions'),
                  .verbose = F)
head(cmv.imm.ex)

##                    CDR3.amino.acid.sequence Read.count.Subj.A Read.count.Subj.B
## CASSALGGAGTGELFF           CASSALGGAGTGELFF               153               319
## CASSALGGAGTGELFF.1         CASSALGGAGTGELFF                NA                35
## CASSLTGNTEAFF                 CASSLTGNTEAFF                35               263
## CASSLTGNTEAFF.1               CASSLTGNTEAFF                35                35
## CASSLTGNTEAFF.2               CASSLTGNTEAFF                NA                28
## CASSSANYGYTF                   CASSSANYGYTF                NA             15320
##                    Total.insertions.Subj.A Total.insertions.Subj.B
## CASSALGGAGTGELFF                         9                      10
## CASSALGGAGTGELFF.1                      NA                       9
## CASSLTGNTEAFF                            2                       2
## CASSLTGNTEAFF.1                          1                       0
## CASSLTGNTEAFF.2                         NA                       1
## CASSSANYGYTF                            NA                       1

# Case 2.
# Search for CDR3 sequences with hamming distance <= 1
# to the one of the cmv$CDR3.amino.acid.sequence with
# matching V genes. Return ranks of found sequences.
cmv.imm.hamm.v <- 
  find.clonotypes(twb[1:3], cmv, 'hamm', 'Rank', 
                  .target.col = c('CDR3.amino.acid.sequence',
                                  'V.gene'),
                  .verbose = F)
head(cmv.imm.hamm.v)

##                  CDR3.amino.acid.sequence  V.gene Rank.Subj.A Rank.Subj.B
## CASSALGGAGTGELFF         CASSALGGAGTGELFF TRBV4-1          NA          NA
## CASSLIGVSSYNEQFF         CASSLIGVSSYNEQFF TRBV4-1          NA          NA
## CASSLTGNTEAFF               CASSLTGNTEAFF TRBV4-1          NA          NA
## CASSSANYGYTF                 CASSSANYGYTF TRBV4-1          NA          NA
## CSVGRAQNEQFF                 CSVGRAQNEQFF TRBV4-1          NA          NA
##                  Rank.Subj.C
## CASSALGGAGTGELFF          NA
## CASSLIGVSSYNEQFF          NA
## CASSLTGNTEAFF             NA
## CASSSANYGYTF              NA
## CSVGRAQNEQFF              NA

# Case 3.
# Similar to the previous example, except
# using levenshtein distance and the "Read.count" column.
cmv.imm.lev.v <- 
  find.clonotypes(twb[1:3], cmv, 'lev', 
                  .target.col = c('CDR3.amino.acid.sequence', 'V.gene'),
                  .verbose = F)
head(cmv.imm.lev.v)

##                  CDR3.amino.acid.sequence  V.gene Read.count.Subj.A
## CASSALGGAGTGELFF         CASSALGGAGTGELFF TRBV4-1                NA
## CASSLIGVSSYNEQFF         CASSLIGVSSYNEQFF TRBV4-1                NA
## CASSLTGNTEAFF               CASSLTGNTEAFF TRBV4-1                NA
## CASSSANYGYTF                 CASSSANYGYTF TRBV4-1                NA
## CSVGRAQNEQFF                 CSVGRAQNEQFF TRBV4-1                NA
##                  Read.count.Subj.B Read.count.Subj.C
## CASSALGGAGTGELFF                NA                NA
## CASSLIGVSSYNEQFF                NA                NA
## CASSLTGNTEAFF                   NA                NA
## CASSSANYGYTF                    NA                NA
## CSVGRAQNEQFF                    NA                NA

Gene usage

Variable and Joining gene usage (V-usage and J-usage) are important characteristics of repertoires. To access and compare them among repertoires tcR provides a few useful functions.

Gene usage computing

To access V- and J-usage of a repertoire tcR provides functions geneUsage. Function geneUsage, depending on parameters, computes frequencies or counts of the given elements (e.g., V genes) of the input data frame or the input list of data frames. V and J gene names for humans for TCR and Ig are stored in the .rda file genesegments.rda (they are identical to those form IMGT: and ). All of the mentioned functions are accept data frames as well as list of data frames. Output for those functions are data frames with the first column stands for a gene and the other for frequencies.

imm1.vs <- geneUsage(twb[[1]], HUMAN_TRBV)

## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================

head(imm1.vs)

##       Gene Sample
## 2 TRBV10-1     40
## 3 TRBV10-2     48
## 4 TRBV10-3    308
## 5 TRBV11-1     43
## 6 TRBV11-2    186
## 7 TRBV11-3     20

imm.vs.all <- geneUsage(twb, HUMAN_TRBV)

## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================

imm.vs.all[1:10, 1:4]

##        Gene Subj.A Subj.B Subj.C
## 2  TRBV10-1     40     35      9
## 3  TRBV10-2     48     65     22
## 4  TRBV10-3    308    307    328
## 5  TRBV11-1     43     34     33
## 6  TRBV11-2    186    230    223
## 7  TRBV11-3     20     23     27
## 8  TRBV12-3      0      0      0
## 9  TRBV12-4      0      0      0
## 10 TRBV12-5     15     22     37
## 11   TRBV13     68     39     44

You can also directly visualise gene usage with the function vis.gene.usage (if you pass the gene alphabet as a second argument):

# Put ".dodge = F" to get distinct plot for every data frame in the given list.
vis.gene.usage(twb, HUMAN_TRBJ, .main = 'twb J-usage dodge', .dodge = T)

vis.gene.usage(twb, HUMAN_TRBJ, .main = 'twb J-usage column', .dodge = F, .ncol = 2)

vis.gene.usage(imm1.vs, NA, .main = 'twb[[1]] V-usage', .coord.flip = F)

Gene usage comparing

To evaluate V- and J genes usage of repertoires, the package implements subroutines for two approaches to the analysis: measures from the information theory and PCA (Principal Component Analysis).

Shannon entropy and Jensen-Shannon divergence

To assess the diversity of genes usage user can use the entropy function. Kullback-Leibler assymetric measure (function kl.div) and Jensen-Shannon symmetric measure (functions js.div for computing JS-divergence between the given distributions and js.div.seg for computing JS-divergence between genes distributions of two clonesets or a list with data frames) are provided to estimate distance among gene usage of different repertoires. To visualise distances tcR employed the vis.radarlike function, see Section “Plots” for more detailed information.

                              # Transform "0:100" to distribution with Laplace correction 
entropy(0:100, .laplace = 1)  # (i.e., add "1" to every value before transformation).

## [1] 6.399261

entropy.seg(twb, HUMAN_TRBV)  # Compute entropy of V-segment usage for each data frame.

## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================

##   Subj.A   Subj.B   Subj.C   Subj.D 
## 4.684783 4.751303 4.591658 4.539259

js.div.seg(twb[1:2], HUMAN_TRBV, .verbose = F)

## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================

## [1] 0.0008587249

imm.js <- js.div.seg(twb, HUMAN_TRBV, .verbose = F)

## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================

vis.radarlike(imm.js, .ncol = 2)

Principal Component Analysis (PCA)

Principal component analysis (PCA) is a statistical procedure for transforming a set of observations to a set of special values for analysis. In tcR implemented functions pca.segments for performing PCA on V- or J-usage, and pca.segments.2D for performing PCA on VJ-usage. For plotting the PCA results see the vis.pca function.

pca.segments(twb, .genes = HUMAN_TRBV)  # Plot PCA results of V-segment usage.

## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================

## Warning: In prcomp.default(t(as.matrix(.data)), ...) :
##  extra argument '.genes' will be disregarded

# Return object of class "prcomp"
class(pca.segments(twb, .do.plot = F, .genes = HUMAN_TRBV))

## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================

## Warning: In prcomp.default(t(as.matrix(.data)), ...) :
##  extra argument '.genes' will be disregarded

## [1] "prcomp"

Repertoire overlap analysis

tcR provides a number of functions for evaluating similarity of clonesets based on shared among clonesets clonotypes and working with data frames with shared clonotypes.

Overlap quantification

The general interface to all functions for computing cloneset overlap coefficients is the repOverlap function.

Number of shared clonotypes

The most straightforward yet a quite effective way to evaluate similarity of two clonesets is compute the number of shared clonotypes. tcR adds the new function intersectClonesets (repOverlap(your_data, 'exact')) which is by default computes the number of shared clonotypes using the “CDR3.nucleotide.sequence” columns of the given data frames, but user can change target columns by using arguments .type or .col. As in the find.clonotypes, user can choose which method apply to the elements: exact match of elements, match by Hamming distance or match by Levenshtein distance. Logical argument .norm is used to perform normalisation of the number of shared clonotypes by dividing this number by multiplication of clonesets’ sizes (strongly recommended otherwise your results will be correlating with clonesets’ sizes).

# Equivalent to intersect(twb[[1]]$CDR3.nucleotide.sequence,
#                         twb[[2]]$CDR3.nucleotide.sequence)
repOverlap(twb[1:2], 'exact', 'nuc', .verbose = F)

## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================

## [1] 4.6e-07

# Equivalent to intersectClonesets(twb, "n0e", .norm = T)
repOverlap(twb, 'exact', 'nuc', .norm = T, .verbose = F)

## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================

##         Subj.A  Subj.B  Subj.C  Subj.D
## Subj.A      NA 4.6e-07 4.4e-07 4.0e-07
## Subj.B 4.6e-07      NA 2.7e-07 2.7e-07
## Subj.C 4.4e-07 2.7e-07      NA 6.2e-07
## Subj.D 4.0e-07 2.7e-07 6.2e-07      NA

# Intersect by amino acid clonotypes + V genes
repOverlap(twb, 'exact', 'aa', .vgene = T, .verbose = F)

## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================

##          Subj.A   Subj.B   Subj.C   Subj.D
## Subj.A       NA 1.58e-06 6.50e-07 5.80e-07
## Subj.B 1.58e-06       NA 5.60e-07 4.70e-07
## Subj.C 6.50e-07 5.60e-07       NA 1.31e-06
## Subj.D 5.80e-07 4.70e-07 1.31e-06       NA

# Plot a heatmap of the number of shared clonotypes.
vis.heatmap(repOverlap(twb, 'exact', 'aa', .vgene = T, .verbose = F), .title = 'twb - (ave)-intersection', .labs = '')

## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================

See the vis.heatmap function in the Section “Visualisation” for the visualisation of the intersection results.

Functions intersectCount, intersectLogic and intersectIndices are more flexible in terms of choosing which columns to match. They all have parameter .col that specifies names of columns which will used in computing intersection. Function intersectCount returns number of similar elements; intersectIndices(x, y) returns 2-column matrix with the first column stands for an index of an element in the given x, and the second column stands for an index of that element of y which is similar to a relative element in x; intersectLogic(x, y) returns a logical vector of length(x) or nrow(x), where TRUE at position i means that element with index {i} has been found in the y.

# Get logic vector of shared elements, where
# elements are tuples of CDR3 nucleotide sequence and corresponding V-segment
imm.1.2 <- intersectLogic(twb[[1]], twb[[2]],
                           .col = c('CDR3.amino.acid.sequence', 'V.gene'))  
# Get elements which are in both twb[[1]] and twb[[2]].
head(twb[[1]][imm.1.2, c('CDR3.amino.acid.sequence', 'V.gene')])

##    CDR3.amino.acid.sequence V.gene
## 8             CASSLGLHYEQYF TRBV28
## 14            CAWSRQTNTEAFF TRBV30
## 17            CASSLGVGYEQYF TRBV28
## 19            CASSLGLHYEQYF TRBV28
## 30            CASSLGLNYEQYF TRBV28
## 66            CASSLGVSYEQYF TRBV28

“Top cross”

Number of shared clonotypes among the most abundant clonotypes may differ signigicantly from those with lesses count. To support research tcR offers the top.cross function, that apply tcR::intersectClonesets, e.g., to the first 1000 clonotypes, 2000, 3000 and so on up to the first 100000 clones, if supplied .n == seq(1000, 100000, 1000).

twb.top <- top.cross(.data = twb, .n = seq(500, 10000, 500), .verbose = F, .norm = T)
top.cross.plot(twb.top)

More complex similarity measures

tcR also provides more complex similarity measures for evaluating the similarity of sets.

Tversky index (repOverlap(your_data, 'tversky') for clonesets or tversky.index for vectors) is an asymmetric similarity measure on sets that compares a variant to a prototype. If using default arguments, it’s similar to Dice’s coefficient.
Overlap coefficient (repOverlap(your_data, 'overlap') for clonesets or overlap.coef for vectors) is a similarity measure that measures the overlap between two sets, and is defined as the size of the intersection divided by the smaller of the size of the two sets.
Jaccard index (repOverlap(your_data, 'jaccard') for clonesets or jaccard.index for vectors) is a statistic used for comparing the similarity and diversity of sample sets.
Morisita’s overlap index (repOverlap(your_data, 'morisita') for clonesets or morisitas.index for other data) is a statistical measure of dispersion of individuals in a population and is used to compare overlap among samples. The formula is based on the assumption that increasing the size of the samples will increase the diversity because it will include different habitats (i.e. different faunas) (Morisita, 1959).

To visualise similarity among repertoires the vis.heatmap function is appropriate.

# Apply the Morisitas overlap index to the each pair of repertoires.
# Use information about V genes (i.e. one CDR3 clonotype is equal to another
# if and only if their CDR3 aa sequences are equal and their V genes are equal)
repOverlap(twb, 'morisita', 'aa', 'read.count', .vgene = T, .verbose = F)

## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================

##              Subj.A       Subj.B       Subj.C       Subj.D
## Subj.A           NA 2.353362e-03 2.125406e-05 0.0002231617
## Subj.B 2.353362e-03           NA 9.765356e-06 0.0002413103
## Subj.C 2.125406e-05 9.765356e-06           NA 0.0004875433
## Subj.D 2.231617e-04 2.413103e-04 4.875433e-04           NA

Overlap statistics and tests

Overlap Z-score (OZ-score) - a measure for ???abnormality??? in overlaps

ozScore

Monte Carlo permutation test for pairwise and one-vs-all-wise within- and inter-group differences in a set of repertoires

permutDistTest

pca2euclid

Shared repertoire

To investigate a shared among a several repertoires clonotypes (“shared repertoire”) the package provided the shared.repertoire function along with functions for computing the shared repertoire statistics. The shared.representation function computes the number of shared clonotypes for each repertoire for each degree of sharing (i.e., number of people, in which indicated amount of clones have been found). The function shared.summary is equivalent to repOverlap(, 'exact') but applies to the shared repertoire data frame. Measuring distances among repertoires using the cosine similarity on vector of counts of shared sequences is also possible with the cosine.sharing function.

# Compute shared repertoire of amino acid CDR3 sequences and V genes
# which has been found in two or more people and return the Read.count column
# of such clonotypes from each data frame in the input list.
imm.shared <- shared.repertoire(.data = twb, .type = 'avrc', .min.ppl = 2, .verbose = F)
head(imm.shared)

##   CDR3.amino.acid.sequence  V.gene People Subj.A Subj.B Subj.C Subj.D
## 1            CASSDRDTGELFF TRBV6-4      4    113    411    176   2398
## 2          CASSDSSGGYNEQFF TRBV6-4      4     68    357     31    115
## 3           CASSFLSGTDTQYF  TRBV28      4     36    111     59    203
## 4            CASSGQGNTEAFF   TRBV2      4    223    252     69    152
## 5           CASSLGQGGQPQHF TRBV7-9      4     34    139     31     84
## 6            CASKGQLNTEAFF  TRBV19      3    125     NA     37     34

shared.representation(imm.shared)  # Number of shared sequences.

##   Subj.A Subj.B Subj.C Subj.D
## 1      0      0      0      0
## 2    219    205    192    170
## 3     22     19     20     23
## 4      5      5      5      5

Diversity evaluation

For assessing the distribution of clonotypes in the given repertoire, tcR provides functions for evaluating the diversity (functions diversity and inverse.simpson) and the skewness of the clonal distribution (functions gini and gini.simpson), and a general interface to all of this functions repDiversity, which user should use to estimate the diversity of clonesets. Function diversity (repDiversity(your_clonesets, "div")) computes the ecological diversity index (with parameter .q for penalties for clones with large count). Function inverse.simpson (repDiversity(your_clonesets, "inv.simp")) computes the Inverse Simpson Index (i.e., inverse probability of choosing two similar clonotypes). Function gini (repDiversity(your_clonesets, "gini")) computes the economical Gini index of clonal distribution. Function gini.simpson (repDiversity(your_clonesets, "gini.simp")) computes the Gini-Simpson index. Function chao1 (repDiversity(your_clonesets, "chao1")) computes the Chao1 index, its SD and two 95 perc CI. Function repDiversity accepts single clonesets as well as a list of clonesets. Parameter .quant specifies which column to use for computing the diversity (print ?repDiversity to see more information about input arguments).

# Evaluate the diversity of clones by the ecological diversity index.
repDiversity(twb, 'div', 'read.count')

## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================

sapply(twb, function (x) diversity(x$Read.count))

##   Subj.A   Subj.B   Subj.C   Subj.D 
## 34.55417 23.97224 15.87257 98.03479 
##   Subj.A   Subj.B   Subj.C   Subj.D 
## 34.55417 23.97224 15.87257 98.03479

# Compute the diversity as the inverse probability of choosing two similar clonotypes.
repDiversity(twb, 'inv.simp', 'read.prop')

## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================

sapply(twb, function (x) inverse.simpson(x$Read.proportion))

##    Subj.A    Subj.B    Subj.C    Subj.D 
## 117.63383  56.09537  55.31047 354.18601 
##    Subj.A    Subj.B    Subj.C    Subj.D 
## 117.63383  56.09537  55.31047 354.18601

# Evaluate the skewness of clonal distribution.
repDiversity(twb, 'gini.simp', 'read.prop')

## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================

sapply(twb, function (x) gini.simpson(x$Read.proportion))

##    Subj.A    Subj.B    Subj.C    Subj.D 
## 0.9914990 0.9821732 0.9819202 0.9971766 
##    Subj.A    Subj.B    Subj.C    Subj.D 
## 0.9914990 0.9821732 0.9819202 0.9971766

# Compute diversity of repertoire using Chao index.
repDiversity(twb, 'chao1', 'read.count')

## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================

sapply(twb, function (x) chao1(x$Read.count))

##                  Subj.A       Subj.B      Subj.C       Subj.D
## Estimator  1.000000e+04 1.000000e+04 1.00000e+04 1.000000e+04
## SD         5.223297e-04 1.322604e-03 2.90204e-04 2.992252e-06
## Conf.95.lo 1.000000e+04 1.000000e+04 1.00000e+04 1.000000e+04
## Conf.95.hi 1.000000e+04 1.000000e+04 1.00000e+04 1.000000e+04
##                  Subj.A       Subj.B      Subj.C       Subj.D
## Estimator  1.000000e+04 1.000000e+04 1.00000e+04 1.000000e+04
## SD         5.223297e-04 1.322604e-03 2.90204e-04 2.992252e-06
## Conf.95.lo 1.000000e+04 1.000000e+04 1.00000e+04 1.000000e+04
## Conf.95.hi 1.000000e+04 1.000000e+04 1.00000e+04 1.000000e+04

Visualisation

CDR3 length and read count distributions plot

Plots of the distribution of CDR3 nucleotide sequences length (function vis.count.len) and the histogram of counts (function vis.number.count). Input data is either a data frame or a list with data frames. Argument .col specifies column’s name with clonotype counts. Argument .ncol specifies a number of columns in a plot with multiple distribution, i.e., if the input data is a list with data frames.

vis.count.len(twb[[1]], .name = "twb[[1]] CDR3 lengths", 
              .col = "Read.count")

# I comment this to avoid a strange bug in ggplot2. Will uncomment later.
# vis.number.count(twb[[1]], .name = "twb[[1]] count distribution")

Top proportions bar plot

For the visualisation of proportions of the most abundant clonotypes in a repertoire tcR offers the vis.top.proportions function. As input the function receives either data frame or a list with data frames (argument .data), an integer vector with number of clonotypes for computing proportions of count for this clonotypes (argument .head), and a column’s name with clonotype counts (argument .col).

vis.top.proportions(twb, c(10, 500, 3000, 10000), .col = "Read.count")

Clonal space homeostasis bar plot

For the visualisation of how much space occupied each group of clonotypes, divided into groups by their proportions in the data, use the vis.clonal.space function. As an input it receives the output of the clonal.space.homeostasis function.

twb.space <- clonal.space.homeostasis(twb)
vis.clonal.space(twb.space)

Heat map

Pairwise distances or similarity of repertoires can be represented as qudratic matrices, in which each row and column represented a cloneset, and each value in every cell (i, j) is a distance between repertoires with indices i and j. One way to visalise such matrices is using “heatmaps”. For plotting heatmaps in tcR implemented the vis.heatmap function. With changing input arguments user can change names of labs, title and legend.

twb.shared <- repOverlap(twb, "exact", .norm = F, .verbose = F)
vis.heatmap(twb.shared, .title = "Twins shared nuc clonotypes", 
            .labs = c("Sample in x", "Sample in y"), .legend = "# clonotypes")

Radar-like plot

Another way to repsent distances among objects is “radar-like” plots (because this plots is not exactly radar plots) realised in tcR throught the vis.radarlike function. Argument .ncol specifies a number of columns of radar-like plots in a viewport.

twb.js <- js.div.seg(twb, HUMAN_TRBV, .verbose = F)

## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================
## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================

vis.radarlike(twb.js, .ncol = 2)

Gene usage histogram

For the visualisation of gene usage tcR employes subroutines for making classical histograms using the vis.gene.usage function. The function accept clonesets, lists of clonesets or output from the geneUsage function. If input is a cloneset(s), then user should specify a gene alphabet (e.g., HUMAN_TRBV) in order to compute the gene usage. Using a parameter , user can change type of the output between an output as histograms for each cloneset in the input list (.dodge = F) or an output as an one histogram for all data, which is very useful for comparing distribution of genes (.dodge = T). If .dodge=F and input are lists of clonesets or a gene usage of a few clonesets, than user with argument .ncol can specify how many columns of histograms will be outputted. With .coord.flip user can flip coordinates so genes will be at the left side of the plot.

vis.gene.usage(twb[[1]], HUMAN_TRBV, .main = 'Sample I V-usage')

vis.gene.usage(twb[[2]], HUMAN_TRBV, .main = 'Sample II V-usage', .coord.flip = T)

twb.jusage <- geneUsage(twb, HUMAN_TRBJ)
vis.gene.usage(twb.jusage, .main = 'Twins J-usage', .dodge = T)

vis.gene.usage(twb, HUMAN_TRBJ, .main = 'Twins J-usage', .dodge = F, .ncol = 2)

PCA visualisation

For the visualisation of results from the prcomp function (i.e., objects of class prcomp), tcR provides the vis.pca function. Input arguments for the function are an object of class prcomp and a (if needed) list with groups (vectors of indices of samples) for colouring points in the plot.

twb.pca <- pca.segments(twb, .do.plot = F)

## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================

vis.pca(pca.segments(twb, .do.plot = F, .genes = HUMAN_TRBV), .groups = list(GroupA = c(1,2), GroupB = c(3,4)))

## 
## ========================================
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## !! The tcR package WILL SOON BE ORPHANED 
## !! AND REMOVED FROM CRAN.
## !!
## !! A new package is available that is 
## !! designed to replace tcR: 
## !! immunarch  --  https://immunarch.com/
## !!
## !! We will be happy to help you to move
## !! to the new package. Feel free to contact us:
## !! http://github.com/immunomind/immunarch
## !!
## !! Sincerely, 
## !!  immunarch dev team and 
## !!  Vadim I. Nazarov, lead developer of tcR
## !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
## =======================================

## Warning: In prcomp.default(t(as.matrix(.data)), ...) :
##  extra argument '.genes' will be disregarded

Logo-like plot

Logo-like graphs for visualisation of nucleotide or amino acid motif sequences / profiles.

km <- get.kmers(twb[[1]]$CDR3.amino.acid.sequence, .head = 100, .k = 7, .verbose = F)
d <- kmer.profile(km)
vis.logo(d)

Mutation networks

Mutation network (or a mutation graph) is a graph with vertices representing nucleotide or in-frame amino acid sequences (out-of-frame amino acid sequences will be automatically filtered out by tcR functions for mutation network creating) and edges which connecting pairs of sequences with hamming distance (parameter .method = ‘hamm’) or edit distance (parameter .method = ‘lev’) between them no more than specified in the .max.errors function parameter of the mutation.network function. To create a mutation network first what you need is to make a shared repertoires and then apply the mutation.network function to this shared repertoire:

# data(twb)
twb.shared <- shared.repertoire(twb, .head = 1000, .verbose = F)
G <- mutation.network(twb.shared)
G

## IGRAPH 4cb561c U--- 3704 337 -- 
## + attr: label (v/c), vseg (v/c), repind (v/n), prob (v/n), people
## | (v/c), npeople (v/n)
## + edges from 4cb561c:
##  [1]   1--  25   1-- 572   1-- 577   1--2671   2-- 765   6-- 613   7-- 617
##  [8]   8-- 616  12-- 787  12-- 789  12-- 798  12--1010  14--1056  14--2432
## [15]  16--1110  18--  19  18--  21  18--1053  18--1264  18--1273  18--1313
## [22]  19--1273  19--1305  19--1327  20--  21  20--1056  20--1327  21--1264
## [29]  21--1313  21--1327  21--2439  23--2106  24--2207  24--2354  29--3175
## [36]  32--2997  73-- 105 127-- 417 219-- 961 235-- 236 239-- 411 289--2646
## [43] 291-- 292 297-- 299 418--2231 423-- 440 439-- 440 460-- 461 463-- 954
## + ... omitted several edges

To manipulate vertex attributes functions and are provided.

# data(twb)
# twb.shared <- shared.repertoire(twb, .head = 1000)
# G <- mutation.network(twb.shared)
G <- set.group.vector(G, "twins", list(A = c(1,2), B = c(3,4)))  # <= refactor this
get.group.names(G, "twins", 1)

## [1] "A|B"

get.group.names(G, "twins", 300)

## [1] "B"

get.group.names(G, "twins", c(1,2,3), F)

## [[1]]
## [1] "A" "B"
## 
## [[2]]
## [1] "A" "B"
## 
## [[3]]
## [1] "B"

get.group.names(G, "twins", 300, F)

## [[1]]
## [1] "B"

# Because we have only two groups, we can assign more readable attribute.
V(G)$twin.names <- get.group.names(G, "twins")
V(G)$twin.names[1]

## [1] "A|B"

V(G)$twin.names[300]

## [1] "B"

To access neighbour vertices of vertices (“ego-network”) use the function:

# data(twb)
# twb.shared <- shared.repertoire(twb, .head = 1000)
# G <- mutation.network(twb.shared)
head(mutated.neighbours(G, 1)[[1]])

##           label             vseg repind prob people npeople twins twin.names
## 1 CASSDRDTGELFF          TRBV6-4      1   -1   0111       3    11        A|B
## 2 CASSYRDTGELFF TRBV6-3, TRBV6-2     25   -1   1001       2    11        A|B
## 3 CASSDRETGELFF          TRBV6-4    572   -1   0100       1    10          A
## 4 CASSDRGTGELFF          TRBV6-4    577   -1   0100       1    10          A
## 5 CASTDRDTGELFF         TRBV10-2   2671   -1   1000       1    10          A

Conclusion

Feel free to contact me for the package-related or immunoinformatics research-related questions.

Appendix A: Kmers manipulation and processing

In the package implemented functions for working with k-mers. Function get.kmers generates k-mers from the given chatacter vector or a data frame with columns for sequences and a count for each sequence.

head(get.kmers(twb[[1]]$CDR3.amino.acid.sequence, 100, .meat = F, .verbose = F))

##   Kmers Count
## 1 CASSL    20
## 2 CASSP    12
## 3 ASSLG    11
## 4 CASSY    11
## 5 NEQFF    11
## 6 YEQYF    11

head(get.kmers(twb[[1]], .meat = T, .verbose = F))

##   Kmers  Count
## 1 CASSL 283192
## 2 DTQYF 217783
## 3 NEQFF 179230
## 4 CASSQ 158877
## 5 ASSLG 154560
## 6 YEQYF 148602

Appendix B: Nucleotide and amino acid sequences manipulation

The package also provides a several number of functions for performing classic bioinformatics tasks on strings. For more powerful subroutines see the Bioconductor’s Biostrings package.

Nucleotide sequence manipulation

Functions for basic nucleotide sequences manipulations: reverse-complement, translation and GC-content computation. All functions are vectorised.

revcomp(c('AAATTT', 'ACGTTTGGA'))

## [1] "AAATTT"    "TCCAAACGT"

cbind(bunch.translate(twb[[1]]$CDR3.nucleotide.sequence[1:10]),
      twb[[1]]$CDR3.amino.acid.sequence[1:10])

##       [,1]              [,2]             
##  [1,] "CASSQALAGADTQYF" "CASSQALAGADTQYF"
##  [2,] "CASSLGPRNTGELFF" "CASSLGPRNTGELFF"
##  [3,] "CASSYGGAADTQYF"  "CASSYGGAADTQYF" 
##  [4,] "CSAGGIETSYNEQFF" "CSAGGIETSYNEQFF"
##  [5,] "CASSPILGEQFF"    "CASSPILGEQFF"   
##  [6,] "CASKKDRDYGYTF"   "CASKKDRDYGYTF"  
##  [7,] "CASSQQGSGNTIYF"  "CASSQQGSGNTIYF" 
##  [8,] "CASSLGLHYEQYF"   "CASSLGLHYEQYF"  
##  [9,] "CASSRASSYNSPLHF" "CASSRASSYNSPLHF"
## [10,] "CASSYLGPDDTEAFF" "CASSYLGPDDTEAFF"

gc.content(twb[[1]]$CDR3.nucleotide.sequence[1:10])

##  [1] 0.5333333 0.5777778 0.5238095 0.4888889 0.5555556 0.4871795 0.4523810
##  [8] 0.4871795 0.5555556 0.5333333

Reverse translation subroutines

Function codon.variants returns a list of vectors of nucleotide codons for each letter for each input amino acid sequence. Function translated.nucl.sequences returns the number of nucleotide sequences, which, when translated, will result in the given amino acid sequence(s). Function reverse.translation return all nucleotide sequences, which is translated to the given amino acid sequences. Optional argument .nucseq for each of this function provides restriction for nucleotides, which cannot be changed. All functions are vectorised.

codon.variants('LQ')

## [[1]]
## [[1]][[1]]
## [1] "CTA" "CTC" "CTG" "CTT" "TTA" "TTG"
## 
## [[1]][[2]]
## [1] "CAA" "CAG"

translated.nucl.sequences(c('LQ', 'CASSLQ'))

## [1]   12 3456

reverse.translation('LQ')

##  [1] "CTACAA" "CTCCAA" "CTGCAA" "CTTCAA" "TTACAA" "TTGCAA" "CTACAG" "CTCCAG"
##  [9] "CTGCAG" "CTTCAG" "TTACAG" "TTGCAG"

translated.nucl.sequences('LQ', 'XXXXXG')

## [1] 6

codon.variants('LQ', 'XXXXXG')

## [[1]]
## [[1]][[1]]
## [1] "CTA" "CTC" "CTG" "CTT" "TTA" "TTG"
## 
## [[1]][[2]]
## [1] "CAG"

reverse.translation('LQ', 'XXXXXG')

## [1] "CTACAG" "CTCCAG" "CTGCAG" "CTTCAG" "TTACAG" "TTGCAG"

tcR: a package for T cell receptor and Immunoglobulin repertoires advanced data analysis Vadim I. Nazarov

tcR: a package for T cell receptor and Immunoglobulin repertoires advanced data analysis

Vadim I. Nazarov

vdm.nazarov@gmail.com Laboratory of Comparative and Functional Genomics, IBCH RAS, Moscow, Russia

vdm.nazarov@gmail.com

Laboratory of Comparative and Functional Genomics, IBCH RAS, Moscow, Russia