SMDIC User Guide

Baotong Zheng, Junwei Han

library(SMDIC)

1 Introduce

Here, we developed a software package (SMDIC) to automated identification of Somatic Mutation-Driven Immune Cell. The operation modes including: i) inferring the relative abundance matrix of tumor-infiltrating immune cells and integrating it with a particular gene mutation status, ii) detecting differential immune cells with respect to the gene mutation status and converting the abundance matrix of significant differential immune cell into two binary matrices (one for up-regulated and one for down-regulated cells), iii) identifying somatic mutation-driven immune cells by comparing the gene mutation status with each immune cell in the binary matrices across all samples, and iv) visualization of immune cell abundance of samples in different mutation status. Its capabilities enable SMDIC to identify somatic mutation-specific immune cell response. SMDIC may contribute to find neoantigens to facilitate the development of tailored immunotherapy for patients with cancer.
knitr::include_graphics("../inst/workflow.jpg")

This vignette illustrates how to easily use the SMDIC package. With the use of functions in this packages, users could identify the immune cells driven by somatic mutations in tumor microenvironment.

2 Example: Inference the relative abundance matrix of immune cells

We can use function GetExampleData to return example data and environment variables.

library(SMDIC)
library(GSVA)
exp.example<-GetExampleData("exp.example") # obtain example expression data 
cellmatrix.example<-exp2cell(exp.example,method="ssGSEA") #Cell abundance matrix,method must be one of "xCell","ssGSEA" and "CIBERSORT".
#> Warning in .local(expr, gset.idx.list, ...): 659 genes with constant
#> expression values throuhgout the samples.
#view first six rows and six colmns of cell infiltration score matrix.
cellmatrix.example[1:6,1:6]
#>                 TCGA.A7.A13G.01 TCGA.A7.A26E.01 TCGA.A1.A0SQ.01
#> aDC                  -0.1721896      -0.2365829     -0.09769543
#> B cells              -0.1785598      -0.2060056     -0.21060641
#> CD8 T cells           0.2603276       0.2939651      0.32180621
#> Cytotoxic cells      -0.1710623      -0.1744474     -0.13125948
#> DC                   -0.2110966      -0.2248506     -0.16642146
#> Eosinophils           0.1456574       0.1297275      0.16402532
#>                 TCGA.GI.A2C9.11 TCGA.E9.A1NE.01 TCGA.A2.A0SX.01
#> aDC                 -0.11102476      0.30829474     0.255985640
#> B cells             -0.15950503     -0.06773615    -0.007057109
#> CD8 T cells          0.34343045      0.42561320     0.337348145
#> Cytotoxic cells      0.04400304      0.29537554     0.125879324
#> DC                   0.16974204     -0.01435208     0.109405174
#> Eosinophils          0.12760198      0.13450722     0.068610081

3 Example: Use mutation annotation file (MAF) format data to build a binary mutations matrix

We extract the non-silent somatic mutations (nonsense mutation, missense mutation, frame-shif indels, splice site, nonstop mutation, translation start site, inframe indels) in protein-coding regions, and built a binary mutations matrix, in which 1 represents any mutation occurs in a particular gene in a particular sample, otherwise the element is 0. The genes with a given mutation frequency greater than a threshold value one precent as the default value) are retained for the following analysis.


maf <- system.file("extdata","example.maf.gz",package = "SMDIC") #get path of the mutation annotation file.
mutmatrix.example<-maf2matrix(maf) 
head(mutmatrix.example)[,1:6]
#>         TCGA.E9.A22G.01 TCGA.OL.A6VO.01 TCGA.AR.A1AP.01 TCGA.BH.A0AW.01
#> TP53                  1               1               1               1
#> TP53BP2               0               0               0               0
#> TP53BP1               0               0               0               0
#> PCDH10                0               0               0               0
#> CDH1                  0               0               0               0
#> CDH10                 0               0               0               0
#>         TCGA.E2.A1LG.01 TCGA.A2.A04U.01
#> TP53                  1               1
#> TP53BP2               0               0
#> TP53BP1               0               0
#> PCDH10                1               0
#> CDH1                  0               0
#> CDH10                 0               0

4 Example: Identification of somatic mutation-driven immune cells.

Function mutcorcell detects the differential immune cells with respect to a particular gene mutation status and construction of two binary matrices based on the abundance matrix of significant differential immune cell (one for up-regulated and one for down-regulated), identification of somatic mutation-driven immune cells by comparing the gene mutation status with each immune cell in the binary matrices.

#prepare data for following analysis.
cellmatrix<-GetExampleData("cellmatrix") # obtain example result from real rasult: cell abundance matrix from real data.
mutmatrix<-GetExampleData("mutmatrix")# select mutmatrix example result from real result: a binary mutations matrix
#mutcell<-mutcorcell(cell = cellmatrix,mutmatrix = mutmatrix,fisher.adjust = TRUE) ## perform the function `mutcorcell`.

result mutcell include three matrices:mut_cell is a binary numerical matrix which shows the immune cells driven by somatic mutation; mut_cell_p is a numerical matrix which shows pvalue of the immune cells driven by somatic mutation; mut_cell_fdr is a numerical matrix which shows fdr of the immune cells driven by somatic mutation; mut_cell_cellresponses is a character matrix which shows the cell responses of the immune cells driven by somatic mutation.

mutcell<-GetExampleData("mutcell") #get the result of the `mutcorcell` function
#view first ten rows and six colmns of mutcell matrix.
mutcell$mut_cell[1:6,1:6]
#>          aDC Adipocytes Astrocytes B-cells CD4+ memory T-cells
#> TP53       0          0          1       0                   0
#> CDH1       0          1          0       0                   0
#> NBPF14     0          0          0       0                   0
#> OBSCN      0          0          0       0                   0
#> NF1        1          0          0       1                   0
#> ANKRD30A   0          0          0       0                   0
#>          CD4+ T-cells
#> TP53                0
#> CDH1                0
#> NBPF14              0
#> OBSCN               0
#> NF1                 0
#> ANKRD30A            0
#mutcell$mut_cell_p
#mutcell$mut_cell_fdr
#mutcell$mut_cell_cellresponses
dim(mutcell$mut_cell)
#> [1] 10 42

Function mutcellsummary produces result summaries of the results of function mutcorcell function.

summary<-mutcellsummary(mutcell =mutcell,mutmatrix = mutmatrix,cellmatrix = cellmatrix)# The summary have four columns.The first column are gene names,the second column are the cells driven by the gene,the third column are the number of cells driven by the gene,the fourth column are mutation rates of gene.
head(summary)
#>     gene
#> 1   TP53
#> 2   CDH1
#> 3 NBPF14
#> 4  OBSCN
#> 5    NF1
#> 6  KCNA4
#>                                                                                                                                                                                          cells
#> 1 Astrocytes,CD8+ naive T-cells,CD8+ Tem,DC,Keratinocytes,Macrophages,Macrophages M1,Melanocytes,NK cells,pDC,Pericytes,Plasma cells,pro B-cells,Sebocytes,Tgd cells,Th1 cells,Th2 cells,Tregs
#> 2                                          Adipocytes,CD4+ Tcm,Chondrocytes,CMP,Endothelial cells,Fibroblasts,HSC,ly Endothelial cells,Megakaryocytes,Mesangial cells,mv Endothelial cells,NKT
#> 3                                                                                                                                          CD8+ Tem,DC,iDC,Keratinocytes,pro B-cells,Sebocytes
#> 4                                                                                                                               Epithelial cells,Keratinocytes,pro B-cells,Sebocytes,Th1 cells
#> 5                                                                                                                                                  aDC,B-cells,Memory B-cells,pDC,Plasma cells
#> 6                                                                                                                             CD8+ T-cells,CD8+ Tcm,Class-switched memory B-cells,NK cells,pDC
#>   count   mut rate
#> 1    18 0.34394251
#> 2    12 0.12936345
#> 3     6 0.01334702
#> 4     5 0.02977413
#> 5     5 0.03696099
#> 6     5 0.01129363

Function gene2cellsummary produces result summaries of the immune cells driven by a somatic mutation.

gene2cellsummary(gene="TP53",method="xCell",mutcell = mutcell) #a matrix shows the short name, full name, pvalue, fdr, sigtype of the cells driven by a somatic mutation
#>                    gene          cell name                    full name
#> Astrocytes         TP53         Astrocytes                   Astrocytes
#> CD8+ naive T-cells TP53 CD8+ naive T-cells           CD8+ naive T-cells
#> CD8+ Tem           TP53           CD8+ Tem CD8+ effector memory T-cells
#> DC                 TP53                 DC              Dendritic cells
#> Keratinocytes      TP53      Keratinocytes                Keratinocytes
#> Macrophages        TP53        Macrophages                  Macrophages
#> Macrophages M1     TP53     Macrophages M1               Macrophages M1
#> Melanocytes        TP53        Melanocytes                  Melanocytes
#> NK cells           TP53           NK cells                     NK cells
#> pDC                TP53                pDC Plasmacytoid dendritic cells
#> Pericytes          TP53          Pericytes                    Pericytes
#> Plasma cells       TP53       Plasma cells                 Plasma cells
#> pro B-cells        TP53        pro B-cells                  pro B-cells
#> Sebocytes          TP53          Sebocytes                    Sebocytes
#> Tgd cells          TP53          Tgd cells          Gamma delta T-cells
#> Th1 cells          TP53          Th1 cells        Type 1 T-helper cells
#> Th2 cells          TP53          Th2 cells        Type 2 T-helper cells
#> Tregs              TP53              Tregs           Regulatory T-cells
#>                          pvalue          fdr cell responses
#> Astrocytes         1.010916e-07 1.162554e-06             up
#> CD8+ naive T-cells 3.992545e-03 1.311836e-02             up
#> CD8+ Tem           1.748251e-02 4.467753e-02             up
#> DC                 1.604871e-02 4.385580e-02             up
#> Keratinocytes      2.331616e-07 1.981328e-06             up
#> Macrophages        1.820606e-04 1.196398e-03             up
#> Macrophages M1     6.327255e-09 9.701790e-08             up
#> Melanocytes        5.155187e-03 1.580924e-02             up
#> NK cells           5.359277e-04 2.465267e-03             up
#> pDC                2.076020e-03 8.121323e-03             up
#> Pericytes          3.445410e-04 1.981111e-03             up
#> Plasma cells       1.620758e-02 4.385580e-02             up
#> pro B-cells        2.637040e-03 9.331064e-03             up
#> Sebocytes          1.510706e-09 3.474624e-08             up
#> Tgd cells          2.118606e-03 8.121323e-03             up
#> Th1 cells          2.584341e-07 1.981328e-06             up
#> Th2 cells          8.859960e-13 4.075581e-11             up
#> Tregs              4.775752e-04 2.440940e-03             up

5 Example: Visualization

Function heatmapcell provides visualization of the relative abundance of identified immune cells in the gene mutation status using heat maps

library(pheatmap)
heatmapcell(gene = "TP53",mutcell = mutcell,cellmatrix = cellmatrix,mutmatrix = mutmatrix)

Function plotwaterfal and plotinteractions provides visualization of the mutation genes correlated with immune cells using waterfall plot and mutually exclusive or co-occurring plot.

#file<-"dir" #dir must be an absolute path or the name  relatived to the current working directory.
#plotwaterfall(maffile = file,mutcell.summary = summary,cellnumcuoff =4)
#plotCoocMutex(maffile = file,mutcell.summary = summary,cellnumcuoff =4)
knitr::include_graphics("../inst/plotwaterfall.jpeg")

knitr::include_graphics("../inst/plotinteractions.jpeg")

Function survcell draws Kaplan–Meier curves for survival in the above-median and below-median groups for cell risk score. The cell risk score is calaulated by the weighted mean of cells driven by a gene mutation, where the weight of cells is estimated by the “Univariate” or “Multivariate” cox.

mutcell<-GetExampleData("mutcell") # The result of `mutcorcell` function.
cellmatrix<-GetExampleData("cellmatrix") # Cell abundance matrix
surv<-GetExampleData("surv") # The survival data
survcell(gene ="TP53",mutcell=mutcell,cellmatrix=cellmatrix,surv=surv)