ProliferativeIndex Vignette

The ProliferativeIndex R package¹ provides users with R functions for calculating and analyzing the proliferative index (PI) from an RNA-seq dataset.

IMPORTANT: Proliferative indices are only interpretable relative to other PIs, for example a higher or lower PI in tumors compared to normal tissues, or in post-mitotic tissues compared to tissues with high rates of cell turnover. Additionally, PI measures proliferation-associated gene expression and not necessarily proliferation itself.
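As a minimal sketch of what a relative comparison can look like, the example below contrasts PI values between two groups. The values are simulated for illustration only (they are not TCGA data), the group names are hypothetical, and the Wilcoxon rank-sum test is just one reasonable choice of comparison.

```r
set.seed(7)

# Simulated PI values for two hypothetical groups (not real data).
piNormal <- rnorm(8, mean = 8.5, sd = 0.4)
piTumor  <- rnorm(8, mean = 9.5, sd = 0.4)

# Compare the groups to each other rather than reading any single PI
# as an absolute measure of proliferation.
groupTest <- wilcox.test(piTumor, piNormal)

# Difference in group medians and the rank-sum p-value.
median(piTumor) - median(piNormal)
groupTest$p.value
```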

Example Data Set

Included with ProliferativeIndex, specifically for use with this vignette, is data from The Cancer Genome Atlas (TCGA) Adrenocortical Carcinoma (ACC) dataset.³

After first loading the ProliferativeIndex library:

library(ProliferativeIndex)

This dataset, vstTCGA_ACCData_sub, can be accessed from the package:

data(vstTCGA_ACCData_sub)

#Check the dimensions first; the dataset is large (20501 genes x 10 samples), so only the first few rows and columns are displayed further below:
dim(vstTCGA_ACCData_sub)

## [1] 20501    10

#Note that sample IDs are column names and HGNC gene IDs (http://www.genenames.org) are rownames and that vst data is numeric.
str(vstTCGA_ACCData_sub)

## 'data.frame':    20501 obs. of  10 variables:
##  $ TCGA.OR.A5J1: num  5.87 4.19 5.92 8.43 6.99 ...
##  $ TCGA.OR.A5J2: num  5.49 4.19 5.2 8.74 4.19 ...
##  $ TCGA.OR.A5J3: num  6.04 4.52 5.44 8.04 4.76 ...
##  $ TCGA.OR.A5J5: num  11.4 4.71 5.22 7.08 6.8 ...
##  $ TCGA.OR.A5J6: num  10.07 4.19 5.11 8.8 4.66 ...
##  $ TCGA.OR.A5J7: num  5.57 4.19 4.96 7.52 4.91 ...
##  $ TCGA.OR.A5J8: num  6.86 4.19 4.19 6.91 5.1 ...
##  $ TCGA.OR.A5J9: num  5.4 4.19 6.46 8.94 6.34 ...
##  $ TCGA.OR.A5JA: num  6.8 4.19 5.25 8.77 6.36 ...
##  $ TCGA.OR.A5JB: num  8.53 4.19 4.19 6.84 4.19 ...

knitr::kable(vstTCGA_ACCData_sub[1:5,1:5])

	TCGA.OR.A5J1	TCGA.OR.A5J2	TCGA.OR.A5J3	TCGA.OR.A5J5	TCGA.OR.A5J6
A1BG	5.871339	5.490145	6.036080	11.397348	10.065106
A1CF	4.190503	4.190503	4.523434	4.713955	4.190503
A2BP1	5.915039	5.196520	5.443088	5.221104	5.112238
A2LD1	8.431843	8.741279	8.043286	7.075708	8.798831
A2ML1	6.986670	4.190503	4.764641	6.798125	4.657211
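Users bringing their own data may want to confirm that it has the same shape as the example: a data frame of numeric vst values with gene IDs as rownames and sample IDs as column names. The sketch below shows one way to check this. The helper looksLikeVstInput and the toy values are hypothetical and are not part of the ProliferativeIndex package.

```r
# Toy stand-in for a user's variance-stabilized data (hypothetical values):
# genes in rows, samples in columns.
toyVst <- data.frame(
  sampleA = c(5.9, 4.2, 8.4),
  sampleB = c(5.5, 4.2, 8.7),
  row.names = c("A1BG", "A1CF", "A2LD1")
)

# Hypothetical helper (not a package function): TRUE when x is a data frame
# with all-numeric columns and explicit (non-default) rownames.
looksLikeVstInput <- function(x) {
  is.data.frame(x) &&
    all(vapply(x, is.numeric, logical(1))) &&
    !all(rownames(x) == as.character(seq_len(nrow(x))))
}

looksLikeVstInput(toyVst)
```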

readDataForPI function

Functions in the ProliferativeIndex package come with help pages that can be accessed as usual (for example, ?readDataForPI).

The function readDataForPI is used to read data in for use with the ProliferativeIndex package.

#Inputs are the user's vst dataframe and a model of interest for examining PI:
exampleTCGAData<-readDataForPI(vstTCGA_ACCData_sub, c("AIFM3", "ATP9B", "CTRC", "MCL1", "MGAT4B", "ODF2L", "SNORA65", "TPPP2"))

#examine output which is a list of two objects:
# exampleTCGAData$vstData is the user's vst dataframe and exampleTCGAData$modelIDs is a character vector of the user's gene IDs for their model of interest
str(exampleTCGAData)

## List of 2
##  $ vstData :'data.frame':    20501 obs. of  10 variables:
##   ..$ TCGA.OR.A5J1: num [1:20501] 5.87 4.19 5.92 8.43 6.99 ...
##   ..$ TCGA.OR.A5J2: num [1:20501] 5.49 4.19 5.2 8.74 4.19 ...
##   ..$ TCGA.OR.A5J3: num [1:20501] 6.04 4.52 5.44 8.04 4.76 ...
##   ..$ TCGA.OR.A5J5: num [1:20501] 11.4 4.71 5.22 7.08 6.8 ...
##   ..$ TCGA.OR.A5J6: num [1:20501] 10.07 4.19 5.11 8.8 4.66 ...
##   ..$ TCGA.OR.A5J7: num [1:20501] 5.57 4.19 4.96 7.52 4.91 ...
##   ..$ TCGA.OR.A5J8: num [1:20501] 6.86 4.19 4.19 6.91 5.1 ...
##   ..$ TCGA.OR.A5J9: num [1:20501] 5.4 4.19 6.46 8.94 6.34 ...
##   ..$ TCGA.OR.A5JA: num [1:20501] 6.8 4.19 5.25 8.77 6.36 ...
##   ..$ TCGA.OR.A5JB: num [1:20501] 8.53 4.19 4.19 6.84 4.19 ...
##  $ modelIDs: chr [1:8] "AIFM3" "ATP9B" "CTRC" "MCL1" ...

*note, the R package includes a data object, ‘exReadDataObj’ that is the output from the readDataForPI function for comparison
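Before passing a model to readDataForPI, it can help to confirm that every model gene ID appears among the rownames of the vst data. The sketch below uses toy gene IDs, and "NOTAGENE" is a deliberately invalid, hypothetical ID. Whether readDataForPI itself reports missing IDs is not assumed here.

```r
# Toy gene IDs standing in for rownames(vstTCGA_ACCData_sub).
availableGenes <- c("AIFM3", "ATP9B", "CTRC", "MCL1", "MGAT4B")

# Model of interest; "NOTAGENE" is a deliberately invalid ID.
modelIDs <- c("AIFM3", "MCL1", "NOTAGENE")

# Model IDs that are absent from the data.
missingIDs <- setdiff(modelIDs, availableGenes)
missingIDs

## [1] "NOTAGENE"
```

With the real data, availableGenes would be replaced by rownames(vstTCGA_ACCData_sub).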

calculatePI function

The function calculatePI calculates PI for all samples in the user's vst dataframe using a list of PCNA-associated genes collected from Venet et al.² (including alternative gene names).

*note, the function will print to the screen how many of the genes used to calculate the PI were found in the vstData

proliferativeIndices<-calculatePI(exampleTCGAData)

## [1] "vstData contained 131/131 of the PI-associated genes"

summary(proliferativeIndices)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.454   8.480   9.220   9.246  10.016  10.556

*note, the R package includes a data object, ‘exVSTPI’ that is the output from the calculatePI function for comparison
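To make the idea concrete, the sketch below computes a PI-like value on toy data. It assumes the index is the per-sample median of marker-gene expression, which is how Venet et al. summarize their meta-PCNA signature; the internals of calculatePI may differ. The values are simulated, and the four-gene marker list is a hypothetical stand-in for the 131-gene list used by the package.

```r
set.seed(1)

# Toy vst data: 6 genes x 4 samples (simulated values, not TCGA data).
toyVst <- as.data.frame(matrix(
  rnorm(24, mean = 8), nrow = 6,
  dimnames = list(c("PCNA", "MKI67", "TOP2A", "GENE1", "GENE2", "GENE3"),
                  paste0("sample", 1:4))
))

# Hypothetical marker list for illustration; one ID is absent from the toy data.
markerGenes <- c("PCNA", "MKI67", "TOP2A", "CCNB1")

# Report how many marker genes are present, then keep only those.
found <- intersect(markerGenes, rownames(toyVst))
message(sprintf("toy data contained %d/%d of the marker genes",
                length(found), length(markerGenes)))

# Assumed summary: per-sample median expression over the marker genes found.
toyPI <- apply(toyVst[found, , drop = FALSE], 2, median)
toyPI
```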

comparePI function

This function will summarize the PI values within the user’s dataset.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   7.454   8.480   9.220   9.246  10.016  10.556

compareModeltoPI function

The function compareModeltoPI takes, as input, the user’s data and model identifiers (the output of readDataForPI) along with the PI values, and compares the model to PI:

modelComparison<-compareModeltoPI(exampleTCGAData, proliferativeIndices)

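#the output is a table with one row per principal component, giving its Spearman correlation with PI (rho and p-value) and its proportion of variance; inspect: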
knitr::kable(modelComparison)

	SpearmanRho	SpearmanPvalue	PCAPropOfVariance
PC1	0.9878788	0.0000000	0.51527
PC2	0.0181818	0.9728412	0.11587
PC3	-0.0909091	0.8114170	0.07491
PC4	0.1151515	0.7588331	0.06558
PC5	0.1757576	0.6319674	0.05897
PC6	-0.0424242	0.9186333	0.05068
PC7	0.0424242	0.9186333	0.05002
PC8	-0.0909091	0.8114170	0.03992
PC9	-0.0424242	0.9186333	0.02878
PC10	-0.3696970	0.2956041	0.00000
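A table with these columns could be produced along the following lines. This is only a sketch of the general technique (PCA followed by a Spearman correlation of each component with a per-sample index); it is not the package's implementation, and exactly which genes compareModeltoPI feeds into the PCA is not assumed here. All values are simulated and the object names are hypothetical.

```r
set.seed(42)
nSamples <- 10

# Simulated per-sample index and a samples x genes matrix for a toy model
# whose genes partly track the index.
index <- rnorm(nSamples, mean = 9)
modelExpr <- sapply(1:4, function(i) {
  index * runif(1, 0.5, 1.5) + rnorm(nSamples, sd = 0.5)
})
colnames(modelExpr) <- paste0("gene", 1:4)

# PCA on the model genes (samples are rows).
pca <- prcomp(modelExpr, center = TRUE, scale. = TRUE)

# One row per principal component: correlation with the index and
# proportion of variance explained.
toyComparison <- data.frame(
  SpearmanRho = apply(pca$x, 2, function(pc) {
    cor(pc, index, method = "spearman")
  }),
  SpearmanPvalue = apply(pca$x, 2, function(pc) {
    cor.test(pc, index, method = "spearman")$p.value
  }),
  PCAPropOfVariance = pca$sdev^2 / sum(pca$sdev^2)
)
toyComparison
```

A large absolute rho for a component that also explains a large share of the variance, as for PC1 in the table above, suggests that much of the model's signal tracks PI.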

¹ Ramaker and Lasseigne, et al. bioRxiv, 2016.
² Venet, et al. PLoS Computational Biology, 2011 and Ge, et al. Genomics, 2005.
³ The TCGA ACC dataset was obtained from the TCGA data portal (tcga-data.nci.nih.gov) in June 2015. Level 3 RNASeqV2 raw count data was variance stabilized with the DESeq2 v1.8.2 ‘varianceStabilizingTransformation’.