For this example, we will work through a small gene expression meta-analysis of systemic lupus erythematosus (SLE). We have identified three public datasets (GSE50635, GSE11909, and GSE39088) that we will download from the Gene Expression Omnibus (GEO) for this analysis.
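Every sample must be assigned a label in the $class vector: 1 for disease or 0 for control. classFunction() sets these labels from a phenotype column, marking samples whose value matches one of the diseaseTerms as disease.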
sleData$originalData$GSE50635 <- classFunction(sleData$originalData$GSE50635,
  column = "subject type:ch1",
  diseaseTerms = c("Subject RBP +", "Subject RBP -"))
sleData$originalData$GSE11909_GPL96 <- classFunction(sleData$originalData$GSE11909_GPL96,
  column = "Illness:ch1",
  diseaseTerms = c("SLE"))
sleData$originalData$GSE39088 <- classFunction(sleData$originalData$GSE39088,
  column = "disease state:ch1",
  diseaseTerms = c("SLE"))
# GSE11909 was downloaded as two platform-specific datasets; keep GPL96 and drop GPL97
sleData$originalData$GSE11909_GPL97 <- NULL
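The labeling rule can be pictured with a small base-R sketch. The helper makeClassVector() and the toy phenotype table are hypothetical, and we assume exact matching of phenotype values against diseaseTerms; this illustrates the idea and is not classFunction()'s source.

```r
# Hypothetical helper illustrating the labeling rule: 1 = disease, 0 = control.
makeClassVector <- function(pheno, column, diseaseTerms) {
  labels <- as.character(pheno[[column]])            # works for factor or character columns
  classVec <- as.integer(labels %in% diseaseTerms)   # exact match against disease terms
  names(classVec) <- rownames(pheno)
  classVec
}

# Toy phenotype table with a GEO-style column name.
pheno <- data.frame(
  `disease state:ch1` = c("SLE", "Control", "SLE", "Control"),
  row.names = c("S1", "S2", "S3", "S4"),
  check.names = FALSE
)
classVec <- makeClassVector(pheno, "disease state:ch1", diseaseTerms = c("SLE"))
classVec
```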
Next, set the criteria (such as false discovery rate and effect-size thresholds) that decide whether each gene is included in the disease signature.
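A minimal base-R sketch of such criteria, applied to a hypothetical per-gene summary table. The column names (effectSize, fdr, nStudies), gene names, values, and thresholds are illustrative assumptions, not filterGenes() output or defaults.

```r
# Hypothetical per-gene meta-analysis summary (invented values).
metaSummary <- data.frame(
  gene       = c("GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"),
  effectSize = c(1.8, 1.5, -1.2, 0.3, -0.9),
  fdr        = c(0.001, 0.004, 0.010, 0.200, 0.030),
  nStudies   = c(3, 3, 3, 3, 1),
  stringsAsFactors = FALSE
)

# Illustrative thresholds only.
fdrThresh <- 0.05
effectSizeThresh <- 0.6
minStudies <- 2

keep <- with(metaSummary,
             fdr <= fdrThresh & abs(effectSize) >= effectSizeThresh & nStudies >= minStudies)
posGenes <- metaSummary$gene[keep & metaSummary$effectSize > 0]  # up-regulated in disease
negGenes <- metaSummary$gene[keep & metaSummary$effectSize < 0]  # down-regulated in disease
```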
Once you have identified a gene signature, you can calculate a score for each sample: the geometric mean of the up-regulated genes minus the geometric mean of the down-regulated genes. With a good signature, this score should be higher in SLE patients than in healthy controls.
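The score formula can be written directly in base R. This sketch assumes strictly positive expression values (the geometric mean is undefined otherwise) and uses hypothetical genes and samples. calculateScore() may also handle missing genes or rescale the result, so treat this as an illustration of the formula only.

```r
# Geometric mean of positive values: exp(mean(log(x))).
geomMean <- function(x) exp(mean(log(x)))

# Per-sample score: geometric mean of up genes minus geometric mean of down genes.
# 'expr' is a genes x samples matrix of positive expression values.
signatureScore <- function(expr, posGenes, negGenes) {
  posScore <- apply(expr[posGenes, , drop = FALSE], 2, geomMean)
  negScore <- apply(expr[negGenes, , drop = FALSE], 2, geomMean)
  posScore - negScore
}

expr <- matrix(
  c(8, 2,   # GENE_A
    2, 2,   # GENE_B
    1, 4),  # GENE_C
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GENE_A", "GENE_B", "GENE_C"), c("S1", "S2"))
)
scores <- signatureScore(expr, posGenes = c("GENE_A", "GENE_B"), negGenes = "GENE_C")
scores
```

Here S1 scores sqrt(8 * 2) - 1 = 3 and S2 scores sqrt(2 * 2) - 4 = -2.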
This score is the basis for examining the results. Most downstream functions compute it in the background, so you rarely need to calculate it yourself.
We can visualize the effect sizes for all genes in the signature.
Receiver operating characteristic (ROC) curves and precision-recall (PRC) curves can be used to evaluate how well the MetaScore classifies disease versus control samples.
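For intuition, the area under the ROC curve (AUC) equals the probability that a randomly chosen disease sample scores higher than a randomly chosen control, with ties counted as one half. A minimal base-R sketch using the rank-sum identity, on hypothetical scores and labels:

```r
# AUC via the Mann-Whitney rank-sum identity; rank() assigns average ranks to ties.
aucFromScores <- function(scores, classVec) {
  r <- rank(scores)
  nPos <- sum(classVec == 1)
  nNeg <- sum(classVec == 0)
  (sum(r[classVec == 1]) - nPos * (nPos + 1) / 2) / (nPos * nNeg)
}

scores   <- c(2.1, 1.4, 0.3, -0.5, 0.8, -1.2)  # hypothetical MetaScores
classVec <- c(1, 1, 1, 0, 0, 0)                # 1 = disease, 0 = control
auc <- aucFromScores(scores, classVec)
auc
```

Eight of the nine disease/control pairs are ordered correctly, so the AUC is 8/9.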
Draw ROC curves for multiple datasets.
Draw ROC curves for multiple datasets together with a summary ROC curve that represents an overall ROC estimate.
Draw ROC curves for multiple datasets together with a pooled ROC curve that represents a moving-average ROC.
Draw a single PRC curve.
Draw PRC curves for multiple datasets.
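A precision-recall curve can be traced by sweeping a cutoff down the ranked scores. This sketch uses hypothetical data, assumes no tied scores, and also computes average precision, one common single-number summary; it is not necessarily the summary that prcPlot() reports.

```r
# Precision and recall at each cutoff, walking down the scores from highest to lowest.
prCurve <- function(scores, classVec) {
  ord <- order(scores, decreasing = TRUE)
  tp <- cumsum(classVec[ord] == 1)   # true positives above each cutoff
  fp <- cumsum(classVec[ord] == 0)   # false positives above each cutoff
  data.frame(recall = tp / sum(classVec == 1), precision = tp / (tp + fp))
}

# Average precision: mean of the precision values at the ranks where each
# disease sample is recovered.
averagePrecision <- function(scores, classVec) {
  pr <- prCurve(scores, classVec)
  isHit <- classVec[order(scores, decreasing = TRUE)] == 1
  sum(pr$precision[isHit]) / sum(classVec == 1)
}

scores   <- c(2.1, 1.4, 0.3, -0.5, 0.8, -1.2)  # hypothetical MetaScores
classVec <- c(1, 1, 1, 0, 0, 0)
pr <- prCurve(scores, classVec)
ap <- averagePrecision(scores, classVec)
```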
A violin plot lets you drill into subgroups within a dataset and compare score distributions between populations, with individual samples shown.
For continuous variables, generate a regression plot to examine how the score relates to the variable (here, age).
# Convert via character first so that a factor column yields ages, not level codes
sleMetaAnalysis$originalData$GSE50635$pheno$`age:ch1` <- as.numeric(as.character(sleMetaAnalysis$originalData$GSE50635$pheno$`age:ch1`))
regressionPlot(filterObject = sleMetaAnalysis$filterResults[[1]],
               datasetObject = sleMetaAnalysis$originalData$GSE50635,
               continuousVariableColumn = "age:ch1",
               formattedVariableName = "Age")
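The sketch below uses hypothetical data to show two things: why a phenotype column stored as a factor must go through as.character() before as.numeric(), and a plain linear fit of score against age of the kind a regression plot summarizes. It illustrates the idea and is not regressionPlot()'s implementation.

```r
# A phenotype column read as a factor (hypothetical ages stored as text).
ageRaw <- factor(c("23", "31", "38", "45", "52", "60", "67"))
as.numeric(ageRaw)                       # level codes 1..7, not ages
age <- as.numeric(as.character(ageRaw))  # the actual ages

# Hypothetical scores: a linear trend plus a small fixed perturbation.
score <- 0.05 * age + c(0.1, -0.05, 0.02, -0.1, 0.08, -0.03, 0)
fit <- lm(score ~ age)
coef(summary(fit))                       # estimates, standard errors, t and p values
```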
Forest plots let us examine an individual gene across studies, showing its effect size in each study alongside the pooled estimate.
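Behind a forest plot sits a pooled effect size. One common choice in gene expression meta-analysis is the DerSimonian-Laird random-effects model, implemented below in base R for a single hypothetical gene. We have not verified that this matches MetaIntegrator's exact computation.

```r
# DerSimonian-Laird random-effects pooling of per-study effect sizes g
# with within-study variances v.
poolRandomEffects <- function(g, v) {
  w     <- 1 / v                              # fixed-effect (inverse-variance) weights
  fixed <- sum(w * g) / sum(w)
  Q     <- sum(w * (g - fixed)^2)             # Cochran's Q heterogeneity statistic
  k     <- length(g)
  tau2  <- max(0, (Q - (k - 1)) / (sum(w) - sum(w^2) / sum(w)))  # between-study variance
  wStar <- 1 / (v + tau2)                     # random-effects weights
  est   <- sum(wStar * g) / sum(wStar)
  c(estimate = est, se = sqrt(1 / sum(wStar)), tau2 = tau2)
}

g <- c(1.2, 0.9, 1.5)     # hypothetical effect sizes for one gene in three studies
v <- c(0.04, 0.06, 0.05)  # hypothetical within-study variances
res <- poolRandomEffects(g, v)
res
```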
Use forward and backward search to reduce the number of genes in the signature while maintaining or improving classification performance.
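Forward search can be sketched as greedy selection: start from an empty signature and repeatedly add the gene that most increases AUC, stopping when the gain falls below a threshold. Everything here (the function names, minGain, the single-dataset AUC objective, and the toy matrix) is a hypothetical illustration; forwardSearch() may use a different objective and stopping rule. Backward search is the mirror image, starting from the full signature and dropping genes.

```r
geomMean <- function(x) exp(mean(log(x)))
auc <- function(s, classVec) {
  r <- rank(s); nP <- sum(classVec == 1); nN <- sum(classVec == 0)
  (sum(r[classVec == 1]) - nP * (nP + 1) / 2) / (nP * nN)
}
# Signature score; an empty gene set contributes 0.
scoreSig <- function(expr, pos, neg) {
  p <- if (length(pos)) apply(expr[pos, , drop = FALSE], 2, geomMean) else 0
  n <- if (length(neg)) apply(expr[neg, , drop = FALSE], 2, geomMean) else 0
  p - n
}

# Greedy forward search over disjoint up (posCand) and down (negCand) candidates.
forwardSearchSketch <- function(expr, classVec, posCand, negCand, minGain = 0.01) {
  pos <- character(0); neg <- character(0); best <- 0.5
  repeat {
    cands <- c(setdiff(posCand, pos), setdiff(negCand, neg))
    if (length(cands) == 0) break
    aucs <- sapply(cands, function(gene) {
      s <- if (gene %in% posCand) scoreSig(expr, c(pos, gene), neg)
           else scoreSig(expr, pos, c(neg, gene))
      auc(s, classVec)
    })
    if (max(aucs) - best <= minGain) break   # no candidate improves enough
    pick <- cands[which.max(aucs)]
    if (pick %in% posCand) pos <- c(pos, pick) else neg <- c(neg, pick)
    best <- max(aucs)
  }
  list(pos = pos, neg = neg, auc = best)
}

expr <- rbind(GENE_A = c(9, 8, 7, 2, 3, 1),   # separates the groups perfectly
              GENE_B = c(5, 5, 5, 5, 5, 5),   # uninformative
              GENE_C = c(2, 3, 4, 3, 5, 6))   # partly lower in disease
classVec <- c(1, 1, 1, 0, 0, 0)
res <- forwardSearchSketch(expr, classVec, posCand = c("GENE_A", "GENE_B"), negCand = "GENE_C")
res
```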
immunoStates is a tool for estimating immune cell proportions from gene expression profiles. immunoStateMeta() in MetaIntegrator allows you to estimate cell proportions and then use them, in place of genes, as input for a downstream meta-analysis.
immunoStates can also correct the underlying gene expression data for differences in cell proportions.
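To give a feel for what cell-proportion estimation involves, here is a generic least-squares deconvolution that models a sample's expression as a mixture of cell-type reference profiles. The basis matrix, marker names, and the clamp-and-renormalize step are assumptions for illustration; this is not immunoStates' algorithm or basis matrix.

```r
# Hypothetical basis matrix: marker-gene expression per cell type (genes x cell types).
basis <- matrix(
  c(10,  1,  1,
     1, 10,  1,
     1,  1, 10,
     5,  5,  1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("MARKER_", 1:4), c("Tcell", "Bcell", "Monocyte"))
)

# Least-squares fit of mixture ~ basis, negatives clamped to zero,
# then rescaled so the proportions sum to one.
deconvolve <- function(mixture, basis) {
  coefs <- qr.solve(basis, mixture)
  coefs <- pmax(coefs, 0)
  coefs / sum(coefs)
}

trueProps <- c(Tcell = 0.5, Bcell = 0.3, Monocyte = 0.2)
mixture <- drop(basis %*% trueProps)   # a noiseless mixed sample
estProps <- deconvolve(mixture, basis)
estProps
```

With noiseless input and a full-rank basis, the fit recovers the true proportions exactly; real data adds noise, so constrained methods are usually preferred.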
The LINCS tools allow users to compare disease gene expression signatures with perturbation expression signatures generated by the Library of Integrated Network-based Cellular Signatures (LINCS) consortium. lincsTools() generates a broadly useful report covering many different classes of molecules. lincsCorrelate() is one specific example: it looks for a drug whose gene expression profile reverses the SLE profile. Note that this requires downloading a large amount of data, so the first execution will be slow.
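The idea of reversing a signature can be illustrated by correlating the disease effect sizes with each perturbation signature over the genes they share; the most negative correlation indicates the strongest reversal. The drugs, genes, and values below are hypothetical, and lincsCorrelate() may score matches differently.

```r
# Hypothetical disease effect sizes (positive = up in SLE).
diseaseSig <- c(GENE_A = 1.8, GENE_B = 1.2, GENE_C = -1.0, GENE_D = -0.7)

# Hypothetical perturbation signatures (changes after treatment).
drugSigs <- list(
  drug1 = c(GENE_A = -1.5, GENE_B = -0.9, GENE_C = 0.8, GENE_D = 0.6),
  drug2 = c(GENE_A = 1.1, GENE_B = 0.7, GENE_C = -0.4, GENE_D = -0.9)
)

# Spearman correlation over shared genes; more negative = stronger reversal.
reversal <- sapply(drugSigs, function(sig) {
  shared <- intersect(names(diseaseSig), names(sig))
  cor(diseaseSig[shared], sig[shared], method = "spearman")
})
sort(reversal)
```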
Impute the sex of each sample from the expression of known sex-specific marker genes. Comparing the imputed sex with the annotated sex can reveal sample labeling errors.
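A naive version of sex imputation compares a female-specific marker (XIST) with a Y-linked marker (RPS4Y1). The choice of these two genes, the toy log2 expression values, and the simple comparison rule are assumptions; imputeSex() may use other markers and a different classifier.

```r
# Hypothetical log2 expression of two sex-linked markers (genes x samples).
expr <- rbind(
  XIST   = c(S1 = 11.2, S2 = 3.1,  S3 = 10.8, S4 = 2.9),
  RPS4Y1 = c(S1 = 3.0,  S2 = 10.5, S3 = 2.7,  S4 = 11.1)
)

# Naive rule (assumption): call a sample male when the Y-linked marker exceeds XIST.
imputedSex <- ifelse(expr["RPS4Y1", ] > expr["XIST", ], "male", "female")

annotatedSex <- c(S1 = "female", S2 = "male", S3 = "male", S4 = "male")
mismatches <- names(which(imputedSex != annotatedSex))  # candidate labeling errors
mismatches
```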
COCONUT is a separate R package that corrects batch effects so that multiple datasets can be merged into a single dataset. MetaIntegrator provides a wrapper function that runs COCONUT on a MetaIntegrator object.
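The central idea behind COCONUT is to estimate batch adjustments on control samples only and then apply them to every sample, so that disease effects are not removed along with batch effects. The simplified sketch below uses each gene's control mean and standard deviation per dataset; it omits ComBat's empirical Bayes step and is not the COCONUT algorithm. The helper name and toy data are hypothetical.

```r
# Per-gene location/scale adjustment estimated on controls, applied to all samples.
# Assumes every gene varies among controls (a zero SD would divide by zero).
controlNormalize <- function(expr, classVec) {
  ctrl <- expr[, classVec == 0, drop = FALSE]
  mu  <- rowMeans(ctrl)
  sdv <- apply(ctrl, 1, sd)
  (expr - mu) / sdv   # mu and sdv have one value per gene and recycle down the rows
}

ds1  <- matrix(c(5, 6, 7, 10), nrow = 1)  # one gene: three controls, then one disease sample
cls1 <- c(0, 0, 0, 1)
ds2  <- ds1 + 3                           # same biology shifted by a batch effect
merged <- cbind(controlNormalize(ds1, cls1), controlNormalize(ds2, cls1))
merged
```

After adjustment the two datasets line up, while the disease sample keeps its elevated value relative to its controls.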
Pathway analysis is commonly performed to provide biological interpretation of experimental results. This is a wrapper function for deapathways, an R package for performing pathway analysis.
NOTE: This functionality will be added in future updates to MetaIntegrator
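In the meantime, the simplest form of pathway analysis, an over-representation test, can be sketched in base R with the hypergeometric distribution. The gene universe, pathway, and signature below are hypothetical, and this is not necessarily the method deapathways uses.

```r
# Over-representation test: P(X >= overlap) under the hypergeometric distribution,
# where X counts signature genes that fall in the pathway by chance.
oraPValue <- function(sigGenes, pathwayGenes, universe) {
  sigGenes     <- intersect(sigGenes, universe)
  pathwayGenes <- intersect(pathwayGenes, universe)
  overlap <- length(intersect(sigGenes, pathwayGenes))
  phyper(overlap - 1,
         m = length(pathwayGenes),                   # pathway genes in the universe
         n = length(universe) - length(pathwayGenes), # non-pathway genes
         k = length(sigGenes),                        # genes drawn (the signature)
         lower.tail = FALSE)                          # upper tail: P(X > overlap - 1)
}

universe <- paste0("GENE_", 1:100)
pathway  <- paste0("GENE_", 1:10)
sigGenes <- paste0("GENE_", c(1:5, 50:54))  # 5 of 10 signature genes hit the pathway
p <- oraPValue(sigGenes, pathway, universe)
p
```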