R package available on CRAN or on GitHub.
Maintainer: Paul Deveau (paul.deveau at curie.fr)
QuantumClone is an algorithm designed to reconstruct clonal populations (i.e. groups of cells with the same genetic background) from high-throughput sequencing data (either whole exome or whole genome). It takes into account information from variants (reads supporting the alternative allele and depth at the position), as well as copy number information: the number of alleles at each locus (a normal diploid region would be written as “AB”). Additional information, such as the contamination, is also used.
QuantumClone looks for clones in your samples assuming an evolutionary relationship between them, so a single analysis should use data from the same patient (different timepoints, spatially separated samples, or biological replicates).
QuantumClone needs only a few pieces of information in the input file:
- Line 1 holds the column titles (Sample | Chr | Start | Alt | Depth). If no FREEC profile is associated with your files, an additional column is required: Genotype.
- The first column holds the name of your sample.
- Chr is the chromosome of the variant (e.g. “chr2”).
- Start is the position of the variant.
- Alt is the number of reads supporting the variant.
- Depth is the depth of coverage at the position of the variant (number of reads mapped at this position).
The example below was generated with the QuantumCat function and is included in the package data:
```r
library(QuantumClone)

# Example was generated calling:
Input_Example <- QuantumCat(number_of_clones = 4,
                            number_of_mutations = 100,
                            ploidy = "AB", depth = 150,
                            number_of_samples = 2,
                            contamination = c(0, 0))
```
SampleName | Chr | Start | Depth | Alt | Genotype |
---|---|---|---|---|---|
Timepoint_1 | 1 | 1 | 149 | 67 | AB |
Timepoint_1 | 4 | 2 | 162 | 2 | AB |
Timepoint_1 | 4 | 3 | 132 | 5 | AB |
Timepoint_1 | 4 | 4 | 57 | 1 | AB |
Timepoint_1 | 4 | 5 | 93 | 0 | AB |
Timepoint_1 | 4 | 6 | 95 | 0 | AB |
Any additional columns are ignored by the analysis.
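As a sketch of preparing input yourself, the snippet below builds a per-sample data frame with the columns described above and checks it with a small helper. `check_snv_input()` is hypothetical and not part of QuantumClone; passing several samples as a list of such data frames is an assumption based on the `SNV_list` parameter name, so check `?One_step_clustering`.

```r
# Minimal per-sample input mirroring the first rows of the example table.
timepoint_1 <- data.frame(
  SampleName = "Timepoint_1",
  Chr        = c(1, 4, 4),
  Start      = c(1, 2, 3),
  Depth      = c(149, 162, 132),
  Alt        = c(67, 2, 5),
  Genotype   = "AB",
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of QuantumClone): returns a character vector
# describing the problems found, empty if the data frame looks usable.
check_snv_input <- function(df, need_genotype = TRUE) {
  required <- c("Chr", "Start", "Depth", "Alt")
  if (need_genotype) required <- c(required, "Genotype")
  # sprintf() returns character(0) when nothing is missing
  problems <- sprintf("missing column: %s", setdiff(required, names(df)))
  if (all(c("Alt", "Depth") %in% names(df)) && any(df$Alt > df$Depth)) {
    problems <- c(problems, "Alt exceeds Depth on some rows")
  }
  problems
}

check_snv_input(timepoint_1)         # character(0): nothing to report
check_snv_input(timepoint_1[, -6])   # Genotype column dropped
```

Set `need_genotype = FALSE` when a FREEC profile provides the copy number information instead.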
While the input file can be as large as you want, computation time grows exponentially with the number of variants studied. To keep computation time reasonable (from a minute to an hour), use a set of 100 to 1,000 variants.
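One way to stay within that range is to subsample positions before clustering. The hypothetical helper below (not part of QuantumClone) keeps at most `max_variants` positions, drawing the same Chr/Start positions in every sample so the samples remain comparable. It assumes `samples` is a named list of data frames with `Chr` and `Start` columns; call `set.seed()` first for a reproducible draw.

```r
# Hypothetical helper: keep at most max_variants positions shared by all samples.
subsample_variants <- function(samples, max_variants = 1000) {
  keys <- lapply(samples, function(df) paste(df$Chr, df$Start, sep = ":"))
  shared <- Reduce(intersect, keys)          # positions present in every sample
  if (length(shared) > max_variants) {
    shared <- sample(shared, max_variants)   # same random subset for all samples
  }
  # Filter each sample, keeping its original row order and the list names
  Map(function(df, k) df[k %in% shared, , drop = FALSE], samples, keys)
}
```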
The QuantumClone package is divided into two parts:
- Clonal reconstruction: the QuantumClone and One_step_clustering functions
- Clonal simulation: QuantumCat (not included in the GUI)
One_step_clustering() takes several parameters, some of which have default values:
```r
QC_output <- One_step_clustering(SNV_list = Input_Example, FREEC_list = NULL,
                                 contamination = c(0, 0),
                                 nclone_range = 2:5, clone_priors = NULL, prior_weight = NULL,
                                 Initializations = 1, preclustering = "Flash",
                                 simulated = FALSE, epsilon = 5 * (10^(-3)),
                                 save_plot = TRUE, ncores = 1,
                                 restrict.to.AB = FALSE, output_directory = NULL)
```
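The returned object contains an `EM.output` element, which indicates that the clustering relies on expectation-maximization (EM). For intuition only, here is a toy EM for a binomial mixture in a single diploid sample with heterozygous variants and no contamination. It is not QuantumClone's implementation, which also handles copy number, contamination, several samples and the choice within `nclone_range`; reading `epsilon` as a log-likelihood convergence threshold mirrors the One_step_clustering() argument only by assumption.

```r
# Toy EM for a binomial mixture (not QuantumClone's algorithm).
# Assumes AB loci, one variant copy (VAF = cellularity / 2), no contamination.
# Likelihoods are not computed in log space, so very deep coverage may underflow.
toy_binomial_em <- function(alt, depth, k, epsilon = 5e-3, max_iter = 500) {
  stopifnot(all(depth > 0))
  # Initial VAF centers spread over the quantiles of the observed VAFs
  p <- as.numeric(quantile(alt / depth, probs = (seq_len(k) - 0.5) / k))
  p <- pmin(pmax(p, 1e-3), 1 - 1e-3)
  w <- rep(1 / k, k)
  loglik_old <- -Inf
  for (iter in seq_len(max_iter)) {
    # E-step: weighted binomial likelihood of each mutation under each cluster
    lik <- sapply(seq_len(k), function(j) w[j] * dbinom(alt, depth, p[j]))
    total <- rowSums(lik)
    resp <- lik / total
    loglik <- sum(log(total))
    # M-step: update cluster weights and pooled VAFs
    w <- colMeans(resp)
    p <- colSums(resp * alt) / colSums(resp * depth)
    p <- pmin(pmax(p, 1e-3), 1 - 1e-3)
    if (loglik - loglik_old < epsilon) break
    loglik_old <- loglik
  }
  list(cellularity = 2 * p, weights = w, cluster = max.col(resp), loglik = loglik)
}

# Usage: 100 clonal (cellularity 1) and 100 subclonal (cellularity 0.3) mutations
set.seed(1)
depth <- rpois(200, 150)
alt <- rbinom(200, depth, rep(c(1, 0.3), each = 100) / 2)
fit <- toy_binomial_em(alt, depth, k = 2)
round(sort(fit$cellularity), 2)   # close to 0.3 and 1
```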
The filtered data for the first sample look like this:
```r
QC_output$filtered.data[[1]]
```
Chr | Start | Cellularity | Genotype | Alt | Depth | NC | NCh | alpha | id |
---|---|---|---|---|---|---|---|---|---|
1 | 1 | 0.8993289 | AB | 67 | 149 | 1 | 2 | 1 | 1 |
4 | 2 | 0.0246914 | AB | 2 | 162 | 1 | 2 | 1 | 2 |
4 | 3 | 0.0757576 | AB | 5 | 132 | 1 | 2 | 1 | 3 |
4 | 4 | 0.0350877 | AB | 1 | 57 | 1 | 2 | 1 | 4 |
4 | 5 | 0.0000000 | AB | 0 | 93 | 1 | 2 | 1 | 5 |
4 | 6 | 0.0000000 | AB | 0 | 95 | 1 | 2 | 1 | 6 |
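In this diploid, contamination-free example, the Cellularity column equals the variant allele frequency scaled by NCh/NC (here 2/1): for the first row, 2 × 67/149 ≈ 0.899. Reading NC as the number of copies carrying the variant and NCh as the number of chromosome copies at the locus is an assumption, as is the helper name; how the package corrects for contamination or other genotypes is not shown here.

```r
# Hypothetical helper: cellularity of a variant carried on NC of the NCh
# copies at the locus, assuming no normal contamination.
naive_cellularity <- function(alt, depth, NC = 1, NCh = 2) {
  (alt / depth) * NCh / NC
}

alt   <- c(67, 2, 5, 1, 0, 0)
depth <- c(149, 162, 132, 57, 93, 95)
round(naive_cellularity(alt, depth), 7)
# 0.8993289 0.0246914 0.0757576 0.0350877 0 0 (the Cellularity column above)
```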
The clustering output can be plotted with plot_QC_out(), or with plot_with_margins_densities() when there are one or two samples.
```r
plot_QC_out(QC_output)
## Only one model identified.
## Two samples identified.
```
With more samples (here sample 1 is reused in the plot to illustrate the plotting options, not the clustering):
```r
plot_QC_out(QC_output3s, Sample_names = c("Diag", "Rel", "Metastasis"),
            simulated = FALSE, sample_selected = 1:3)
## Only one model identified.
```
```r
plot_with_margins_densities(QC_output)
```
For time series, evolution_plot() is recommended. It draws the cellularity of every clone in a single plot, with each line's width proportional to the fraction of mutations in that clone.
```r
evolution_plot(QC_output, Sample_names = c("Timepoint_1", "Timepoint_2"))
```
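The line-width rule described above can be sketched in base R: each clone's width is proportional to its share of mutations. These helpers are hypothetical and are not how evolution_plot() is implemented; `centers` is assumed to be a clones × samples matrix whose rows follow the sorted clone ids in `cluster`.

```r
# Hypothetical helper: fraction of mutations per clone and a proportional line width.
clone_line_widths <- function(cluster, max_lwd = 10) {
  counts <- table(cluster)                       # mutations per clone, sorted by id
  frac <- as.numeric(counts) / length(cluster)
  data.frame(clone = names(counts), fraction = frac, lwd = max_lwd * frac,
             stringsAsFactors = FALSE)
}

# Hypothetical plot: one line per clone across samples, width set by its fraction.
plot_evolution_sketch <- function(centers, cluster, sample_names = NULL) {
  if (is.null(sample_names)) sample_names <- paste("Sample", seq_len(ncol(centers)))
  widths <- clone_line_widths(cluster)$lwd
  matplot(t(centers), type = "l", lty = 1, lwd = widths, xaxt = "n",
          xlab = "", ylab = "Cellularity", ylim = c(0, 1))
  axis(1, at = seq_len(ncol(centers)), labels = sample_names)
}
```

For example, `plot_evolution_sketch(cbind(c(0.9, 0.3), c(0.5, 0.8)), cluster = c(1, 1, 1, 2), sample_names = c("Timepoint_1", "Timepoint_2"))` draws clone 1 three times as wide as clone 2.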
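Clone cellularities from the EM output can be combined across samples and passed to Tree_generation() to build clonal trees:
```r
Cellularities <- cbind(QuantumClone::QC_output$EM.output$centers[[1]],
                       QuantumClone::QC_output$EM.output$centers[[2]])
Tree <- QuantumClone::Tree_generation(Cellularities)
```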
Tree_generation() returns a list of data frames and their probabilities, such as:
Clone 1 | Clone 2 | Clone 3 | Clone 4 | Clone 5 | Clone 6 | Clone 7 | Sample 1 | Sample 2 |
---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 0 | 1 | 0 | 0 | 1.0000000 | 1.0000000 |
0 | 0 | 0 | 1 | 0 | 0 | 1 | 0.7148588 | 0.2166373 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0502218 | 0.5129098 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0208175 | 0.0186199 |
0 | 0 | 1 | 0 | 0 | 1 | 0 | 0.2851412 | 0.7833627 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.2349194 | 0.2704529 |
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.6940413 | 0.1980175 |
Each row i corresponds to a clone. The last two columns give the clone's cellularity in each of the two samples. In the other columns, entry j is 1 if clone j is a direct descendant of clone i, and 0 otherwise.
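In this example, every clone with descendants has a cellularity equal to the sum of its children's cellularities in both samples (for the first row, 0.7148588 + 0.2851412 = 1). The sketch below checks that property on the matrix above; whether Tree_generation() enforces it exactly or only scores it is not stated here, and `sum_rule_gap()` is a hypothetical helper.

```r
# Tree from the table above: adjacency (row i -> child j) and cellularities.
adjacency <- matrix(c(
  0, 1, 0, 0, 1, 0, 0,
  0, 0, 0, 1, 0, 0, 1,
  0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0,
  0, 0, 1, 0, 0, 1, 0,
  0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0), nrow = 7, byrow = TRUE)
cellularity <- matrix(c(
  1.0000000, 1.0000000,
  0.7148588, 0.2166373,
  0.0502218, 0.5129098,
  0.0208175, 0.0186199,
  0.2851412, 0.7833627,
  0.2349194, 0.2704529,
  0.6940413, 0.1980175), ncol = 2, byrow = TRUE)

# Hypothetical helper: for each clone with children, sum of the children's
# cellularities minus the clone's own cellularity, sample by sample.
sum_rule_gap <- function(adjacency, cellularity) {
  parents <- which(rowSums(adjacency) > 0)
  gaps <- adjacency[parents, , drop = FALSE] %*% cellularity -
    cellularity[parents, , drop = FALSE]
  rownames(gaps) <- parents
  gaps
}

max(abs(sum_rule_gap(adjacency, cellularity)))   # about 1e-7: rounding in the table
```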
The trees can then be plotted with multiplot_trees():
```r
QuantumClone::multiplot_trees(Tree, d = 4)
```
This part covers generating data to test clonal reconstruction algorithms; its core is the QuantumCat function. QuantumCat generates data for a single cancer that can be sequenced multiple times (spatially separated samples or different timepoints), and therefore assumes an evolutionary history shared between samples. The Chr column stores the clone to which each mutation belongs.
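A toy simulator in the same spirit shows what such data look like. This is not QuantumCat's implementation: it assumes diploid "AB" loci, heterozygous mutations, no contamination, Poisson-distributed depth and binomial alternative read counts with probability cellularity/2. As in QuantumCat output, the Chr column here stores the true clone of each mutation; `simulate_toy()` is a hypothetical name.

```r
# Hypothetical toy simulator (not QuantumCat).
# cellularities: clones x samples matrix of clone cellularities.
simulate_toy <- function(n_mut = 100, cellularities, depth = 100) {
  n_clones <- nrow(cellularities)
  clone <- sample(seq_len(n_clones), n_mut, replace = TRUE)  # same clones in all samples
  lapply(seq_len(ncol(cellularities)), function(s) {
    d <- rpois(n_mut, depth)
    data.frame(SampleName = paste0("Timepoint_", s),
               Chr        = clone,          # true clone of each mutation
               Start      = seq_len(n_mut),
               Depth      = d,
               Alt        = rbinom(n_mut, d, cellularities[clone, s] / 2),
               Genotype   = "AB",
               stringsAsFactors = FALSE)
  })
}

set.seed(3)
sim <- simulate_toy(200, cbind(c(1, 0.6, 0.2), c(1, 0.1, 0.7)), depth = 150)
head(sim[[1]])
```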
```r
QuantumCat(number_of_clones, number_of_mutations, ploidy = 2, depth = 100,
           number_of_samples = 2, Random_clones = FALSE, contamination = NULL)
```
For repeated simulation runs and the computation of the Normalized Mutual Information (NMI), see Multitest() and statistics_on_Multitest().
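For readers who want to see what NMI measures, here is a from-scratch version comparing two labelings, e.g. the true clones stored in Chr with the cluster labels from a clustering run. It normalizes mutual information by the geometric mean of the two entropies, which is one of several conventions; statistics_on_Multitest() may use another, and `nmi()` is a hypothetical helper.

```r
# Hypothetical helper: NMI = I(X; Y) / sqrt(H(X) * H(Y)), natural logarithms.
nmi <- function(x, y) {
  stopifnot(length(x) == length(y))
  joint <- table(x, y) / length(x)          # joint label distribution
  px <- rowSums(joint)
  py <- colSums(joint)
  nz <- joint > 0                           # skip empty cells (0 * log 0 = 0)
  mi <- sum(joint[nz] * log(joint[nz] / outer(px, py)[nz]))
  hx <- -sum(px[px > 0] * log(px[px > 0]))
  hy <- -sum(py[py > 0] * log(py[py > 0]))
  if (hx == 0 || hy == 0) return(if (hx == hy) 1 else 0)  # constant labelings
  mi / sqrt(hx * hy)
}

nmi(c(1, 1, 2, 2, 3, 3), c("a", "a", "b", "b", "c", "c"))  # same partition, renamed
nmi(c(1, 1, 2, 2), c(1, 2, 1, 2))                          # independent labelings
```

NMI is invariant to renaming the labels, which is why it suits comparing cluster ids with the true clone ids.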
Many thanks to the contributors of this work: my supervisors, Elodie for the feature improvements and Linux debugging, Matahi for the OSX feedback, and more generally the U830 & U900 people. This work was funded by the Ministere de l’Enseignement Superieur de la Recherche (AMX grant).