-vcf [file]
File path of the isolate vcf. Assume all variants are PASS
in the QUAL
column, the VCF file also reqires the AD
field.
Note: In the current implementation, DEploid
only take the first sample in the VCF file. DEploid
DO NOT handle multi-allelic variants, nor indels. The FILTER
column will not be used.
-plaf [file]
File path of population level allele frequencies (tab-delimited plain text file), for example
CHROM | POS | PLAF |
---|---|---|
Pf3D7_01_v3 | 93157 | 0.0190612159917058 |
Pf3D7_01_v3 | 94422 | 0.135502358766423 |
Pf3D7_01_v3 | 94459 | 0.156294363760064 |
Pf3D7_01_v3 | 94487 | 0.143439298925837 |
-panel [file]
File path of the reference panel (tab-delimited plain text file), for example
CHROM | POS | 3D7 | Dd2 | Hb3 | 7G8 |
---|---|---|---|---|---|
Pf3D7_01_v3 | 93157 | 0 | 0 | 0 | 1 |
Pf3D7_01_v3 | 94422 | 0 | 0 | 0 | 1 |
Pf3D7_01_v3 | 94459 | 0 | 0 | 0 | 1 |
Pf3D7_01_v3 | 94487 | 0 | 0 | 0 | 1 |
-noPanel
Use population level allele frequency as prior.
Warning: Flags -panel
and -noPanel
should not be used together.
‘-exclude [file]’ File path of sites to be excluded (tab-delimited plain text file).
‘-o [string]’ Specify the file name prefix of the output.
‘-k [int]’ Number of strain (default value 5).
‘-nSample [int]’ Number of MCMC samples (default value 800).
‘-rate [int]’ MCMC sample rate (default value 5).
‘-burn [float]’ MCMC burn rate (default value 0.5).
‘-v , -version’ DEploid version.
‘-vcfOut’ Save final halpotypes into a VCF file.
‘-ref [file] -alt [file]’ File path of reference and alternative allele count (tab-delimited plain text file).
Note: In early dEploid
versions (prior to v0.2-release
), allele counts extracted from the vcf file are placed in two files, and parsed by flags -ref [file]
and -alt [file]
. Tab-delimited plain text for input. First and second columns record chromosome and position labels respectively. Third columns records the reference allele count or alternative allele count. For example,
CHROM | POS | PG0390.C |
---|---|---|
Pf3D7_01_v3 | 93157 | 85 |
Pf3D7_01_v3 | 94422 | 77 |
Pf3D7_01_v3 | 94459 | 90 |
Pf3D7_01_v3 | 94487 | 79 |
CHROM | POS | PG0390.C |
---|---|---|
Pf3D7_01_v3 | 93157 | 0 |
Pf3D7_01_v3 | 94422 | 0 |
Pf3D7_01_v3 | 94459 | 0 |
Pf3D7_01_v3 | 94487 | 0 |
Warning: Flags -ref
and -alt
should not be used with -vcf
.
-forbidUpdateProp
Forbid MCMC moves to update proportions.
-forbidUpdateSingle
Forbid MCMC moves to update single haplotype.
-forbidUpdatePair
Forbid MCMC moves to update pair haplotypes.
-exportPostProb
Save the posterior probabilities of the final iteration of all strains.
-miss [float]
Miss copying probability
-recomb [float]
Constant recombination probability
-initialP [float ...]
Initialize proportions.
-p [int]
Output precision (default value 8).
Data exploration, plot the read count ALT
vs REF
.
library(DEploid)
vcfFile = system.file("extdata", "PG0390-C.test.vcf.gz", package = "DEploid")
PG0390 = extractCoverageFromVcf(vcfFile)
plotAltVsRef( PG0390$refCount, PG0390$altCount )
Plot the histogram of the observed allele frequency within sample.
obsWSAF = computeObsWSAF( PG0390$altCount, PG0390$refCount )
histWSAF(obsWSAF)
Load prior information: PLAF and reference panel
plafFile = system.file("extdata", "labStrains.test.PLAF.txt", package = "DEploid")
plaf = extractPLAF(plafFile)
panelFile = system.file("extdata", "labStrains.test.panel.txt", package = "DEploid")
Deconvolute the haplotypes
set.seed(1)
PG0390.deconv = dEploid(paste("-vcf", vcfFile, "-plaf", plafFile, "-panel", panelFile))
prop = PG0390.deconv$Proportions[dim(PG0390.deconv$Proportions)[1],]
expWSAF = t(PG0390.deconv$Haps) %*% prop
Plot the allele frequency within sample (observed in red, expected in blue) against the population level allele frequency.
plotWSAFvsPLAF(plaf, obsWSAF, expWSAF)
Plot the history of the MCMC proportion estimates.
plotProportions(PG0390.deconv$Proportions)
Plot the allele frequency within sample, observed against expected.
plotObsExpWSAF(obsWSAF, expWSAF)
If you encounter any problem when using dEploid
, please file a short bug report by using the issue tracker on GitHub or email joe.zhu (at) well.ox.ac.uk.
Please include the output of dEploid -v
and the platform you are using dEploid
on in the report. If the problem occurs while executing dEploid
, please also include the command you are using and the random seed.
Thank you!
If you use dEploid
with the flag -ibd
, please cite the following paper:
Zhu, J. S., J. A. Hendry, J. Almagro-Garcia, R. D. Pearson, R. Amato, A. Miles, D. J. Weiss, T. C. D. Lucas, M. Nguyen, P. W. Gething, D. Kwiatkowski, G. McVean, and for the Pf3k Project. (2018) The origins and relatedness structure of mixed infections vary with local prevalence of P. falciparum malaria. eLife, 40845, doi: https://doi.org/10.7554/eLife.40845.
If you use dEploid
in your work, please cite the program:
Zhu, J. S., J. A. Garcia, G. McVean. (2018) Deconvolution of multiple infections in Plasmodium falciparum from high throughput sequencing data. Bioinformatics 34(1), 9-15. doi: https://doi.org/10.1093/bioinformatics/btx530.
Bibtex record::
@article {Zhu387266, author = {Zhu, Sha Joe and Hendry, Jason A. and Almagro-Garcia, Jacob and Pearson, Richard D. and Amato, Roberto and Miles, Alistair and Weiss, Daniel J. and Lucas, Tim C.D. and Nguyen, Michele and Gething, Peter W. and Kwiatkowski, Dominic and McVean, Gil and ,}, title = {The origins and relatedness structure of mixed infections vary with local prevalence of P. falciparum malaria}, year = {2018}, doi = {10.1101/387266}, publisher = {Cold Spring Harbor Laboratory}, URL = {https://www.biorxiv.org/content/early/2018/08/09/387266}, eprint = {https://www.biorxiv.org/content/early/2018/08/09/387266.full.pdf}, journal = {bioRxiv} }
@article {Zhubtx530, author = {Zhu, Sha Joe and Almagro-Garcia, Jacob and McVean, Gil}, title = {Deconvolution of multiple infections in {{}} from high throughput sequencing data}, year = {2017}, doi = {10.1093/bioinformatics/btx530}, URL = {https://doi.org/10.1093/bioinformatics/btx530}, journal = {Bioinformatics} }