Configuration files are central to BioInstaller. They store the URLs of software and databases, the installation scripts, and other useful information.
Most of the configuration files are parsed by configr. One addition to the original configr syntax is the #R# R CMD #R# marker, which flags a command as R code rather than a shell command.
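As an illustration only, the marker could be recognised with a small base R helper. The function name extract_r_command is hypothetical, and this is not BioInstaller's actual parser; the sketch merely shows what the #R# ... #R# convention expresses.

```r
# Hypothetical helper (not part of BioInstaller or configr): return the R
# code wrapped in #R# ... #R#, or NA if the step is an ordinary shell command.
extract_r_command <- function(step) {
  pattern <- "^[[:space:]]*#R#(.*)#R#[[:space:]]*$"
  if (!grepl(pattern, step)) {
    return(NA_character_)
  }
  # Keep only the text between the two markers and trim surrounding spaces.
  trimws(sub(pattern, "\\1", step))
}

extract_r_command("#R# system('/path/yourscript')#R#")  # the R code between the markers
extract_r_command("make")                               # NA: a plain shell command
```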
github.toml, nongithub.toml and db.toml
The built-in configuration files github.toml, nongithub.toml and db.toml (db_annovar.toml/db_main.toml, which use the nongithub.toml format) can be used to download and install a range of software and databases. install.bioinfo(show.all.names = TRUE) lists all software and databases available in github.toml and nongithub.toml.
Software and databases hosted on GitHub
Variables that control the download and installation steps of software and databases hosted on GitHub:
- If you set use_git2r to false, BioInstaller will use the git installed on your system.
- If you set use_git2r to false and recursive_clone to true, BioInstaller will run the command git clone --recursive https://path/repo
- Use before_install to store the pre-installation steps.
- install mainly stores the installation steps. You can also use your own installation script by setting it to #R# system('/path/yourscript')#R#
- make_dir is the directory in which the software or database is compiled. The R working directory is set to download.dir by default, so it must be changed to make_dir to run the install steps.
- All Bitbucket repositories should set bitbucket to true (e.g. snakemake in github.toml).
[bwa]
github_url = "https://github.com/lh3/bwa"
after_failure = "echo 'fail!'"
after_success = "echo 'successful!'"
make_dir = ["./"]
bin_dir = ["./"]
[bwa.before_install]
linux = ""
mac = ""
[bwa.install]
linux = "make"
mac = "make"
Version control of software hosted on GitHub is handled by the git2r package and the GitHub tag API. The source URL of software or files hosted on GitHub is given by github_url in github.toml.
Software and databases not hosted on GitHub
Variables that control the download and installation steps of software and databases not hosted on GitHub:
- github_url is replaced by source_url.
- To download multiple files from source_url, set url_all_download to true.
- The rvest and RCurl packages can be used to parse the version information of software or databases not hosted on GitHub.
- If you don't want to use the built-in version reordering function, set version_order_fixed to true.
Optionally, if the source code consists of a single file, you can set url_all_download to false and list multiple URLs. This helps avoid download failures caused by an invalid URL.
[gmap]
# {{version}} will be parsed to your install.bioinfo `version` parameter
# or the newest version parsed from fetched data.
source_url = "http://research-pub.gene.com/gmap/src/{{version}}.tar.gz"
after_failure = "echo 'fail!'"
after_success = "echo 'successful!'"
make_dir = ["./"]
bin_dir = ["./"]
[gmap.before_install]
linux = ""
mac = ""
[gmap.install]
linux = "./configure --prefix=`pwd` && make && make install"
mac = ["sed -i s/\"## CFLAGS='-O3 -m64' .*\"/\"CFLAGS='-O3 -m64'\"/ config.site",
"./configure --prefix=`pwd` && make && make install"]
Version control of software and databases not hosted on GitHub requires a function that parses the available versions from a URL. The selected version then replaces {{version}} in source_url.
BioInstaller also uses the configr glue feature to shorten long lists of file names, so that many file names can be stored in one compact expression.
library(configr)
library(BioInstaller)
blast.databases <- system.file('extdata',
'config/db/db_blast.toml', package = 'BioInstaller')
read.config(blast.databases)$db_blast_nr$source_url
#> [1] "!!glue ftp://ftp.ncbi.nih.gov/blast/db/nr.{ids=sprintf('%02d', 0:68);rep(ids, 2)}.tar.gz{c(rep('', length(ids)), rep('.md5', length(ids)))}"
x <- read.config(blast.databases, glue.parse = TRUE)$db_blast_nr$source_url
length(x)
#> [1] 138
head(x)
#> [1] "ftp://ftp.ncbi.nih.gov/blast/db/nr.00.tar.gz"
#> [2] "ftp://ftp.ncbi.nih.gov/blast/db/nr.01.tar.gz"
#> [3] "ftp://ftp.ncbi.nih.gov/blast/db/nr.02.tar.gz"
#> [4] "ftp://ftp.ncbi.nih.gov/blast/db/nr.03.tar.gz"
#> [5] "ftp://ftp.ncbi.nih.gov/blast/db/nr.04.tar.gz"
#> [6] "ftp://ftp.ncbi.nih.gov/blast/db/nr.05.tar.gz"
mask.github <- tempfile()
file.create(mask.github)
#> [1] TRUE
install.bioinfo(nongithub.cfg = blast.databases, github.cfg = mask.github,
show.all.names = TRUE)
#> Warning in fetch.config(github.cfg): Configuration file /var/folders/nc/
#> yl5qhkkn6vxf_m7s_yz2kzvh0000gn/T//RtmpK8vw9W/filebbb14fb80eba is empty,
#> please check the links.
#> [1] "db_blast_env_nr" "db_blast_est_human"
#> [3] "db_blast_est_mouse" "db_blast_est_others"
#> [5] "db_blast_gss" "db_blast_htgs"
#> [7] "db_blast_human_genomic" "db_blast_landmark"
#> [9] "db_blast_mouse_genomic" "db_blast_nr"
#> [11] "db_blast_nt" "db_blast_other_genomic"
#> [13] "db_blast_pataa" "db_blast_patnt"
#> [15] "db_blast_pdbaa" "db_blast_pdbnt"
#> [17] "db_blast_ref_prok_rep_genomes" "db_blast_ref_viroids_rep_genomes"
#> [19] "db_blast_ref_viruses_rep_genomes" "db_blast_refseq_genomic"
#> [21] "db_blast_refseq_protein" "db_blast_refseq_rna"
#> [23] "db_blast_refseqgene" "db_blast_sts"
#> [25] "db_blast_swissprot" "db_blast_taxdb"
#> [27] "db_blast_tsa_nr" "db_blast_tsa_nt"
#> [29] "db_blast_vector"
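Returning to the db_blast_nr glue expression shown earlier, the same 138 URLs can be rebuilt with plain base R. This sketch only mirrors the expression printed for db_blast_nr; the variable names are chosen here for illustration and do not come from BioInstaller.

```r
# Two-digit volume ids 00..68, as in the glue expression for db_blast_nr.
ids <- sprintf("%02d", 0:68)

# Each id is used twice: once for the archive and once for its .md5 checksum.
suffix <- c(rep("", length(ids)), rep(".md5", length(ids)))

urls <- paste0("ftp://ftp.ncbi.nih.gov/blast/db/nr.", rep(ids, 2), ".tar.gz", suffix)

length(urls)  # 138, matching the parsed source_url
head(urls, 2)
```

Written out this way, the value of the glue form is easier to see: one short line in the configuration file stands in for 69 archives and their 69 checksum files.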
Reading from BIO_SOFTWARES_DB_ACTIVE database
To resolve software dependencies, BioInstaller uses expressions of the form {{key:value}} and looks up their values in the BIO_SOFTWARES_DB_ACTIVE database.
For example, htslib is a dependency of Pindel, and the install step of Pindel is ./INSTALL {{htslib:source.dir}}. In the R session, {{htslib:source.dir}} is replaced by the actual value stored in BIO_SOFTWARES_DB_ACTIVE, or in the database given by the db parameter of the install.bioinfo function.
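The lookup can be pictured with a minimal base R sketch. Everything here is an assumption made for illustration: a named list stands in for the BIO_SOFTWARES_DB_ACTIVE database, the path is a placeholder, and resolve_db_refs is a hypothetical helper, not a BioInstaller function.

```r
# Stand-in for BIO_SOFTWARES_DB_ACTIVE: a named list keyed by software name.
# The path is a placeholder, not a real installation directory.
software_db <- list(htslib = list(source.dir = "/path/to/htslib"))

# Hypothetical helper: replace every {{key:value}} with db[[key]][[value]].
resolve_db_refs <- function(cmd, db) {
  pattern <- "\\{\\{([^:{}]+):([^:{}]+)\\}\\}"
  hits <- regmatches(cmd, gregexpr(pattern, cmd))[[1]]
  for (hit in unique(hits)) {
    key <- sub(pattern, "\\1", hit)
    field <- sub(pattern, "\\2", hit)
    value <- db[[key]][[field]]
    if (is.null(value)) next  # leave unknown references untouched
    cmd <- gsub(hit, value, cmd, fixed = TRUE)
  }
  cmd
}

resolve_db_refs("./INSTALL {{htslib:source.dir}}", software_db)
```

References to software that is not recorded in the list are left as they are, so a missing dependency stays visible in the command instead of being silently dropped.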
Parsing from the install.bioinfo parameter extra.list
To make configuration templates more flexible, BioInstaller uses expressions of the form {{parameters}} to read values from the extra.list parameter of the install.bioinfo function. Notably, name, version, os.version and destdir are passed to extra.list by default.
For example, the source_url of GMAP needs the version value, so the download URL is written as source_url = "http://research-pub.gene.com/gmap/src/{{version}}.tar.gz". In the R session, {{version}} is replaced by the value of the version parameter of install.bioinfo (if version is NULL, it is set to the newest version).
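The substitution and the NULL fallback can be sketched in base R as follows. This is a simplified illustration, not BioInstaller's implementation: render_template and pick_version are hypothetical helpers, the version names are placeholders, and the list of available versions is assumed to be ordered from newest to oldest.

```r
# Hypothetical helper: fill {{name}} placeholders from a named list,
# in the spirit of the extra.list parameter of install.bioinfo.
render_template <- function(template, extra.list) {
  for (key in names(extra.list)) {
    placeholder <- paste0("{{", key, "}}")
    template <- gsub(placeholder, extra.list[[key]], template, fixed = TRUE)
  }
  template
}

# Hypothetical helper: fall back to the newest known version when version is NULL.
# `available` is assumed to be ordered from newest to oldest.
pick_version <- function(version, available) {
  if (is.null(version)) available[1] else version
}

source_url <- "http://research-pub.gene.com/gmap/src/{{version}}.tar.gz"
available <- c("newest-release", "older-release")  # placeholder version names

render_template(source_url, list(version = pick_version(NULL, available)))
render_template(source_url, list(version = pick_version("older-release", available)))
```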