# this vignette is not created if HiTC is not installed
if (!require("HiTC", quietly = TRUE)) {
  knitr::opts_chunk$set(eval = FALSE)
}
## 
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
## 
##     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
##     clusterExport, clusterMap, parApply, parCapply, parLapply,
##     parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
## 
##     Filter, Find, Map, Position, Reduce, anyDuplicated, append,
##     as.data.frame, basename, cbind, colnames, dirname, do.call,
##     duplicated, eval, evalq, get, grep, grepl, intersect,
##     is.unsorted, lapply, mapply, match, mget, order, paste, pmax,
##     pmax.int, pmin, pmin.int, rank, rbind, rownames, sapply,
##     setdiff, sort, table, tapply, union, unique, unsplit, which,
##     which.max, which.min
## 
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:base':
## 
##     expand.grid

Introduction

Hi-C is a sequencing-based molecular assay designed to measure intra and interchromosomal interactions between the DNA molecule. In particular, the identification of Topologically-Associated Domains (TADs), that is, of regions of the genome in which physical interactions are frequent, provides insight into the three-dimensional organization of a genome [1].

Hi-C data are in the form of two-dimensional contact maps, i.e., matrices whose \(i,j\) entry quantifies the intensity of the physical interaction between two genome regions \(i\) and \(j\) at the DNA level. In this vignette, we demonstrate the use of adjclust::hicClust to perform adjacency-constrained hierarchical agglomerative clustering (HAC) of Hi-C contact maps. The output of this function is a dendrogram, which can be cut to identify TADs. The algorithm used for adjacency-constrained (HAC) is described in the third chapter of [2].

library("adjclust")

Loading and displaying a sample Hi-C contact map

The data set hic_imr90_40_XX is an object of class HTCexp which has been obtained from the HiTC package [3]. It is a contact map corresponding to the first 500 x 500 bins on chromosome X vs chromosome X.

load(system.file("extdata", "hic_imr90_40_XX.rda", package = "adjclust"))

The script used to create this map can be found by executing the following command:

system.file("system/create_hic_chrXchrX.R", package="adjclust")

Now we have a look at the data.

HiTC::mapC(hic_imr90_40_XX)

Using hicClust

hicClust operates directly on objects of class HTCexp

fit <- hicClust(hic_imr90_40_XX)
## Note: 137 merges with non increasing heights.

It is also possible to work on binned data. Below we choose a bin size of \(5 \times 10^5\):

binned <- HiTC::binningC(hic_imr90_40_XX, binsize = 1e5)
HiTC::mapC(binned)

fitB <- hicClust(binned)
fitB
## 
## Call:
## run.adjclust(mat = mat, type = type, h = h)
## 
## Cluster method   : hicClust 
## Number of objects: 205

The output is of class chac. In particular, it can be plotted as a dendrogram silently relying on the function plot.dendrogram:

plot(fitB)
## Warning in plot.chac(fitB): 
## Detected reversals in dendrogram: mode = 'corrected', 'within-disp' or 'total-disp' might be more relevant.

Moreover, the output contains an element named merge which describes the successive merges of the clustering, and an element gains which gives the improvement in the criterion optimized by the clustering at each successive merge.

head(cbind(fitB$merge, fitB$gains))
##      [,1] [,2]
## [1,]   -3   -4
## [2,]   -2    1
## [3,]    2   -5
## [4,]   -1    3
## [5,]    4   -6
## [6,]  -17  -18

Other types of input

Contacts maps can also be stored as objects of class Matrix::dsCMatrix, or as plain text files. These types of input are also accepted as first argument to hicClust.

References

[1] Dixon J.R., et al (2012). Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature, 485(7398), 376.

[2] Dehman A. (2015). Spatial clustering of linkage disequilibrium blocks for genome-wide association studies. Phd Thesis, Université Paris Saclay.

[3] Servant N., et al (2012). HiTC: Exploration of High-Throughput ‘C’ experiments. Bioinformatics, 28(21), 2843-2844.

Session information

sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.4 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
## [1] adjclust_0.5.99      HiTC_1.30.0          GenomicRanges_1.38.0
## [4] GenomeInfoDb_1.22.0  IRanges_2.20.0       S4Vectors_0.24.0    
## [7] BiocGenerics_0.32.0 
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.4.6                compiler_3.6.3             
##  [3] RColorBrewer_1.1-2          XVector_0.26.0             
##  [5] bitops_1.0-6                tools_3.6.3                
##  [7] zlibbioc_1.32.0             digest_0.6.20              
##  [9] evaluate_0.14               lattice_0.20-41            
## [11] Matrix_1.2-18               DelayedArray_0.12.0        
## [13] yaml_2.2.0                  xfun_0.13                  
## [15] GenomeInfoDbData_1.2.2      rtracklayer_1.46.0         
## [17] stringr_1.4.0               knitr_1.23                 
## [19] Biostrings_2.54.0           grid_3.6.3                 
## [21] Biobase_2.46.0              XML_3.98-1.20              
## [23] BiocParallel_1.20.0         rmarkdown_1.14             
## [25] magrittr_1.5                Rsamtools_2.2.1            
## [27] htmltools_0.3.6             matrixStats_0.54.0         
## [29] GenomicAlignments_1.22.1    MASS_7.3-51.6              
## [31] SummarizedExperiment_1.16.0 capushe_1.1.1              
## [33] stringi_1.4.3               RCurl_1.95-4.12