gmat

An R package for simulating correlation matrices possibly constrained by acyclic directed and undirected graphs.

Build Status codecov Lifecycle: maturing CRAN downloads CRAN status

Installation

The package is available on CRAN, to get the latest stable version use:

install.packages("gmat")

Alternatively, using the R package devtools one may install the development version:

# install.packages("devtools")
devtools::install_github("irenecrsn/gmat")

The other R packages required for gmat are igraph and gRbase, which can also be installed from CRAN.

Overview

The package mostly implements methods described in the following papers:

An example of use

First, we generate a random undirected graph with 3 nodes and density 0.5. Then we generate, using our port() function, 2 correlation matrices consistent with such random graphical structure.

ug <- gmat::rgraph(p = 3, d = 0.5)
igraph::print.igraph(ug)
#> IGRAPH c54f1ce U--- 3 2 -- Erdos renyi (gnp) graph
#> + attr: name (g/c), type (g/c), loops (g/l), p (g/n)
#> + edges from c54f1ce:
#> [1] 1--2 1--3
gmat::port(N = 2, ug = ug)
#> , , 1
#> 
#>            [,1]       [,2]       [,3]
#> [1,]  1.0000000 -0.3084469 -0.8615382
#> [2,] -0.3084469  1.0000000  0.0000000
#> [3,] -0.8615382  0.0000000  1.0000000
#> 
#> , , 2
#> 
#>            [,1]       [,2]      [,3]
#> [1,]  1.0000000 -0.7442018 0.4804328
#> [2,] -0.7442018  1.0000000 0.0000000
#> [3,]  0.4804328  0.0000000 1.0000000

We appreciate how the zero pattern is shared by all of the simulated matrices. The return value is an array, and so the individual matrices can be accessed as matrices[, , n], where n is the index of the matrix we want to retrieve from the sample, ranging from 1 to N.

We may also sample correlation matrices using i.i.d. coefficients in their upper Cholesky factor U.

gmat::chol_iid(N = 2)
#> , , 1
#> 
#>            [,1]       [,2]       [,3]
#> [1,]  1.0000000 -0.5736727 -0.3748520
#> [2,] -0.5736727  1.0000000 -0.1880879
#> [3,] -0.3748520 -0.1880879  1.0000000
#> 
#> , , 2
#> 
#>            [,1]       [,2]       [,3]
#> [1,]  1.0000000 -0.2140087 -0.1489077
#> [2,] -0.2140087  1.0000000 -0.1502781
#> [3,] -0.1489077 -0.1502781  1.0000000

A specific zero pattern can be enforced in U using an acyclic digraph.

dag <- gmat::rgraph(p = 3, d = 0.5, dag = TRUE)
igraph::print.igraph(dag)
#> IGRAPH 390731a D--- 3 1 -- 
#> + edge from 390731a:
#> [1] 1->2
top_sort <- igraph::topo_sort(dag)
peo_sort <- rev(top_sort)
inv_top <- order(top_sort)
inv_peo <- order(peo_sort)
m <- gmat::chol_iid(dag = dag)[, , 1]
# We sort the matrix according to both a topological and a perfect elimination
# ordering
m_top_sort <- m[top_sort, top_sort]
m_peo_sort <- m[peo_sort, peo_sort]
# Transpose of Cholesky factor (lower Cholesky decomposition)
U_chol <- chol(m_peo_sort) 
# Upper Cholesky factor (upper Cholesky decomposition)
U <- gmat::uchol(m_top_sort)
print(U_chol[inv_peo, inv_peo])
#>            [,1] [,2] [,3]
#> [1,]  0.8836329    0    0
#> [2,] -0.4681803    1    0
#> [3,]  0.0000000    0    1
print(U[inv_top, inv_top])
#>           [,1]       [,2] [,3]
#> [1,] 0.8836329 -0.4681803    0
#> [2,] 0.0000000  1.0000000    0
#> [3,] 0.0000000  0.0000000    1
print(m)
#>            [,1]       [,2] [,3]
#> [1,]  1.0000000 -0.4681803    0
#> [2,] -0.4681803  1.0000000    0
#> [3,]  0.0000000  0.0000000    1

The zeros are correctly reflected in the upper Cholesky factor when using a topological sorting, whereas they are reflected in the standard lower Cholesky decomposition factor if we sort the matrix with a perfect elimination ordering.

See more examples and paper references at the documentation website for the package.