Network Scores

Alexander P. Christensen

2020-07-12

Introduction

This vignette shows you how to compute network scores using the state-of-the-art psychometric network algorithms in R. Building on the example proposed by Christensen (2018), this algorithm is now more efficient and precise given recent findings in the psychometric network literature. The vignette will walkthrough an example that compares latent network scores to latent variable scores computed by confirmatory factor analysis (CFA) and will explain the similarities and differences between the two.

To get started, a few packages need to be installed (if you don’t have them already) and loaded.

# Install 'NetworkToolbox' package
install.packages("NetworkToolbox")
# Install 'lavaan' package
install.packages("lavaan")
# Load packages
library(EGAnet)
library(NetworkToolbox)
library(lavaan)

Estimate Dimensions

To estimate dimensions, we’ll use exploratory graph analysis (EGA; Golino, 2019; Golino & Epskamp, 2017; Golino et al., 2018). EGA first computes a network using either the graphical least absolute selection and shrinkage operator (GLASSO; Friedman, Hastie, & Tibshirani, 2008) with extended Bayesian information criterion (EBIC) from the R package qgraph (GeLASSO; Epskamp & Fried, 2018) or the triangulated maximally filtered graph (TMFG; Massara et al., 2016) from the NetworkToolbox package (Christensen, 2018). EGA then applies the walktrap community detection algorithm (Pons & Latapy, 2006) from the igraph package (Csardi & Nepusz, 2006). Below is the code to estimate EGA using the GeLASSO method with the NEO PI-3 openness to experience data (n = 802) in the NetworkToolbox package.

# Run EGA
ega <- EGA(neoOpen, model = "glasso", algorithm = "louvain")

As depicted above, EGA estimates there to be 7 dimensions of openness to experience. Using these dimensions, we can now estimate network scores; but first, I’ll go into details about how these are estimated.

Network Loadings

Christensen & Golino (under review) and Golino, Moulder, Christensen, Kim, & Boker (in prep)

Network loadings are roughly equivalent to factor loadings and differ only in the association measures used to compute them. For networks, the centrality measure node strength is used to compute the sum of the connections to a node. Previous simulation studies have reported that node strength is generally redundant with CFA factor loadings (Hallquist, Wright, & Molenaar, 2019) and item-scale correlations (Christensen, Golino, & Silvia, 2019). Importantly, Hallquist and colleagues (2019) found that a node’s strength represents a combination of dominant and cross-factor loadings. To mitigate this issue, I’ve developed a function called net.loads, which computes the node strength for each node in each dimension, parsing out the connections that represent dominant and cross-dimension loadings. Below is the code to compute standardized ($std; unstandardized, $unstd) network loadings.

# Standardized
n.loads <- net.loads(A = ega)$std

To provide mathematical notation, Let \(W\) represent a symmetric \(m \times m\) matrix where \(m\) is the number of terms. Node strength is then defined as:

\[NS_i = \sum^m_j w_{ij},\]

where \(w_{ij}\) is the weight (e.g., partial correlation) between node \(i\) and node \(j\), and \(NS_i\) is the sum of the weights between node \(i\) and all other nodes. Using the definition of node strength (), we can define node strength split between the communities, \(C\), identified by EGA:

\[NL_{ic} = \sum^C_{j \in c} w_{ij},\]

where \(w_{ij}\) is the weight of node \(i\) with the subset of nodes \(j\) that belong to community \(c\) (i.e., \(j \in c\)), and \(NL_{ic}\) is the sum of the weights for node \(i\) in community \(c\) (or unstandardized network loading for node \(i\) in community \(c\)). From the unstandardized network loadings (), standardized network loadings follow with:

\[z_{NL_{ic}} = \frac{NL_{ic}}{\sqrt{\sum\limits^C_c NL_c}},\]

where \(NL_c\) is the sum of network loadings in community \(c\) and \(z_{NL_{ic}}\) is the standardized network loading of node \(i\) in community \(c\). It’s important to emphasize that network loadings are in the unit of association—that is, if the network consists of partial correlations, then the standardized network loadings are the partial correlation of each node within each dimension.

Network Scores

These network loadings form the foundation for computing network scores .Because the network loadings represent the middle ground between a saturated (EFA) and simple (CFA) structure, the network scores accomodate the inclusion of only the most important cross-loadings in their computation. This capitalizes on information often lost in typical CFA structures but reduces the cross-loadings of EFA structures to only the most important loadings.

To compute network scores, the following code can be used:

# Network scores
net.scores <- net.scores(data = neoOpen, A = ega)

The net.scores function will return three objects: scores, commCor, and loads. scores contain the network scores for each dimension and an overall score. commCor contains the partial correlations between the dimensions in the network (and with the overall score). Finally, loads will return the standardized network loadings described above. Below we detail the mathematical notation for computing network scores.

First, we take each community and identify items that do not have loadings on that community equal to zero:

\[z_{tc} = z_{NL_{i \in c}} \neq 0,\]

where \(z_{NL_{c}}\) is the standardized network loadings for community \(c\), and \(z_{tc}\) is the network loadings in community \(c\),that are not equal to zero. Next, \(z_{tc}\) is divided by the standard deviation of the corresponding items in the data, \(X\):

\[wei_{tc} = \frac{z_{tc}}{\sqrt{\frac{\sum_{i=1}^{t \in c} (X_i - \bar{X_t})^2}{n - 1}}},\]

where the denominator, \(\sqrt{\frac{\sum_{i=1}^{t \in c} (X_i - \bar{X_t})^2}{n - 1}}\), corresponds to the standard deviation of the items with non-zero network loadings in community \(c\), and \(wei_{tc}\) is the weight for the non-zero loadings in community \(c\). These can be further transformed into relative weights for each non-zero loading:

\[relWei_{tc} = \frac{wei_{t \in c}}{\sum^C_c wei_{t \in c}},\]

where \(\sum^C_c wei_{t \in c}\) is the sum of the weights in community \(c\), and \(relWei_{tc}\) is the relative weights for non-zero loadings in community \(c\). We then take these relative weights and multiply them to the corresponding items in the data, \(X_{t \in c}\), to obtain the community (i.e., topic) score:

\[\hat{\theta_c} = \sum\limits^C_{c} X_{t \in c} \times relWei_{t \in c},\]

where \(\hat{\theta_c}\) is the network score for community \(c\).

Comparison to CFA Scores

It’s important to note that CFA scores are typically computed using a simple structure (items only load on one factor) and regression techniques. Network scores, however, are computed using a complex structure and are a weighted composite rather than a latent factor.

# Latent variable scores
## Estimate CFA
cfa.net <- CFA(ega, estimator = "WLSMV", data = neoOpen)

## Compute latent variable scores
lv.scores <- lavPredict(cfa.net$fit)

## Initialize correlations vector
cors <- numeric(ega$n.dim)

## Compute correlations
for(i in 1:ega$n.dim)
{cors[i] <- cor(net.scores$std.scores[,i], lv.scores[,i], method = "spearman")}
Correlations
Factor 1 0.989
Factor 2 0.990
Factor 3 0.987
Factor 4 0.983
Factor 5 0.984
Factor 6 0.983

As shown in the table, the network scores strongly correlate with the latent variable scores. Because Spearman’s correlation was used, the orderings of the values take precendence. These large correlations between the scores reflect considerable redundancy between these scores.

References

Christensen, A. P. (2018). NetworkToolbox: Methods and measures for brain, cognitive, and psychometric network analysis in R. The R Journal, 10, 422-439. https://doi.org/10.32614/RJ-2018-065

Christensen, A. P., & Golino, H. (under review). Statistical equivalency of factor and network loadings. PsyArXiv. https://doi.org/10.31234/osf.io/xakez

Christensen, A. P., Golino, H., & Silvia, P. (2019). A psychometric network perspective on the measurement and assessment of personality traits. PsyArXiv. https://doi.org/10.31234/osf.io/ktejp

Csardi, G., & Nepusz, T. (2006). The igraph software package for complex network research. InterJournal Complex Systems, 1695, 1-9.

Epskamp, S., & Fried, E. I. (2018). A tutorial on regularized partial correlation networks. Psychological Methods, 23, 617-634. http://dx.doi.org/10.1037/met0000167

Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9, 432-441. https://doi.org/10.1093/biostatistics/kxm045

Golino, H., & Christensen, A. P. (2020). EGAnet: Exploratory graph analysis – A framework for estimating the number of dimensions in multivariate data using network psychometrics. R package version 0.9.4. https://CRAN.R-project.org/package=EGAnet

Golino, H., & Epskamp, S. (2017). Exploratory graph analysis: A new approach for estimating the number of dimensions in psychological research. PloS ONE, 12, e0174035. https://doi.org/10.1371/journal.pone.0174035

Golino, H., Moulder, R., Christensen, A. P., Kim, S., & Boker, S. M. (in prep). Modeling latent topics in social media using Dynamic Exploratory Graph Analysis: The case of the right-wing and left-wing trolls in the 2016 US elections.

Golino, H., Shi, D., Christensen, A. P., Nieto, M. D., Sadana, R., & Thiyagarajan, J. A. (2018). Investigating the performance of exploratory graph analysis and traditional techniques to identify the number of latent factors: A simulation and tutorial. PsyArXiv. https://psyarxiv.com/gzcre/

Hallquist, M., Wright, A. C. G., & Molenaar, P. C. (2019). Problems with centrality measures in psychopathology symptom networks: Why network psychometrics cannot escape psychometric theory. Multivariate Behavioral Research. https://psyarxiv.com/pg4mf

Massara, G. P., Di Matteo, T., & Aste, T. (2016). Network filtering for big data: Triangulated maximally filtered graph. Journal of Complex Networks, 5, 161-178. https://doi.org/10.1093/comnet/cnw015

Pons, P., & Latapy, M. (2006). Computing communities in large networks using random walks. Journal of Graph Algorithms and Applications, 10, 191-218. https://doi.org/10.7155/jgaa.00124