Vignette taken directly from Christensen & Kenett (under review)

With the semantic networks estimated for the low and high openness to experience group, they can now be analyzed and statistically compared. The third R package in our SemNA pipeline is the SemNeT package. SemNeT handles the computation of network measures and statistical tests. The network measures are used to quantify the structure and properties of the semantic networks, while the statistical tests are used to compare the differences between these measures.

While methods and tools to measure different networks is an active and thriving area (Epskamp, Borsboom, & Fried, 2018; Moreno & Neville, 2013; van Borkulo et al., 2017), a standard network analysis includes visualizing the network, measuring its properties, and statistically examining the properties. These analyses are conducted in the third stage of our SemNA pipeline.

Visualization of Semantic Networks

Visualizations of networks are qualitative; however, they provide intuitive depictions of each network’s structure. There are several R packages (e.g., igraph, qgraph, and networkD3; Allaire, Gandrud, Russell, & Yetman, 2017) and other software (e.g., Cytoscape; Shannon et al., 2003) that are capable of visualizing networks. For our example, we’ll show how to use the compare_nets function in SemNeT to visually compare our two networks.

compare_nets allows the user to visually compare any number of networks. The networks are plotted side-by-side using the R packages qgraph and networktools (Jones, 2019). An important consideration when plotting the networks is the layout. The layout affects how the nodes are positioned with respect to one another on a 2D layout. Some layouts are useful for visualizing the networks only, while other layouts may have more meaningful representations of the distances between nodes (Jones, Mair, & McNally, 2018).

In general, nodes in the network are depicted with circles and their labels denote what each node represents. In our example, each label denotes a unique animal response given by the sample (see Figure 6). The edges in the network are depicted by lines, which may vary in size, representing the magnitude of association, and color (e.g., green and red), representing the sign of the association (e.g., positive or negative, respectively). Many visualization algorithms will use the magnitude of association to depict the distance between nodes in the network.

A common visualization algorithm in the psychological literature is the Fruchterman-Reingold algorithm (as known as a force-directed algorithm; Fruchterman & Reingold, 1991). This algorithm uses the magnitude of associations to map the distances between nodes. To visualize the networks in our example using this algorithm, the following code should be run:

# Visually compare networks
compare_nets(net.low, net.high,
             title = list("Low Openness", "High Openness"),
             config = "spring", weighted = FALSE)

Comparison of low (left) and high (right) openness to experience semantic networks based on the Fruchterman-Reingold algorithm

The compare_nets visualization function depicts the networks side-by-side (Figure 6). If there are more than two networks, then compare_nets will automatically optimize the plot layout for comparison. There is an option to add titles using the title argument, which must be input as a list object. If no titles are input, then the name(s) of the object(s) will be used (e.g., net.low and net.high). The config argument can be used to select the layout. This defaults to "spring", which is the Fruchterman-Reingold layout. Other layouts that are applied by the qgraph package can also be used (see ?qgraph). Finally, the weighted argument can be used to visualize the networks as weighted (TRUE) or unweighted (FALSE). In our example, we specified the networks to be unweighted (weighted = FALSE). Because the networks are unweighted, the lines are black (rather than green and red based on sign) and the thickness of the edges are all the same (rather than size differences based on magnitude of association).

Based on Figure 6, it appears that the high openness to experience group has a more tightly clustered and interconnected network; however, as mentioned before, these visualizations are generally qualitative and not meant for interpretation. Therefore, regardless of whether these depictions appear to have meaningful differences, quantitative analysis is necessary to determine whether there are truly differences.

Global Network Measures

Common SemNA metrics are global (or macroscopic) network measures. These measures focus on the structure of the entire network, emphasizing how the nodes are connected as a cohesive whole (Siew et al., 2019). In contrast, local (or microscopic) network measures focus on the role of individual nodes in the network (Bringmann et al., 2019). Below we briefly define and describe a few global network measures—clustering coefficient, average shortest path length, and modularity—that are commonly used in SemNA. A more extensive review of these and other measures can be found in Siew et al. (2019).

A common network structure that tends to emerge across many different systems (including semantic memory) is a small world structure (Watts & Strogatz, 1998). Small world networks are characterized by having a high clustering coefficient (CC) and moderate average shortest path length (ASPL). The CC refers to the extent that two neighbors of a node will be neighbors themselves, on average, in the network. A network with a higher CC, for example, suggests that nodes that are near-neighbors to each other tend to co-occur and be connected. Wulff, Hills, & Mata (2018) examined younger and older adults semantic networks that were estimated using verbal fluency data and found that the older adults had a smaller CC compared to the younger adults. This structure was interpreted as having a role in the cognitive slowing observed in older adults.

The ASPL refers to the average shortest number of steps (i.e., edges) that is needed to traverse between any pair of nodes in the network. That is, it’s the average of the minimum number of steps from one node to all other nodes. In cognitive models, ASPL may affect the activation of associations between concepts (known as spreading activation; Anderson, 1983; Siew, 2019) such that a lower ASPL would increase the likelihood of reaching a greater number associations. Several studies of creative ability and semantic networks have linked lower ASPL to greater creative ability (Benedek et al., 2017; Kenett et al., 2016a; Kenett & Faust, 2019). These studies have argued that the lower ASPL in the semantic network of people with higher creative ability (compared to less creative people) may have allowed them to reach more remote associations, which in turn could be combined into novel and useful associations (Kenett & Faust, 2019).

Finally, modularity measures how a network compartmentalizes (or partitions) into sub-networks (i.e., smaller networks within the overall network; Fortunato, 2010; Newman, 2006). The modularity statistic (Q) measures the extent to which the network has dense connections between nodes within a sub-network and sparse (or few) connections between nodes in different sub-networks. Larger Q values suggest that these sub-networks are more well-defined than lower Q values, suggesting that the network can more readily be segmented into different parts. A few studies have demonstrated the significance of modularity in cognitive networks (Kenett et al., 2016b; Siew, 2013). For example, Kenett et al. (2016b) found that people with high functioning autism (Asperger Syndrome) had a more modular semantic network relative to matched controls. They suggest that this “hyper” modularity might be related to rigidity of thought that often characterizes people with Asperger Syndrome.

These metrics represent the basic measures of semantic networks. To compute these measures, the following code can be used:

# Compute network measures
semnetmeas(net.low, meas = c("ASPL", "CC", "Q"), weighted = FALSE)
semnetmeas(net.high, meas = c("ASPL", "CC", "Q"), weighted = FALSE)

Group	ASPL	CC	Q
Table 7. Group Network Measures
Low	3.25	0.74	0.64
High	2.78	0.76	0.59

semnetmeas only computes global network measures. Other measures, such as centrality, can be computed using the NetworkToolbox package. To obtain only one measure, the argument meas can be set to the desired measure (e.g., meas = "ASPL" for only ASPL). Moreover, to obtain weighted measures, the argument weighted can be set to TRUE (defaults to FALSE). Based on the global network measures, in our example, it appears that the high openness to experience group has a higher CC and lower ASPL and Q than the low openness to experience group (Table 7).

Statistical Tests

In order to statistically test for differences in these measures across networks, other approaches must be used. Here, we present two approaches in the SemNeT package that statistically test for differences between networks: tests against random networks and bootstrapped partial networks (e.g., Kenett et al., 2013, 2016a).

Tests against random networks

Tests against random networks can be performed to determine whether the network measures observed in the groups are different from what would be expected from a random network with the same number of nodes and edges (Beckage, Smith, & Hills, 2011; Steyvers & Tenenbaum, 2005). This approach iteratively simulates Erds-Rényi random networks with the same number of nodes and edges with a fixed edge probability (Boccaletti, Latora, Moreno, Chavez, & Hwang, 2006). For each simulated random network, global network measures (i.e., CC, ASPL, and Q) are computed, resulting in a sampling distribution of these measures for the random network.

The Erds-Rényi random network model does not make any assumptions regarding the structure of the network, which makes it a useful null model to test against (Erdős & Rényi, 1960). That is, whether the network structure for a specific network measure could be generated from a random network model. The following code implements the tests against random networks:

# Compute tests against random networks
rand.test <- randnet.test(net.low, net.high, iter = 1000, cores = 4)

This function accepts any number of networks as inputs, allowing for multiple networks to be tested. For our example, we only input two networks: net.low and net.high. The other arguments, iter and cores, specify the number of simulated random networks and processing cores that should be used in the computation, respectively. The output from this function is a table that reports the p-values for each network compared to the random network values and the values below “Random” are the mean (M) and standard deviation (SD) of the global network measures for the random network distribution.

Group	Measures	p-values	Random (M)	Random (SD)
Table 8. p-values of Low and High Openness to Experience Networks Against Random Networks
	ASPL	< .001	3.04	0.03
Low	CC	< .001	0.04	0.01
	Q	< .001	0.38	0.01
	ASPL	< .001	3.03	0.03
High	CC	< .001	0.04	0.01
	Q	< .001	0.38	0.01

randnet.test will compute tests against random networks for each network input into the function. In our example, there were two networks, so the result computed random networks based on both network structures. As shown in Table 8, all global network measures were significantly different from random for both openness to experience groups. This suggests that both networks have significantly different structures than a random network with the same number of nodes and edges.

Bootstrapped partial networks

The second test to statistically compare semantic networks applies a bootstrap method (Efron, 1979). This approach selects a subset of nodes in the network (e.g., 50%), estimates all compared networks for this subset of nodes, and then computes the network measures (Kenett, Anaki, & Faust, 2014). This method is known as without replacement because each node can only be selected once (Bertail, 1997; Politis & Romano, 1994; Shao, 2003). With these nodes removed from the data, partial networks are estimated from each group’s data. The network measures are then computed for these partial networks. This process repeats iteratively, usually 1,000 times.

Thus, this process estimates and compares partial networks from the full networks. The rationale for this approach is twofold: (1) if the full networks differ from each other, then any partial network consisting of the same nodes should also be different, and thus (2) the generation of many partial networks allows for a direct statistical comparison between the full networks (Kenett et al., 2016b).

Similar to the test against random networks, these iterated partial networks form a sampling distribution of the global network measures for both groups, but solely based on the empirical data. These sampling distributions can then be statistically compared using a t-test (or ANOVA) to determine whether the global network measures are different between the compared networks. In a recent implementation, we applied this approach by retaining a gradation of nodes (50%, 60%, 70%, 80%, and 90%; Christensen, Kenett, Cotter, Beaty, & Silvia, 2018). This implementation allows for trends in the distributions to be revealed. To perform this analysis, the function partboot can be used.

#Arguments for 'partboot' function
bootSemNeT(..., method = c("CN", "NRW", "PF", "TMFG"),
         type = c("case", "node"), prop, sim,
         weighted = FALSE, iter = 1000, cores)

The first argument, ..., is for the input of the equated group binary response matrices (e.g., equate.low and equate.high). The ... argument is used across all bootSemNeT associated functions to allow for the plotting and statistical testing of any number of groups involved in the analysis. percent refers to the number of nodes that should remain in the network (e.g., prop = .50). The next argument, sim, selects the similarity measure (defaults to "cosine"). Next, the weighted argument decides whether weighted or unweighted network measures should be computed (defaults to FALSE or unweighted measures). iter is the number of bootstrap iterations to be completed (defaults to 1000). Finally, cores refers to the number of processor cores to use in the parallelization of the analysis (defaults to the computer’s maximum number of cores minus one). Below is code to run the analysis for retaining 50%, 60%, 70%, 80%, and 90% of nodes from the groups, respectively:

# Compute partial bootstrap network analysis
## Set seed for reproducibility
set.seed(42)

## 50% of nodes remaining in network
boot.fifty <- partboot(equate.low, equate.high,
                       method = "TMFG", type = "node",
                       prop = .50, iter = 1000,
                       sim = "cosine", cores = 4)
## 60% of nodes remaining in network
boot.sixty <- partboot(equate.low, equate.high,
                       method = "TMFG", type = "node",
                       prop = .60, iter = 1000,
                       sim = "cosine", cores = 4)
## 70% of nodes remaining in network
boot.seventy <- partboot(equate.low, equate.high,
                         method = "TMFG", type = "node",
                         prop = .70,, iter = 1000,
                         sim = "cosine", cores = 4)
## 80% of nodes remaining in network
boot.eighty <- partboot(equate.low, equate.high,
                        method = "TMFG", type = "node",
                        prop = .80, iter = 1000,
                        sim = "cosine", cores = 4)
## 90% of nodes remaining in network
boot.ninety <- partboot(equate.low, equate.high,
                        method = "TMFG", type = "node",
                        prop = .90, iter = 1000,
                        sim = "cosine", cores = 4)

For more than two groups, the additional equated binary responses matrices can be added as input. partboot will perform the same analyses—selecting the same subset of nodes—for all binary response matrices that are input. The computation time for each implementation increases with the number of nodes that remain in the network. Overall, the computation is relatively fast, with computing times for each implementation ranging from 5 to 10 minutes (around 40 minutes in total; defined for a computer with the specifications: Intel Core i5-4300M 2.60GHz and 16GB DDR3 RAM). Once the partial bootstrap network measures are computed, they can then be visualized using the plot function:

# Plot bootstrap results
plots <- plot(boot.fifty, boot.sixty, boot.seventy,
              boot.eighty, boot.ninety, groups = c("Low","High"),
              measures = c("ASPL", "CC", "Q"))

The function accepts as many bootstrap samples as the user inputs. There are two arguments that the user can define: groups and measures. The groups argument specifies the names of the groups for the plot (must be a character vector). If no group names are entered, then the default is to name the groups after the name of the objects used for the input in partboot (i.e., equate.low and equate.high). The measures argument allows the user to choose which global network measures are plotted (defaults to all measures). Below are the results of our example from the data collected by Christensen et al. (2018):

Plots of the boostrapped partial network measures (1000 samples per percentage of nodes remaining. Density plots are above the box plots and scatterplots (individual dots depict a single sample). The black dot in the scatterplots represents the mean for the respective group and percentage.

In Figure 7, we can see that as the number of nodes that are retained in the network increase (i.e., from 50% to 90%), the sampling distribution of each of the measures becomes more different. This suggests that the two groups have different have global network measures and thereby network structures. To demonstrate these results quantitatively, statistical tests can be applied using the following code:

# Perform t-tests on bootstrap results
tests <- partboot.test(boot.fifty, boot.sixty, boot.seventy,
                       boot.eighty, boot.ninety)

When there are two groups (like our example), then partboot.test will output t-test results in a list containing objects corresponding to the network measure (ASPL, tests$ASPL; CC, tests$CC; Q, tests$Q). If there are more than 2 groups used as input in partboot, then ANOVAs (tests$ANOVA) and Tukey’s HSD pairwise comparison (tests$HSD) will be output instead. Within each of these objects, there is a matrix of t-test (or ANOVA) results with each of the rows corresponding to the percentage of nodes remaining in the partial networks. These results include the t-statistic (or F-statistic), degrees of freedom(s), p-value, Cohen’s d (or $\eta_p^2$; Cohen, 1992), lower and upper bound of the 95% confidence interval, and the direction of the effect. Below is an organized table of these results for our example:

	df	t	d	t	d	t	d
Table 9. Partial bootstrapped networks results
90%	1998	-82.59	3.69	50.73	2.27	-66.76	2.99
80%	1998	-54.64	2.44	39.75	1.78	-46.93	2.10
70%	1998	-34.86	1.56	32.24	1.44	-33.61	1.50
60%	1998	-23.31	1.04	22.50	1.01	-24.58	1.10
50%	1998	-17.19	0.77	18.05	0.81	-18.11	0.81
Note: 1000 samples were generated for each percentage of nodes remaining. t-statistics and Cohen’s d values are presented (Cohen, 1992). Negative t-statistics denote the high openness to experience group having lower values than the low openness to experience group. All p’s < 0.001. Cohen’s d effect sizes: 0.50, moderate; 0.80, large; 1.10, very large. ASPL, average shortest path length; CC, clustering coefficient; Q, modularity.

Table 9 shows that, across the percentage of nodes included, all network measures are significantly different between the two groups such that the high openness to experience group has a larger CC and smaller ASPL and Q values. Importantly, these effects increase (from moderately large to very large effect sizes) as the number of nodes remaining in the network increase, suggesting that the differences between the full networks are robust.

References

Allaire, J. J., Gandrud, C., Russell, K., & Yetman, C. J. (2017). networkD3: D3 javascript network graphs from r. Retrieved from https://CRAN.R-project.org/package=networkD3

Anderson, J. R. (1983). A spreading activation theory of memory. Journal of Verbal Learning and Verbal Behavior, 22, 261–295. https://doi.org/10.1016/S0022-5371(83)90201-3

Beckage, N., Smith, L., & Hills, T. (2011). Small worlds and semantic network growth in typical and late talkers. PloS ONE, 6, e19348. https://doi.org/10.1371/journal.pone.0019348

Benedek, M., Kenett, Y. N., Umdasch, K., Anaki, D., Faust, M., & Neubauer, A. C. (2017). How semantic memory structure and intelligence contribute to creative thought: A network science approach. Thinking & Reasoning, 23, 158–183. https://doi.org/10.1080/13546783.2016.1278034

Bertail, P. (1997). Second-order properties of an extrapolated bootstrap without replacement under weak assumptions. Bernoulli, 3, 149–179.

Boccaletti, S., Latora, V., Moreno, Y., Chavez, M., & Hwang, D.-U. (2006). Complex networks: Structure and dynamics. Physics Reports, 424, 175–308. https://doi.org/10.1016/j.physrep.2005.10.009

Bringmann, L. F., Elmer, T., Epskamp, S., Krause, R. W., Schoch, D., Wichers, M., … Snippe, E. (2019). What do centrality measures measure in psychology networks? Journal of Abnormal Psychology.

Christensen, A. P., & Kenett, Y. N. (under review). Semantic network analysis (SemNA): A tutorial on preprocessing, estimating, and analyzing semantic networks. PsyArXiv. https://doi.org/10.31234/osf.io/eht87

Christensen, A. P., Kenett, Y. N., Cotter, K. N., Beaty, R. E., & Silvia, P. J. (2018). Remotely close associations: Openness to experience and semantic memory structure. European Journal of Personality, 32, 480–492. https://doi.org/10.1002/per.2157

Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159.

Efron, B. (1979). Bootstrap methods: Another look at the jackknife. The Annals of Statistics, 1–26.

Epskamp, S., Borsboom, D., & Fried, E. I. (2018). Estimating psychological networks and their accuracy: A tutorial paper. Behavior Research Methods, 50, 195–212. https://doi.org/10.3758/s13428-017-0862-1

Erdős, P., & Rényi, A. (1960). On the evolution of random graphs. Publication of the Mathematical Institute of the Hungarian Academy of Sciences, 5, 17–60.

Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486, 75–174. https://doi.org/10.1016/j.physrep.2009.11.002

Fruchterman, T. M., & Reingold, E. M. (1991). Graph drawing by force-directed placement. Software: Practice and Experience, 21, 1129–1164. https://doi.org/10.1002/spe.4380211102

Jones, P. (2019). networktools: Tools for identifying important nodes in networks. Retrieved from https://CRAN.R-project.org/package=networktools

Jones, P. J., Mair, P., & McNally, R. (2018). Visualizing psychological networks: A tutorial in R. Frontiers in Psychology, 9. https://doi.org/10.3389/fpsyg.2018.01742

Kenett, Y. N., Anaki, D., & Faust, M. (2014). Investigating the structure of semantic networks in low and high creative persons. Frontiers in Human Neuroscience, 8, 407. https://doi.org/10.3389/fnhum.2014.00407

Kenett, Y. N., Beaty, R. E., Silvia, P. J., Anaki, D., & Faust, M. (2016a). Structure and flexibility: Investigating the relation between the structure of the mental lexicon, fluid intelligence, and creative achievement. Psychology of Aesthetics, Creativity, and the Arts, 10, 377–388. https://doi.org/10.1037/aca0000056

Kenett, Y. N., & Faust, M. (2019). A semantic network cartography of the creative mind. Trends in Cognitive Sciences, 23, 271–274. https://doi.org/10.1016/j.tics.2019.01.007

Kenett, Y. N., Gold, R., & Faust, M. (2016b). The hyper-modular associative mind: A computational analysis of associative responses of persons with asperger syndrome. Language and Speech, 59, 297–317. https://doi.org/10.1177/0023830915589397

Kenett, Y. N., Wechsler-Kashi, D., Kenett, D. Y., Schwartz, R. G., Ben Jacob, E., & Faust, M. (2013). Semantic organization in children with cochlear implants: Computational analysis of verbal fluency. Frontiers in Psychology, 4. https://doi.org/10.3389/fpsyg.2013.00543

Moreno, S., & Neville, J. (2013). Network hypothesis testing using mixed Kronecker product graph models. Proceedings of the 13th ieee international conference on data mining, 1163–1168. Retrieved from https://ieeexplore.ieee.org/abstract/document/6729615

Newman, M. E. J. (2006). Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103, 8577–8582. https://doi.org/10.1073/pnas.0601602103

Politis, D. N., & Romano, J. P. (1994). Large sample confidence regions based on subsamples under minimal assumptions. The Annals of Statistics, 2031–2050.

Shannon, P., Markiel, A., Ozier, O., Baliga, N. S., Wang, J. T., Ramage, D., … Ideker, T. (2003). Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Research, 13, 2498–2504. https://doi.org/10.1101/gr.1239303

Shao, J. (2003). Impact of the bootstrap on sample surveys. Statistical Science, 18, 191–198.

Siew, C. S. Q. (2013). Community structure in the phonological network. Frontiers in Psychology, 4, 553. https://doi.org/10.3389/fpsyg.2013.00553

Siew, C. S. Q. (2019). spreadr: An R package to simulate spreading activation in a network. Behavior Research Methods, 51, 910–929. https://doi.org/10.3758/s13428-018-1186-5

Siew, C. S. Q., Wulff, D. U., Beckage, N. M., & Kenett, Y. N. (2019). Cognitive network science: A review of research on cognition through the lens of network representations, processes, and dynamics. Complexity, 2019. https://doi.org/10.1155/2019/2108423

Steyvers, M., & Tenenbaum, J. B. (2005). The large-scale structure of semantic networks: Statistical analyses and a model of semantic growth. Cognitive Science, 29, 41–78. https://doi.org/10.1207/s15516709cog2901_3

van Borkulo, C. D., Boschloo, L., Kossakowski, J. J., Tio, P., Schoevers, R. A., Borsboom, D., & Waldorp, L. J. (2017). Comparing network structures on three aspects: A permutation test.

Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of “small-world” networks. Nature, 393, 440–442. https://doi.org/10.1038/30918

Wulff, D. U., Hills, T., & Mata, R. (2018). Structural differences in the semantic networks of younger and older adults. https://doi.org/10.31234/osf.io/s73dp

Analyzing Semantic Networks

Alexander Christensen

10/28/2019