Introduction

SEMinR brings a friendly syntax to creating and estimating structural equation models (SEM). The syntax allows applied practitioners of SEM to use terminology that is very close to their familiar modeling terms (e.g., reflective, composite, interactions) instead of specifying underlying matrices and covariances. Models can be estimated either using Partial Least Squares Path Modeling (PLS-PM) as popularized by SmartPLS, or using Covariance Based Structural Equation Modeling (CBSEM) as popularized by LISREL and AMOS. Confirmatory Factor Analysis (CFA) of reflective measurement models is also supported. Both CBSEM and CFA estimation use the Lavaan package.

A natural feeling, domain-specific language to build and estimate structural equation models in R
Can use variance-based PLS estimation and covariance-based SEM estimation to model composite and common-factor constructs
High-level functions to quickly specify interactions, higher order constructs, and structural paths

SEMinR uses its own PLS-PM estimation engine and integrates with the Lavaan package for CBSEM/CFA estimation. It also brings a few methodological advancements not found in other packages or software, and encourages best practices wherever possible.

PLS-PM advances and best-practices in SEMinR:

Implements PLS path modeling algorithm (Wold, 1985)
Automatically adjusts PLS estimates to ensure consistency (PLSc) wherever common-factors are involved (Dijkstra & Henseler, 2015)
Adjusts for known biases in interaction terms in PLS models (Henseler & Fassot, 2006)
Continuously tested against leading popular PLS-PM software to ensure parity of outcomes: SmartPLS (Ringle et al., 2015) and ADANCO (Henseler and Dijkstra, 2015), semPLS (Monecke and Leisch, 2012) and matrixpls (Rönkkö, 2016)
Incorporates high performance, multi-core bootstrapping function (Hair et al., 2017)

CBSEM/CFA advances and best-practices in SEMinR:

Implements covariance-based structural equation modeling (Joreskog, 1973)
Extracts ten Berge factor scores that have the same correlation pattern as the latent constructs (ten Berge et al. 1999; Logan et al. 2019)
Creates product-indicator interactions, or two-stage interactions using ten Berge scores from a CFA (Lodder et al, 2019)
Defaults to robust maximum-likelihood (MLR) estimation to account for potential non-normality (Yuan et al. 2000)

Briefly, there are three steps to specifying and estimating a structural equation model using SEMinR. The following example is generic to either PLS-PM or CBSEM/CFA.

Describe measurement model for each construct and its items, specifying interaction terms and other measurement features:

# Distinguish and mix composite measurement (used in PLS-PM)
# or reflective (common-factor) measurement (used in CBSEM, CFA, and PLSc)
# - We will first use composites in PLS-PM analysis
# - Later we will convert the composites into reflectives for CFA/CBSEM (step 3)
measurements <- constructs(
  composite("Image",        multi_items("IMAG", 1:5)),
  composite("Expectation",  multi_items("CUEX", 1:3)),
  composite("Value",        multi_items("PERV", 1:2)),
  composite("Satisfaction", multi_items("CUSA", 1:3)),
  interaction_term(iv = "Image", moderator = "Expectation")
)

Describe the structural model of causal relationships between constructs (and interaction terms):

# Quickly create multiple paths "from" and "to" sets of constructs  
structure <- relationships(
  paths(from = c("Image", "Expectation", "Image*Expectation"), to = "Value"),
  paths(from = "Value", to = "Satisfaction")
)

Put the above elements together to estimate the model using PLS-PM, CBSEM, or a CFA:

# Estimate using PLS-PM from model parts defined earlier  
pls_model <- estimate_pls(data = mobi, 
                          measurement_model = measurements, 
                          structural_model = structure)
summary(pls_model)

# note: PLS requires separate bootstrapping for PLS path estimates
# SEMinR uses multi-core parallel processing to speed up bootstrapping
boot_estimates <- bootstrap_model(pls_model, nboot = 1000, cores = 2)
summary(boot_estimates)

# Alternatively, we could estimate our model using CBSEM, which uses the Lavaan package
# We often wish to conduct a CFA of our measurement model prior to CBSEM
# note: we must convert composites in our measurement model into reflective constructs for CFA/CBSEM
cfa_model <- estimate_cfa(data = mobi, as.reflective(measurements))
summary(cfa_model)

cbsem_model <- estimate_cbsem(data = mobi, as.reflective(measurements), structure)
summary(cbsem_model)

# note: the Lavaan syntax and Lavaan fitted model can be extracted for your own specific needs
cbsem_model$lavaan_syntax
cbsem_model$lavaan_model

SEMinR seeks to combine ease-of-use, flexible model construction, and high-performance. Below, we will cover the details and options of each of the three parts of model construction and estimation demonstrated above.

Setup

You must install the SEMinR library once on your local machine:

install.packages("seminr")

And then load it in every session you want to use it:

library(seminr)

Data

You must load your data into a dataframe from any source you wish (CSV, etc.). Column names must be names of your measurement items.

Important: Avoid using asterisks '*' in your column names (these are reserved for interaction terms).

survey_data <- read.csv("mobi_survey_data.csv")
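A quick check for asterisks in column names can catch this problem before any model is specified. The following is a minimal base R sketch; the toy data frame and its column names are hypothetical and stand in for your own survey data. Note that read.csv() with its default check.names = TRUE already replaces invalid characters in column names with a dot, so asterisks usually only survive when that option is turned off or when data come from another source.

```r
# Toy data frame standing in for survey data (hypothetical column names)
survey_data <- data.frame(IMAG1 = c(7, 10), IMAG2 = c(5, 9))
survey_data[["IMAG*3"]] <- c(5, 10)

# Find column names that contain a literal asterisk
has_asterisk <- grepl("*", names(survey_data), fixed = TRUE)
bad_names <- names(survey_data)[has_asterisk]
bad_names

# One possible fix: replace each asterisk with an underscore
names(survey_data) <- gsub("*", "_", names(survey_data), fixed = TRUE)
names(survey_data)
```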

For demonstration purposes, we will start with a dataset bundled with the seminr package - the mobi data frame (also found in the semPLS R package). This dataset comes from a measurement instrument for the European Customer Satisfaction Index (ECSI) adapted to the mobile phone market (Tenenhaus et al. 2005).

You can see a description and sample of what is in mobi:

dim(mobi)
#> [1] 250  24
head(mobi)
#>   CUEX1 CUEX2 CUEX3 CUSA1 CUSA2 CUSA3 CUSCO CUSL1 CUSL2 CUSL3 IMAG1 IMAG2 IMAG3
#> 1     7     7     6     6     4     7     7     6     5     6     7     5     5
#> 2    10    10     9    10    10     8    10    10     2    10    10     9    10
#> 3     7     7     7     8     7     7     6     6     2     7     8     7     6
#> 4     7    10     5    10    10    10     5    10     4    10    10    10     5
#> 5     8     7    10    10     8     8     5    10     3     8    10    10     5
#> 6    10     9     7     8     7     7     8    10     3    10     8     9    10
#>   IMAG4 IMAG5 PERQ1 PERQ2 PERQ3 PERQ4 PERQ5 PERQ6 PERQ7 PERV1 PERV2
#> 1     5     4     7     6     4     7     6     5     5     2     3
#> 2    10     9    10     9    10    10     9    10    10    10    10
#> 3     4     7     7     8     5     7     8     7     7     7     7
#> 4     5    10     8    10    10     8     4     5     8     5     5
#> 5     8     9    10     9     8    10     9     9     8     6     6
#> 6     8     9     9    10     9    10     8     9     9    10    10

Measurement model description

SEMinR uses the following functions to describe measurement models:

constructs() gathers all the construct measurement models
composite() or reflective() define the measurement mode of individual constructs
interaction_term() specifies interactions and higher_composite() specifies higher order constructs
multi_items() or single_item() define the items of a construct

These functions should feel natural to SEM practitioners and encourage them to explicitly specify the core nature of their measurement models: composite or common-factor (see Sarstedt et al., 2016, and Henseler et al., 2013, for clear definitions).

Let’s take a closer look at the individual functions.

Specifying measurement models with constructs

constructs() compiles the measurement model specification list from the construct descriptions passed as its arguments. You must supply it with any number of individual composite, reflective, interaction_term, or higher_composite constructs. Note that we currently only support higher-order constructs for PLS-PM estimation (i.e., composites).

measurements <- constructs(
  composite("Image",         multi_items("IMAG", 1:5), weights = mode_B),
  composite("Expectation",   multi_items("CUEX", 1:3), weights = regression_weights),
  composite("Quality",       multi_items("PERQ", 1:7), weights = mode_A),
  composite("Value",         multi_items("PERV", 1:2), weights = correlation_weights),
  reflective("Satisfaction", multi_items("CUSA", 1:3)),
  reflective("Complaints",   single_item("CUSCO")),
  higher_composite("HOC", c("Value", "Satisfaction"), orthogonal, mode_A),
  interaction_term(iv = "Image", moderator = "Expectation", method =  orthogonal, weights = mode_A),
  reflective("Loyalty",      multi_items("CUSL", 1:3))
)

We are storing the measurement model in the measurements object for later use.

Note that neither a dataset nor a structural model is specified in the measurement model stage, so we can reuse the measurement model object measurements across different datasets and structural models.

Describe individual constructs as composite or reflective

composite() or reflective() describe the measurement of a construct by its items.

For example, we can use composite() for PLS models to describe mode A (correlation weights) for the “Expectation” construct with manifest variables CUEX1, CUEX2, and CUEX3:

composite("Expectation", multi_items("CUEX", 1:3), weights = mode_A)
# is equivalent to:
composite("Expectation", multi_items("CUEX", 1:3), weights = correlation_weights)

We can describe composite “Image” using mode B (regression weights) with manifest variables IMAG1, IMAG2, IMAG3, IMAG4 and IMAG5:

composite("Image", multi_items("IMAG", 1:5), weights = mode_B)
# is equivalent to:
composite("Image", multi_items("IMAG", 1:5), weights = regression_weights)

Alternatively, we can use reflective() for CBSEM/CFA/PLSc to describe the reflective, common-factor measurement of the “Satisfaction” construct with manifest variables CUSA1, CUSA2, and CUSA3:

reflective("Satisfaction", multi_items("CUSA", 1:3))

Converting composite models into reflective models

For covariance-based SEM and CFA, you will want constructs to be reflective common factors. If you already have composite constructs or measurement models, you may use them for CBSEM/CFA after converting them to reflective versions. The as.reflective() function can convert either a single construct or an entire measurement model into reflective forms.

# Coerce a composite into reflective form
img_composite <- composite("Image", multi_items("IMAG", 1:5))
img_reflective <- as.reflective(img_composite)

# Coerce all constructs of a measurement model into reflective form
mobi_composites <- constructs(
  composite("Image",         multi_items("IMAG", 1:5)),
  composite("Expectation",   multi_items("CUEX", 1:3)),
  reflective("Complaints",   single_item("CUSCO"))
)
mobi_reflective <- as.reflective(mobi_composites)

Specifying construct measurement items

SEMinR strives to make specification of measurement items shorter and cleaner using multi_items() or single_item():

multi_items() creates a vector of multiple measurement items with similar names
single_item() describes a single measurement item

We can describe the manifest variables: IMAG1, IMAG2, IMAG3, IMAG4 and IMAG5:

multi_items("IMAG", 1:5)
# which is equivalent to the R vector:
c("IMAG1", "IMAG2", "IMAG3", "IMAG4", "IMAG5")

If your items are not numbered sequentially, you can combine the item numbers using the c() function:

multi_items("IMAG", c(1, 3:5))
# which is equivalent to the R vector:
c("IMAG1", "IMAG3", "IMAG4", "IMAG5")

multi_items() is used in conjunction with composite() or reflective() to describe a composite or a common-factor construct, respectively.
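Because item names must match column names in the data, it can help to confirm that every item exists before estimation. The sketch below uses only base R: it builds the same names that multi_items("IMAG", c(1, 3:5)) is described as producing and compares them against a hypothetical toy data frame in which one item is deliberately missing.

```r
# Build item names equivalent to multi_items("IMAG", c(1, 3:5))
item_names <- paste0("IMAG", c(1, 3:5))
item_names

# Toy data with IMAG5 deliberately missing (hypothetical)
toy_data <- data.frame(IMAG1 = 1:3, IMAG3 = 4:6, IMAG4 = 7:9)

# Items named in the measurement model but absent from the data
missing_items <- setdiff(item_names, names(toy_data))
missing_items
```

An empty result for missing_items would mean that every item in the construct has a matching column.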

We can describe a single manifest variable CUSCO:

single_item("CUSCO")
# which is equivalent to the R character string:
"CUSCO"

Note that single-item constructs can be defined as either composite mode A or reflective common-factor, but single-item constructs are essentially composites whose construct scores are determined.

Item associations (CBSEM only)

Covariance-based SEM models generally constrain all item errors to be unrelated. However, researchers might sometimes wish to free up covariances between item errors for estimation.

# The following specifies that items PERQ1 and PERQ2 covary with each other, and that both covary with IMAG1
mobi_am <- associations(
  item_errors("PERQ1", "PERQ2"),
  item_errors(c("PERQ1", "PERQ2"), "IMAG1")
)

Interaction terms

Creating interaction terms by hand can be time-consuming and error-prone. SEMinR provides high-level functions for simply creating interactions between constructs.

Interaction terms are described in the measurement model function constructs() using the following methods:

product_indicator describes a single interaction composite as generated by the scaled product-indicator method as described by Henseler and Chin (2010).
two_stage describes a single-item interaction composite that uses a product of the IV and moderator construct scores. For PLS-PM, the first stage uses PLS-PM described by Henseler and Chin (2010) whereas for CBSEM, the first stage uses a CFA and extracts ten Berge factor scores.
orthogonal describes a single interaction composite generated by the orthogonalization method of Henseler and Chin (2010). It is more typically used with composites, to help interpret multicollinearity between product terms.

For these methods the standard deviation of the interaction term is adjusted as noted below.

For example, we can describe the following interactions between Image and Expectation constructs:

# By default, interaction terms are computed using two stage procedures
interaction_term(iv = "Image", moderator = "Expectation")

# You can also explicitly specify how to create the interaction term
interaction_term(iv = "Image", moderator = "Expectation", method =  two_stage)
interaction_term(iv = "Image", moderator = "Expectation", method =  product_indicator)
interaction_term(iv = "Image", moderator = "Expectation", method =  orthogonal)

Note that these functions themselves return functions (closures) that are not resolved until processed in the estimate_pls() or estimate_cbsem() functions for SEM estimation. Recent studies show that PLS models must adjust the standard deviation of the interaction term because: “In general, the product of two standardized variables does not equal the standardized product of these variables” (Henseler and Chin 2010). SEMinR makes this adjustment automatically, which improves the accuracy of model estimates.

Important Note: SEMinR syntax uses an asterisk “*” as a naming convention for the interaction construct. Thus, the “Image” + “Expectation” interaction is called “Image*Expectation” in the structural model below. Please refrain from using an asterisk "*" in the naming of non-interaction constructs.
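The point quoted from Henseler and Chin (2010) about standardized products can be checked numerically. The following base R sketch uses simulated variables (the names x and z and the simulation settings are assumptions for illustration only, not SEMinR internals) to compare the product of two standardized variables with the standardized product.

```r
set.seed(123)
n <- 500
x <- rnorm(n)
z <- 0.5 * x + rnorm(n)   # z is correlated with x

# Standardize each variable, then multiply
zx <- as.numeric(scale(x))
zz <- as.numeric(scale(z))
product_of_standardized <- zx * zz

# Multiply the raw variables, then standardize the product
standardized_product <- as.numeric(scale(x * z))

# The standardized product has SD 1 by construction;
# the product of standardized variables generally does not
sd(standardized_product)
sd(product_of_standardized)

# Its mean is also not zero: it equals cor(x, z) * (n - 1) / n
mean(product_of_standardized)
```

This is why an interaction term built from standardized scores needs its scale adjusted before the path coefficient can be interpreted like the others.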

Structural model description

SEMinR makes for human-readable and explicit structural model specification using these functions:

relationships() gathers all the structural relationships between all constructs
paths() specifies relationships between sets of antecedents and outcomes

Specify structural model of relationships between constructs

relationships() compiles the structural model source-target list from the structural path descriptions passed as its arguments.

For example, we can describe a structural model for the mobi data:

mobi_sm <- relationships(
  paths(from = "Image",        to = c("Expectation", "Satisfaction", "Loyalty")),
  paths(from = "Expectation",  to = c("Quality", "Value", "Satisfaction")),
  paths(from = "Quality",      to = c("Value", "Satisfaction")),
  paths(from = "Value",        to = c("Satisfaction")),
  paths(from = "Satisfaction", to = c("Complaints", "Loyalty")),
  paths(from = "Complaints",   to = "Loyalty")
)

Note that neither a dataset nor a measurement model is specified in the structural model stage, so we can reuse the structural model object mobi_sm across different datasets and measurement models.

Specify structural paths with paths()

paths() describes single or multiple structural paths between sets of constructs.

For example, we can define paths from a single antecedent construct to a single outcome construct:

# "Image" -> "Expectation"
paths(from = "Image", to = "Expectation")

Or paths from a single antecedent to multiple outcomes:

# "Image" -> "Expectation"
# "Image" -> "Satisfaction"
paths(from = "Image", to = c("Expectation", "Satisfaction"))

Or paths from multiple antecedents to a single outcome:

# "Image" -> "Satisfaction"
# "Expectation" -> "Satisfaction"
paths(from = c("Image", "Expectation"), to = "Satisfaction")

Or paths from multiple antecedents to a common set of outcomes:

# "Expectation" -> "Value"
# "Expectation" -> "Satisfaction"
# "Quality" -> "Value"
# "Quality" -> "Satisfaction"
paths(from = c("Expectation", "Quality"), to = c("Value", "Satisfaction"))

Even the most complicated structural models become quick and easy to specify and modify.
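The expansion of "from" and "to" sets into individual paths can be sketched in base R. The helper expand_paths() below is hypothetical and written only for illustration; it is not SEMinR's implementation of paths(), and the structure that paths() actually returns may differ.

```r
# Hypothetical helper: one row per source -> target pair
expand_paths <- function(from, to) {
  # expand.grid varies its first argument fastest, so list targets first
  grid <- expand.grid(target = to, source = from, stringsAsFactors = FALSE)
  grid[, c("source", "target")]
}

path_table <- expand_paths(
  from = c("Expectation", "Quality"),
  to   = c("Value", "Satisfaction")
)
path_table
```

Two antecedents and two outcomes yield four paths, which matches the four relationships listed in the comments of the last paths() example.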

Model Estimation

SEMinR can estimate a CFA or a full SEM model described by the measurement and structural models above:

estimate_pls() estimates the parameters of a PLS-SEM model
estimate_cfa() estimates the parameters of a CFA model using the Lavaan package
estimate_cbsem() estimates the parameters of a CBSEM model using the Lavaan package

The above functions take some combination of the following parameters:

data: the dataset containing the measurement model items specified in constructs()
measurement_model: the measurement model described by the constructs() function
structural_model (PLS-PM and CBSEM only): the structural model described by the relationships() function
inner_weights (PLS-PM only): the weighting scheme for path estimation - either path_weighting for path weighting (default) or path_factorial for factor weighting (Lohmöller 1989).

For example, we can estimate a simple SEM model adapted from the structural and measurement model with interactions described thus far:

# define measurement model
mobi_mm <- constructs(
  composite("Image",        multi_items("IMAG", 1:5)),
  composite("Expectation",  multi_items("CUEX", 1:3)),
  composite("Value",        multi_items("PERV", 1:2)),
  composite("Satisfaction", multi_items("CUSA", 1:3)),
  interaction_term(iv = "Image", moderator = "Expectation"),
  interaction_term(iv = "Image", moderator = "Value")
)

# define structural model
# note: each interaction construct should be named by its main constructs joined by a '*'
mobi_sm <- relationships(
  paths(to = "Satisfaction",
        from = c("Image", "Expectation", "Value",
                 "Image*Expectation", "Image*Value"))
)

mobi_pls <- estimate_pls(
  data = mobi,
  measurement_model = mobi_mm,
  structural_model = mobi_sm,
  inner_weights = path_weighting
)
#> Generating the seminr model
#> All 250 observations are valid.

mobi_cfa <- estimate_cfa(
  data = mobi,
  measurement_model = as.reflective(mobi_mm)
)
#> Generating the seminr model for CFA

mobi_cbsem <- estimate_cbsem(
  data = mobi,
  measurement_model = as.reflective(mobi_mm),
  structural_model = mobi_sm
)
#> Generating the seminr model for CBSEM

Consistent PLS (PLSc) estimation for common factors

Dijkstra and Henseler (2015) offer an adjustment to generate consistent weight and path estimates of common factors estimated using PLS-PM. When estimating PLS-PM models using estimate_pls(), SEMinR automatically adjusts to produce consistent estimates of coefficients for common-factors defined using reflective().

Note: SEMinR also uses PLSc on PLS models with interactions involving reflective constructs. PLS models with interactions can be estimated as PLS consistent, but are subject to some bias as per Becker et al. (2018). It is not uncommon for bootstrapping PLSc models to result in errors due to the calculation of the adjustment.

Bootstrapping PLS models for significance

SEMinR can conduct high performance bootstrapping.

bootstrap_model() bootstraps a SEMinR model previously estimated using estimate_pls()

This function takes the following parameters:

seminr_model: a SEM model provided by estimate_pls()
nboot: the number of bootstrap subsamples to generate
cores: If your pc supports multi-core processing, the number of cores to utilize for parallel processing (default is NULL, wherein SEMinR will automatically detect and utilize all available cores)

For example, we can bootstrap the model described above:

# use 1000 bootstraps and utilize 2 parallel cores
boot_mobi_pls <- bootstrap_model(seminr_model = mobi_pls,
                                 nboot = 1000,
                                 cores = 2)
#> Bootstrapping model using seminr...
#> SEMinR Model successfully bootstrapped
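If you prefer to choose the number of cores yourself rather than rely on automatic detection, one common approach is to detect the available cores with the parallel package that ships with R and leave one free for other work. This is a sketch of that convention, not a SEMinR requirement; the variable name cores_to_use is an assumption for illustration.

```r
library(parallel)

# detectCores() can return NA on some platforms, so fall back to 1
available <- detectCores()
if (is.na(available)) available <- 1L

# Leave one core free, but always use at least one
cores_to_use <- max(1L, available - 1L)
cores_to_use

# The result could then be passed on, for example:
# bootstrap_model(seminr_model = mobi_pls, nboot = 1000, cores = cores_to_use)
```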

bootstrap_model() returns an object of class boot_seminr_model which contains the following accessible objects:
- boot_seminr_model$boot_paths an array of the nboot estimated bootstrap sample path coefficient matrices
- boot_seminr_model$boot_loadings an array of the nboot estimated bootstrap sample item loadings matrices
- boot_seminr_model$boot_weights an array of the nboot estimated bootstrap sample item weights matrices
- boot_seminr_model$boot_HTMT an array of the nboot estimated bootstrap sample model HTMT matrices
- boot_seminr_model$paths_descriptives a matrix of the bootstrap path coefficients and standard deviations
- boot_seminr_model$loadings_descriptives a matrix of the bootstrap item loadings and standard deviations
- boot_seminr_model$weights_descriptives a matrix of the bootstrap item weights and standard deviations
- boot_seminr_model$HTMT_descriptives a matrix of the bootstrap model HTMT and standard deviations

Notably, bootstrapping can also be meaningfully applied to models containing interaction terms and readjusts the interaction term (Henseler and Chin 2010) for every sub-sample. This leads to slightly increased processing times, but provides accurate estimations.
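To illustrate how an array of bootstrap path matrices can be summarized, the base R sketch below builds a synthetic array with one matrix per bootstrap sample and computes the bootstrap mean, standard deviation, and a t-value for one path. The array contents, the construct names used, and the convention of dividing the original estimate by the bootstrap standard deviation are assumptions for illustration; they are not output from bootstrap_model().

```r
set.seed(42)
nboot <- 200
constructs <- c("Image", "Satisfaction")

# Hypothetical original estimate: rows are sources, columns are targets
original <- matrix(c(0, 0, 0.5, 0), nrow = 2,
                   dimnames = list(constructs, constructs))

# Synthetic stand-in for an array of nboot path coefficient matrices
boot_paths <- array(0, dim = c(2, 2, nboot),
                    dimnames = list(constructs, constructs, NULL))
boot_paths["Image", "Satisfaction", ] <- rnorm(nboot, mean = 0.5, sd = 0.05)

# Summarize across the third (bootstrap) dimension
boot_mean <- apply(boot_paths, c(1, 2), mean)
boot_sd   <- apply(boot_paths, c(1, 2), sd)

# t-value: original estimate divided by bootstrap standard deviation
t_value <- original["Image", "Satisfaction"] / boot_sd["Image", "Satisfaction"]
t_value
```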

Reporting the model estimation results

Reporting the estimated model

There are multiple ways of reporting the estimated model. The estimate_pls() function returns an object of class seminr_model. This can be passed directly to the base R function summary(). This can be used in two primary ways:

summary(seminr_model) to report $R^{2}$, adjusted $R^{2}$, path coefficients for the structural model, and the construct reliability metrics $\rho_{C}$, also known as composite reliability (Dillon and Goldstein 1987), AVE (Fornell and Larcker 1981), and $\rho_{A}$ (Dijkstra and Henseler 2015).

summary(mobi_pls)
#> 
#> Results from  package seminr (1.1.0)
#> 
#> Path Coefficients:
#>                   Satisfaction
#> R^2                      0.614
#> AdjR^2                   0.606
#> Image                    0.470
#> Expectation              0.132
#> Value                    0.320
#> Image*Expectation       -0.140
#> Image*Value              0.023
#> 
#> Reliability:
#>                    rhoC   AVE rhoA
#> Image             0.818 0.478    1
#> Expectation       0.733 0.481    1
#> Value             0.918 0.848    1
#> Image*Expectation 0.833 0.291    1
#> Image*Value       0.918 0.574    1
#> Satisfaction      0.871 0.693    1
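To make the reliability columns more concrete, here is a small base R sketch of how composite reliability and AVE are conventionally computed from standardized loadings. The loadings vector is hypothetical and not taken from the mobi results, and SEMinR's internal computation may differ in detail.

```r
# Hypothetical standardized loadings for a three-item construct
loadings <- c(0.8, 0.7, 0.9)

# AVE: mean of the squared loadings
ave <- mean(loadings^2)

# Composite reliability (rhoC): squared sum of loadings divided by
# itself plus the summed error variances (1 - loading^2)
rho_c <- sum(loadings)^2 / (sum(loadings)^2 + sum(1 - loadings^2))

round(c(rhoC = rho_c, AVE = ave), 3)
```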

model_summary <- summary(seminr_model) returns an object of class summary.seminr_model which contains the following accessible objects (might vary depending on CBSEM or PLS model):
- model_summary$descriptives reports the descriptive statistics and correlations for both items and constructs
- model_summary$paths reports the matrix of path coefficients, $R^{2}$, and adjusted $R^{2}$
- model_summary$reliability reports composite reliability ($\rho_{C}$), average variance extracted (AVE), and $\rho_{A}$
- model_summary$loadings reports the estimated loadings of the measurement model
- model_summary$weights reports the estimated weights of the measurement model
- model_summary$construct_scores reports the construct scores of composites
- model_summary$vif_items reports the Variance Inflation Factor (VIF) for the measurement model
- model_summary$vif_antecedents reports the Variance Inflation Factor (VIF) for the structural model
- model_summary$fSquare reports the effect sizes ($f^{2}$) for the structural model
- model_summary$htmt reports the HTMT for the structural model
- model_summary$iterations (PLS only) reports the number of iterations to converge on a stable model
- model_summary$cross_loadings (PLS only) reports all possible loadings between constructs and items

Please note that common-factor scores are indeterminable and therefore construct scores for reflective common factors are extracted using a ten Berge procedure.
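As an aid to interpreting the effect sizes reported in model_summary$fSquare, the sketch below shows the conventional definition of $f^{2}$: the change in $R^{2}$ when an antecedent is removed, relative to the unexplained variance of the full model. It uses ordinary least squares on simulated scores as a stand-in; the variables and coefficients are hypothetical, and this is not SEMinR's own code.

```r
set.seed(1)
n <- 250

# Simulated stand-ins for construct scores (hypothetical)
image <- rnorm(n)
value <- 0.4 * image + rnorm(n)
satisfaction <- 0.5 * image + 0.3 * value + rnorm(n)

# R^2 with and without the antecedent of interest (value)
r2_included <- summary(lm(satisfaction ~ image + value))$r.squared
r2_excluded <- summary(lm(satisfaction ~ image))$r.squared

# f^2 = (R^2 included - R^2 excluded) / (1 - R^2 included)
f2_value <- (r2_included - r2_excluded) / (1 - r2_included)
f2_value
```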

Reporting results of a bootstrapped PLS

As with the estimated model, there are multiple ways of reporting the bootstrapping of a PLS model. The bootstrap_model() function returns an object of class boot_seminr_model. This can be passed directly to the base R function summary(). This can be used in two primary ways:

summary(boot_seminr_model) to report t-values and p-values for the structural paths

Get information about bootstrapped PLS models using the summary() function on the bootstrapped model object.

summary(boot_mobi_pls)

boot_model_summary <- summary(boot_seminr_model) returns an object of class summary.boot_seminr_model which contains the following accessible objects:
- boot_model_summary$nboot reports the number of bootstraps performed
- boot_model_summary$bootstrapped_paths reports a matrix of direct paths and their standard deviation, t_values, and confidence intervals.
- boot_model_summary$bootstrapped_weights reports a matrix of measurement model weights and their standard deviation, t_values, and confidence intervals.
- boot_model_summary$bootstrapped_loadings reports a matrix of measurement model loadings and their standard deviation, t_values, and confidence intervals.
- boot_model_summary$bootstrapped_HTMT reports a matrix of HTMT values and their standard deviation, t_values, and confidence intervals.

Reporting confidence intervals for direct and mediated bootstrapped structural paths

The summary(boot_seminr_model) function will return t_values and confidence intervals for direct structural paths in PLS models. However, the confidence_interval() function can be used to evaluate the confidence intervals for specific paths - direct and mediated (Zhao et al., 2010) - in a boot_seminr_model object returned by the bootstrap_model() function.
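The idea behind a bootstrap confidence interval for a mediated path can be sketched in base R. The example below forms the indirect effect as the product of two path coefficients within each bootstrap sample and takes percentile bounds. The bootstrap draws are simulated and the path labels are hypothetical; this illustrates the percentile approach in general and is not necessarily how confidence_interval() computes its result.

```r
set.seed(7)
nboot <- 1000

# Simulated bootstrap draws of two path coefficients (hypothetical)
a_draws <- rnorm(nboot, mean = 0.5, sd = 0.05)  # Image -> Expectation
b_draws <- rnorm(nboot, mean = 0.3, sd = 0.05)  # Expectation -> Satisfaction

# Indirect effect in each bootstrap sample
indirect <- a_draws * b_draws

# Percentile confidence interval at the chosen alpha level
alpha <- 0.05
ci <- quantile(indirect, probs = c(alpha / 2, 1 - alpha / 2))
ci
```

An interval that excludes zero would suggest that the mediated path differs from zero at the chosen alpha level.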