Here we will walk through how to use the bayesdfa package to fit dynamic factor analysis (DFA) models with covariates.
Let’s load the necessary packages:
Covariates in dynamic factor analysis are generally included in the observation model, rather than the process model. Without covariates, the model can be expressed as
\[{x}_{t}={x}_{t-1}+{e}_t\\ { e }_{ t }\sim MVN(0,\textbf{Q})\\ { y }_{ t }=\textbf{Z}{ x }_{ t }+{ v }_{ t }\\ { v }_{ t }\sim MVN(0,\textbf{R})\]
where the matrix \(\textbf{Z}\) is dimensioned as the number of time series by number of trends, and maps the observed data \(y_{t}\) to the latent trends \(x_{t}\).
Observation covariates can be \[{x}_{t}={x}_{t-1}+{e}_t\\ { e }_{ t }\sim MVN(0,\textbf{Q})\\ { y }_{ t }=\textbf{Z}{ x }_{ t }+\textbf{D}{ d }_{ t }+{ v }_{ t }\\ { v }_{ t }\sim MVN(0,\textbf{R})\] where the matrix \(\textbf{D}\) represents time series by number of covariates at time \(t\). For a single covariate, such as temperature, this would mean estimating \(P\) parameters, where \(P\) is the number of time series. For a model including 2 covariates, the number of estimated coefficients would be \(2P\) and so forth.
Process covariates on the trends are less common but can be written as \[{x}_{t}={x}_{t-1}+\textbf{C}{ c }_{ t }+{e}_t\\ { e }_{ t }\sim MVN(0,\textbf{Q})\\ { y }_{ t }=\textbf{Z}{ x }_{ t }+{ v }_{ t }\\ { v }_{ t }\sim MVN(0,\textbf{R})\] where the matrix \(\textbf{C}\) represents the number of trends by number of covariates at time \(t\). For a single trend, this would mean estimating \(K\) parameters, where \(K\) is the number of trends. For a model including 2 covariates, the number of estimated coefficients would be \(2K\) and so forth.
We’ll start by simulating some random trends using the sim_dat
function,
Next, we can add a covariate effect to the trend estimate, x
. For example,
cov = expand.grid("time"=1:20, "timeseries"=1:4, "covariate"=1)
cov$value = rnorm(nrow(cov),0,0.1)
for(i in 1:nrow(cov)) {
sim_dat$y[cov$timeseries[i],cov$time[i]] = sim_dat$pred[cov$timeseries[i],cov$time[i]] + c(0.1,0.2,0.3,0.4)[cov$timeseries[i]]*cov$value[i]
}
And now fit the model with fit_dfa
##
## SAMPLING FOR MODEL 'dfa' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 6e-05 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.6 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 300 [ 0%] (Warmup)
## Chain 1: Iteration: 30 / 300 [ 10%] (Warmup)
## Chain 1: Iteration: 60 / 300 [ 20%] (Warmup)
## Chain 1: Iteration: 90 / 300 [ 30%] (Warmup)
## Chain 1: Iteration: 120 / 300 [ 40%] (Warmup)
## Chain 1: Iteration: 150 / 300 [ 50%] (Warmup)
## Chain 1: Iteration: 151 / 300 [ 50%] (Sampling)
## Chain 1: Iteration: 180 / 300 [ 60%] (Sampling)
## Chain 1: Iteration: 210 / 300 [ 70%] (Sampling)
## Chain 1: Iteration: 240 / 300 [ 80%] (Sampling)
## Chain 1: Iteration: 270 / 300 [ 90%] (Sampling)
## Chain 1: Iteration: 300 / 300 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 8.41563 seconds (Warm-up)
## Chain 1: 10.5218 seconds (Sampling)
## Chain 1: 18.9374 seconds (Total)
## Chain 1:
## Inference for the input samples (1 chains: each with iter=150; warmup=75):
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## x[1,1] 1.0 0.0 0.3 0.4 0.7 1.0 1.2 1.7 56 1.0
## x[1,2] 3.1 0.1 1.1 1.4 2.3 3.2 3.8 5.5 56 1.0
## x[1,3] 4.3 0.2 1.5 2.0 3.1 4.3 5.3 7.6 56 1.0
## x[1,4] 2.5 0.1 0.9 1.2 1.9 2.6 3.2 4.5 56 1.0
## x[1,5] 0.9 0.0 0.3 0.4 0.6 0.9 1.1 1.6 56 1.0
## x[1,6] 1.1 0.1 0.4 0.5 0.8 1.1 1.4 2.0 57 1.0
## x[1,7] -0.1 0.0 0.0 -0.2 -0.1 -0.1 -0.1 -0.1 53 1.0
## x[1,8] 1.5 0.1 0.5 0.7 1.1 1.5 1.9 2.7 57 1.0
## x[1,9] -0.3 0.0 0.1 -0.5 -0.3 -0.3 -0.2 -0.1 55 1.0
## x[1,10] -1.4 0.1 0.5 -2.5 -1.8 -1.4 -1.0 -0.7 56 1.0
## x[1,11] -0.1 0.0 0.0 -0.1 -0.1 -0.1 -0.1 0.0 66 1.0
## x[1,12] 0.6 0.0 0.2 0.3 0.4 0.6 0.7 1.0 57 1.0
## x[1,13] -0.7 0.0 0.2 -1.3 -0.9 -0.7 -0.5 -0.3 58 1.0
## x[1,14] -0.6 0.0 0.2 -1.1 -0.8 -0.6 -0.5 -0.3 59 1.0
## x[1,15] -1.4 0.1 0.5 -2.5 -1.8 -1.4 -1.0 -0.7 56 1.0
## x[1,16] -0.5 0.0 0.2 -0.9 -0.6 -0.5 -0.3 -0.2 57 1.0
## x[1,17] -0.7 0.0 0.2 -1.3 -0.9 -0.7 -0.5 -0.3 56 1.0
## x[1,18] -2.9 0.1 1.0 -5.2 -3.7 -3.0 -2.2 -1.4 56 1.0
## x[1,19] -3.1 0.1 1.1 -5.5 -3.9 -3.2 -2.3 -1.4 57 1.0
## x[1,20] -3.3 0.2 1.1 -5.8 -4.1 -3.3 -2.4 -1.5 56 1.0
## Z[1,1] 0.6 0.0 0.2 0.3 0.4 0.5 0.7 1.1 49 1.0
## Z[2,1] -0.6 0.0 0.2 -1.1 -0.7 -0.5 -0.4 -0.3 49 1.0
## Z[3,1] 0.6 0.0 0.2 0.3 0.4 0.5 0.7 1.1 49 1.0
## Z[4,1] -0.6 0.0 0.2 -1.1 -0.7 -0.5 -0.4 -0.3 49 1.0
## sigma[1] 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 47 1.0
## log_lik[1] 3.4 0.0 0.3 2.8 3.3 3.5 3.6 3.8 79 1.0
## log_lik[2] 2.6 0.1 0.7 0.9 2.3 2.6 3.1 3.5 49 1.0
## log_lik[3] 3.5 0.0 0.2 3.1 3.5 3.6 3.7 3.8 69 1.0
## log_lik[4] 3.3 0.1 0.6 1.4 3.1 3.5 3.6 3.8 42 1.0
## log_lik[5] 3.4 0.0 0.3 2.3 3.3 3.5 3.7 3.8 55 1.0
## log_lik[6] 2.7 0.1 0.7 1.0 2.4 2.8 3.2 3.6 54 1.0
## log_lik[7] 3.5 0.0 0.2 2.9 3.4 3.6 3.7 3.8 39 1.1
## log_lik[8] 3.2 0.1 0.4 2.1 3.1 3.4 3.5 3.7 51 1.0
## log_lik[9] 3.4 0.1 0.5 2.5 3.2 3.5 3.6 3.8 73 1.0
## log_lik[10] 2.4 0.1 1.0 -0.1 2.0 2.5 3.0 3.7 126 1.0
## log_lik[11] 3.4 0.0 0.3 2.6 3.4 3.5 3.6 3.8 78 1.0
## log_lik[12] 3.1 0.1 0.6 1.3 2.7 3.3 3.5 3.7 85 1.0
## log_lik[13] 3.3 0.1 0.4 2.0 3.2 3.4 3.6 3.8 44 1.0
## log_lik[14] 2.8 0.1 0.8 1.3 2.5 3.0 3.4 3.7 104 1.0
## log_lik[15] 3.5 0.1 0.3 2.8 3.4 3.6 3.7 3.8 26 1.1
## log_lik[16] 3.2 0.1 0.6 1.4 3.1 3.4 3.5 3.7 36 1.0
## log_lik[17] 3.3 0.1 0.4 2.3 3.1 3.4 3.6 3.8 72 1.0
## log_lik[18] 2.5 0.1 0.9 0.6 1.9 2.8 3.1 3.6 88 1.0
## log_lik[19] 3.5 0.0 0.3 2.9 3.4 3.6 3.6 3.7 65 1.0
## log_lik[20] 3.2 0.0 0.5 2.0 3.0 3.3 3.5 3.7 128 1.0
## log_lik[21] 3.4 0.0 0.3 2.7 3.3 3.5 3.6 3.7 59 1.0
## log_lik[22] 2.2 0.1 0.8 0.4 1.8 2.3 2.8 3.3 88 1.0
## log_lik[23] 3.6 0.0 0.2 3.2 3.5 3.6 3.7 3.8 51 1.0
## log_lik[24] 3.2 0.1 0.4 2.4 3.1 3.3 3.5 3.7 44 1.1
## log_lik[25] 3.4 0.0 0.3 2.9 3.3 3.5 3.6 3.8 62 1.0
## log_lik[26] 2.6 0.1 0.8 0.7 2.1 2.8 3.1 3.5 89 1.0
## log_lik[27] 3.5 0.0 0.2 3.2 3.4 3.6 3.6 3.8 61 1.0
## log_lik[28] 3.3 0.0 0.4 2.1 3.1 3.4 3.6 3.8 99 1.0
## log_lik[29] 3.4 0.0 0.4 2.4 3.2 3.5 3.6 3.7 68 1.0
## log_lik[30] 2.8 0.1 0.8 0.3 2.6 3.0 3.4 3.6 84 1.0
## log_lik[31] 3.5 0.0 0.3 2.7 3.4 3.6 3.6 3.8 58 1.0
## log_lik[32] 3.2 0.1 0.5 2.0 3.0 3.3 3.5 3.7 66 1.0
## log_lik[33] 3.4 0.1 0.4 2.2 3.3 3.5 3.6 3.8 39 1.0
## log_lik[34] 2.6 0.1 0.7 0.8 2.3 2.7 3.1 3.5 89 1.0
## log_lik[35] 3.5 0.0 0.2 3.0 3.5 3.6 3.7 3.8 51 1.0
## log_lik[36] 3.1 0.1 0.6 1.8 2.7 3.3 3.5 3.7 70 1.0
## log_lik[37] 3.3 0.1 0.4 2.3 3.0 3.4 3.6 3.7 72 1.0
## log_lik[38] 2.5 0.1 0.9 -0.1 2.0 2.6 3.3 3.5 60 1.0
## log_lik[39] 3.5 0.0 0.2 2.9 3.4 3.5 3.6 3.8 62 1.0
## log_lik[40] 3.2 0.1 0.5 2.1 2.9 3.3 3.5 3.7 81 1.0
## log_lik[41] 3.4 0.0 0.3 2.8 3.3 3.5 3.6 3.8 63 1.0
## log_lik[42] 2.8 0.1 0.7 1.3 2.5 2.9 3.2 3.6 78 1.0
## log_lik[43] 3.6 0.0 0.2 3.1 3.5 3.6 3.7 3.8 51 1.0
## log_lik[44] 3.2 0.1 0.6 1.4 3.0 3.3 3.5 3.7 56 1.0
## log_lik[45] 3.4 0.0 0.3 2.8 3.2 3.5 3.6 3.8 65 1.1
## log_lik[46] 2.2 0.1 0.8 0.5 1.6 2.4 2.9 3.5 72 1.0
## log_lik[47] 3.5 0.0 0.2 3.1 3.5 3.6 3.7 3.8 61 1.0
## log_lik[48] 3.2 0.1 0.5 2.0 3.0 3.4 3.5 3.7 82 1.1
## log_lik[49] 3.4 0.0 0.3 2.8 3.2 3.5 3.6 3.8 76 1.0
## log_lik[50] 2.6 0.1 0.7 1.1 2.3 2.8 3.2 3.6 86 1.0
## log_lik[51] 3.5 0.0 0.2 3.0 3.4 3.6 3.6 3.8 75 1.0
## log_lik[52] 3.2 0.0 0.4 2.3 3.0 3.3 3.5 3.7 93 1.0
## log_lik[53] 3.4 0.0 0.3 2.5 3.2 3.5 3.6 3.8 60 1.0
## log_lik[54] 2.9 0.1 0.7 1.3 2.6 3.2 3.4 3.7 189 1.0
## log_lik[55] 3.5 0.0 0.2 2.7 3.3 3.5 3.6 3.8 49 1.0
## log_lik[56] 3.4 0.0 0.4 2.4 3.3 3.5 3.6 3.7 71 1.0
## log_lik[57] 3.4 0.0 0.3 2.6 3.3 3.5 3.6 3.8 64 1.0
## log_lik[58] 2.7 0.1 0.8 1.0 2.3 2.9 3.3 3.6 115 1.0
## log_lik[59] 3.4 0.0 0.4 2.4 3.3 3.5 3.6 3.8 65 1.0
## log_lik[60] 3.3 0.1 0.5 2.1 3.1 3.4 3.6 3.7 92 1.0
## log_lik[61] 3.4 0.0 0.4 2.5 3.1 3.5 3.6 3.7 59 1.0
## log_lik[62] 2.3 0.1 0.8 0.9 1.7 2.4 3.1 3.5 89 1.0
## log_lik[63] 3.5 0.0 0.2 3.0 3.4 3.5 3.7 3.8 38 1.0
## log_lik[64] 3.1 0.1 0.5 1.9 2.9 3.2 3.5 3.7 61 1.0
## log_lik[65] 3.5 0.0 0.3 2.7 3.4 3.5 3.7 3.8 49 1.0
## log_lik[66] 2.8 0.1 0.6 1.4 2.4 2.9 3.2 3.6 68 1.0
## log_lik[67] 3.6 0.0 0.2 3.1 3.5 3.6 3.7 3.8 54 1.0
## log_lik[68] 3.1 0.1 0.6 1.8 2.9 3.3 3.6 3.7 61 1.0
## log_lik[69] 3.3 0.0 0.3 2.5 3.1 3.3 3.5 3.7 75 1.0
## log_lik[70] 2.7 0.1 0.7 1.0 2.3 2.9 3.3 3.6 95 1.0
## log_lik[71] 3.5 0.0 0.3 2.6 3.4 3.5 3.6 3.8 55 1.0
## log_lik[72] 3.1 0.1 0.5 2.2 2.8 3.2 3.5 3.6 69 1.0
## log_lik[73] 3.2 0.1 0.7 1.6 3.0 3.4 3.5 3.7 72 1.0
## log_lik[74] 2.5 0.1 0.9 0.7 2.0 2.7 3.2 3.6 81 1.0
## log_lik[75] 3.5 0.1 0.3 2.5 3.4 3.6 3.7 3.8 37 1.0
## log_lik[76] 3.1 0.1 0.6 1.6 2.8 3.3 3.5 3.7 70 1.0
## log_lik[77] 3.3 0.0 0.4 2.3 3.2 3.4 3.6 3.7 97 1.0
## log_lik[78] 2.8 0.1 0.9 0.5 2.4 3.2 3.4 3.6 113 1.0
## log_lik[79] 3.5 0.0 0.3 2.7 3.4 3.5 3.6 3.8 63 1.0
## log_lik[80] 3.2 0.0 0.5 1.9 3.0 3.4 3.5 3.7 133 1.0
## psi[1] 2.1 0.1 1.3 0.5 1.2 1.8 2.7 6.1 153 1.0
## b_obs[1,1] 0.2 0.0 0.0 0.1 0.2 0.2 0.2 0.2 37 1.1
## b_obs[1,2] 1.5 0.0 0.0 1.5 1.5 1.5 1.5 1.5 75 1.0
## b_obs[1,3] 0.2 0.0 0.0 0.2 0.2 0.2 0.3 0.3 64 1.0
## b_obs[1,4] 0.3 0.0 0.0 0.2 0.2 0.3 0.3 0.3 59 1.0
## lp__ 237.5 0.8 4.3 227.7 234.6 238.2 240.6 244.9 26 1.0
##
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
We can then make plots of the true and estimated trend,
This approach could be modified to have covariates not affecting some time series. For example, if we didn’t want the covariate to affect the last time series, we could say
And then again fit the model
As a cautionary note, there’s some identifiability issues with including covariates in the process model. Covariates need to be standardized or centered prior to being included. Future versions of this vignette will include more clear examples and recommendations.
We’ll start by simulating some random trends using the sim_dat
function,
Next, we can add a covariate effect to the trend estimate, x
. For example,
cov = rnorm(20, 0, 1)
b_pro = c(0.1,0.3)
x = matrix(0,2,20)
for(i in 1:2) {
x[i,1] = cov[1]*b_pro[i]
}
for(i in 2:length(cov)) {
x[1,i] = x[1,i-1] + cov[i]*b_pro[1] + rnorm(1,0,1)
x[2,i] = x[2,i-1] + cov[i]*b_pro[2] + rnorm(1,0,1)
}
y = sim_dat$Z %*% x
And now fit the model with fit_dfa