A lot of time series practitioners resort to the well-known Box-Jenkins methods, such as ARIMA and SARIMA, when modelling time series. Those methods are easily incorporated into the State Space framework. This provides to be beneficial, as missing observations are easily dealt with in the State Space framework. Moreover, no observations need to be discarded due to the differencing and introduced lagged variables. The loglikelihood is calculated in an exact manner! In this document, we show you how to estimate ARIMA and SARIMA models using statespacer. We reproduce some estimation results as in Box et al. (2015).
To showcase the estimation of ARIMA models, we make use of the sunspot.year
data, which contains yearly numbers of sunspots from 1700 to 1988. See ?sunspot.year
for details. We only use the data from 1770 to 1869, to stay in line with Box et al. (2015). We estimate an \(\text{ARIMA}(3, ~ 0, ~ 0)\) with deterministic level (or constant if you prefer) as follows:
# Load statespacer
library(statespacer)
# Load the dataset
library(datasets)
Data <- matrix(window(sunspot.year, start = 1770, end = 1869))
# Estimate the ARIMA model
fit <- statespacer(y = Data,
H_format = matrix(0),
local_level_ind = TRUE,
arima_list = list(c(3, 0, 0)),
format_level = matrix(0),
initial = c(0.5*log(var(Data)), 0, 0, 0),
verbose = TRUE)
#> Starting the optimisation procedure at: 2020-07-15 23:43:29
#> Parameter scaling:[1] 1 1 1 1
#> initial value 5.022561
#> iter 10 value 4.114363
#> final value 4.111163
#> converged
#> Finished the optimisation procedure at: 2020-07-15 23:43:29
#> Time difference of 0.14882493019104 secs
Note that we eliminate the observation error by setting its variance to 0, although it’s perfectly fine to include observation errors along with ARIMA models, as long as you watch out for identification issues of course. For details about specifying proper initial values, please see vignette("dictionary", "statespacer")
.
We obtain the following estimates:
# Coefficients of the ARMA component
arma_coeff <- rbind(
fit$system_matrices$AR$ARIMA1,
fit$standard_errors$AR$ARIMA1
)
arma_coeff <- cbind(
arma_coeff,
c(fit$smoothed$level[1],
sqrt(fit$system_matrices$Z_padded$level %*%
fit$smoothed$V[,,1] %*%
t(fit$system_matrices$Z_padded$level))
)
)
rownames(arma_coeff) <- c("coefficient", "std_error")
colnames(arma_coeff) <- c("ar1", "ar2", "ar3", "intercept")
arma_coeff
#> ar1 ar2 ar3 intercept
#> coefficient 1.55976415 -1.005462 0.2129622 48.605905
#> std_error 0.09962468 0.155982 0.1003591 6.358039
goodness_fit <- rbind(
fit$system_matrices$Q$ARIMA1,
fit$diagnostics$loglik,
fit$diagnostics$AIC
)
rownames(goodness_fit) <- c("Variance", "Loglikelihood", "AIC")
goodness_fit
#> [,1]
#> Variance 222.496532
#> Loglikelihood -411.116313
#> AIC 8.322326
We see that the results are fairly similar to the results as obtained by Box et al. (2015). Differences may occur due to the different estimation procedures. We don’t have to eliminate observations, so we use the full information available at hand, in contrast to traditional estimation procedures. Note that not much has to be done to estimate VARIMA models. In fact, you only need to specify a dependent variable y
that has more than one column! It’s also straightforward to add explanatory variables, by making use of the addvar_list
option, see vignette("seatbelt", "statespacer")
for an example of adding explanatory variables.
To showcase the estimation of SARIMA models, we make use of the classic AirPassengers
data, which contains monthly totals of international airline passengers from 1949 to 1960. See ?AirPassengers
for details. We estimate a \(\text{SARIMA}(0, ~ 1, ~ 1)_{1} ~ \times ~ (0, ~ 1, ~ 1)_{12}\). Note that in the multivariate case, there is a subtle difference between \(\text{SARIMA}(0, ~ 1, ~ 1)_{1} ~ \times ~ (0, ~ 1, ~ 1)_{12}\) and \(\text{SARIMA}(0, ~ 1, ~ 1)_{12} ~ \times ~ (0, ~ 1, ~ 1)_{1}\) as matrix multiplication is not commutative.
We proceed as follows:
# Load the dataset
Data <- matrix(log(AirPassengers))
# The SARIMA specification, must be a list containing lists!
sarima_list <- list(list(s = c(12, 1), ar = c(0, 0), i = c(1, 1), ma = c(1, 1)))
# Fit the SARIMA model
fit <- statespacer(y = Data,
H_format = matrix(0),
sarima_list = sarima_list,
initial = c(0.5*log(var(diff(Data))), 0, 0),
verbose = TRUE)
#> Starting the optimisation procedure at: 2020-07-15 23:43:29
#> Parameter scaling:[1] 1 1 1
#> initial value -1.034434
#> final value -1.616321
#> converged
#> Finished the optimisation procedure at: 2020-07-15 23:43:30
#> Time difference of 0.835811853408813 secs
We obtain the following estimates:
# Coefficients of the ARMA component
arma_coeff <- rbind(
c(fit$system_matrices$SMA$SARIMA1$S1, fit$system_matrices$SMA$SARIMA1$S12),
c(fit$standard_errors$SMA$SARIMA1$S1, fit$standard_errors$SMA$SARIMA1$S12)
)
rownames(arma_coeff) <- c("coefficient", "std_error")
colnames(arma_coeff) <- c("ma1 s = 1", "ma1 s = 12")
arma_coeff
#> ma1 s = 1 ma1 s = 12
#> coefficient -0.40188859 -0.55694248
#> std_error 0.08963614 0.07309788
goodness_fit <- rbind(
fit$system_matrices$Q$SARIMA1,
fit$diagnostics$loglik,
fit$diagnostics$AIC
)
rownames(goodness_fit) <- c("Variance", "Loglikelihood", "AIC")
goodness_fit
#> [,1]
#> Variance 0.001347882
#> Loglikelihood 232.750284785
#> AIC -3.010420622
As you can see, fitting the Box-Jenkins models with statespacer is quite easy!
Box, George EP, Gwilym M Jenkins, Gregory C Reinsel, and Greta M Ljung. 2015. Time Series Analysis: Forecasting and Control. John Wiley & Sons.