Superfast Likelihood Inference for Stationary Gaussian Time Series

Yun Ling, Martin Lysy

2020-02-27

This vignette illustrates the basic functionality of the SuperGauss package by simulating a few stochastic processes and estimating their parameters from regularly spaced data.

Simulation of fractional Brownian motion

A one-dimensional fractional Brownian motion (fBM) \(X_t = X(t)\) is a continuous Gaussian process with \(E[X_t] = 0\) and \(\mathrm{cov}(X_t, X_s) = \tfrac 1 2 (|t|^{2H} + |s|^{2H} - |t-s|^{2H})\), for \(0 < H < 1\). fBM is not stationary but has stationary increments, such that \((X_{t+\Delta t} - X_t) \stackrel{D}{=} (X_{s+\Delta t} - X_s)\) for any \(s,t\). As such, its covariance function is completely determined its mean squared displacement (MSD) \[ \mathrm{\scriptsize MSD}_X(t) = E[(X_t - X_0)^2] = |t|^{2H}. \] When the Hurst parameter \(H = \tfrac 1 2\), fBM reduces to ordinary Brownian motion.

The following R code generates 5 independent fBM realizations of length \(N = 3000\) with \(H = 0.3\). The timing of the “superfast” method (Wood and Chan 1994) provided in this package is compared to that of a “fast” method (e.g., Brockwell and Davis 1991) and to the usual method (Cholesky decomposition of an unstructured variance matrix).

require(SuperGauss)

N <- 3000 # number of observations
dT <- 1/60 # time between observations (seconds)
H <- .3 # Hurst parameter

tseq <- (0:N)*dT # times at which to sample fBM
npaths <- 5 # number of fBM paths to generate

# to generate fbm, generate its increments, which are stationary
msd <- fbm.msd(tseq = tseq[-1], H = H)
acf <- msd2acf(msd = msd) # convert msd to acf

# superfast method
system.time({
  dX <- rSnorm(n = npaths, acf = acf, fft = TRUE)
})
##    user  system elapsed 
##   0.006   0.000   0.007
# fast method (about 3x as slow)
system.time({
  rSnorm(n = npaths, acf = acf, fft = FALSE)
})
##    user  system elapsed 
##   0.019   0.000   0.019
# unstructured variance method (much slower)
system.time({
  matrix(rnorm(N*npaths), npaths, N) %*% chol(toeplitz(acf))
})
##    user  system elapsed 
##   3.869   0.024   3.897
# convert increments to position measurements
Xt <- apply(rbind(0, dX), 2, cumsum)

# plot
clrs <- c("black", "red", "blue", "orange", "green2")
par(mar = c(4.1,4.1,.5,.5))
plot(0, type = "n", xlim = range(tseq), ylim = range(Xt),
     xlab = "Time (s)", ylab = "Position (m)")
for(ii in 1:npaths) {
  lines(tseq, Xt[,ii], col = clrs[ii], lwd = 2)
}

Inference for the Hurst parameter

Suppose that \(\boldsymbol{X}= (N_{X},\ldots,N_{)}\) are equally spaced observations of an fBM process with \(X_i = X(i \Delta t)\), and let \(\Delta\boldsymbol{X}= (N-1_{\Delta X},\ldots,N-1_{)}\) denote the corresponding increments, \(\Delta X_i = X_{i+1} - X_i\). Then the loglikelihood function for \(H\) is \[ \ell(H \mid \Delta\boldsymbol{X}) = -\tfrac 1 2 \big(\Delta\boldsymbol{X}' \boldsymbol{V}_H^{-1} \Delta\boldsymbol{X}+ \log |\boldsymbol{V}_H|\big), \] where \(V_H\) is a Toeplitz matrix, \[ \boldsymbol{V}_H= [\mathrm{cov}(\Delta X_i, \Delta X_j)]_{0 \le i,j < N} = \begin{bmatrix} \gamma_0 & \gamma_1 & \cdots & \gamma_{N-1} \\ \gamma_1 & \gamma_0 & \cdots & \gamma_{N-2} \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{N-1} & \gamma_{N-2} & \cdots & \gamma_0 \end{bmatrix}. \] Thus, each evaluation of the loglikelihood requires the inverse and log-determinant of a Toeplitz matrix, which scales as \(\mathcal O(N^2)\) with the Durbin-Levinson algorithm. The SuperGauss package implements an extended version of the Generalized Schur algorithm of Ammar and Gragg (1988), which scales these computations as \(\mathcal O(N \log^2 N)\). With careful memory management and extensive use of the FFTW library (Frigo and Johnson 2005), the SuperGauss implementation crosses over Durbin-Levinson at around \(N = 300\).

The Toeplitz matrix class

The bulk of the likelihood calculations in SuperGauss are handled by the Toeplitz matrix class. A Toeplitz object is created as follows:

# allocate and assign in one step
Toep <- Toeplitz(acf = acf)
Toep
## Toeplitz matrix of size 3000 
##  acf:  0.0857 -0.0208 -0.00421 -0.00228 -0.0015 -0.00109 ...
# allocate memory only
Toep <- Toeplitz(n = N)
Toep
## Toeplitz matrix of size 3000 
##  acf:  NULL
Toep$setAcf(acf = acf) # assign later

Its primary methods are illustrated below:

all(acf == Toep$getAcf()) # extract acf
## [1] TRUE
# matrix multiplication
z <- rnorm(N)
x1 <- toeplitz(acf) %*% z # regular way
x2 <- Toep %*% z # with Toeplitz class
range(x1-x2)
## [1] -1.332268e-15  1.554312e-15
# system of equations
y1 <- solve(toeplitz(acf), z) # regular way
y2 <- solve(Toep, z) # with Toeplitz class
range(y1-y2)
## [1] -2.160050e-12  1.236344e-12
# log-determinant
ld1 <- determinant(toeplitz(acf))$mod
ld2 <- determinant(Toep) # note: no $mod
c(ld1, ld2)
## [1] -7642.578 -7642.578

Maximum likelihood calculation

The following code shows how to obtain the maximum likelihood of \(H\) and its standard error for a given fBM path. For speed comparisons, the optimization is done both using the superfast Generalized Schur algorithm and the fast Durbin-Levinson algorithm.

dX <- diff(Xt[,1]) # obtain the increments of a given path
N <- length(dX)

# autocorrelation of fBM increments
fbm.acf <- function(H) {
  msd <- fbm.msd(1:N*dT, H = H)
  msd2acf(msd)
}

# loglikelihood using generalized Schur algorithm
Toep <- Toeplitz(n = N) # pre-allocate memory
loglik.GS <- function(H) {
  Toep$setAcf(acf = fbm.acf(H))
  dSnorm(X = dX, acf = Toep, log = TRUE)
}

# loglikelihood using Durbin-Levinson algorithm
loglik.DL <- function(H) {
  dSnormDL(X = dX, acf = fbm.acf(H), log = TRUE)
}

# superfast method
system.time({
  GS.mle <- optimize(loglik.GS, interval = c(.01, .99), maximum = TRUE)
})
##    user  system elapsed 
##   0.024   0.001   0.026
# fast method (about 10x slower)
system.time({
  DL.mle <- optimize(loglik.DL, interval = c(.01, .99), maximum = TRUE)
})
##    user  system elapsed 
##   0.289   0.001   0.290
c(GS = GS.mle$max, DL = DL.mle$max)
##        GS        DL 
## 0.2986292 0.2986292
# standard error calculation
require(numDeriv)
## Loading required package: numDeriv
Hmle <- GS.mle$max
Hse <- -hessian(func = loglik.GS, x = Hmle) # observed Fisher Information
Hse <- sqrt(1/Hse[1])
c(mle = Hmle, se = Hse)
##         mle          se 
## 0.298629202 0.003332506

Caution with ReferenceClasses

In order to effectively manage memory in the underlying C++ code, the Toeplitz class is implemented using R’s “Reference Classes”. Among other things, this means that when a Toeplitz object is passed to a function, the function does not make a copy of it: all modifications to the object inside the object are reflected on the object outside the function as well, as in the following example:

T1 <- Toeplitz(n = N)
T2 <- T1 # shallow copy: both of these point to the same memory location

# affects both objects
T1$setAcf(fbm.acf(.5))
T1
## Toeplitz matrix of size 3000 
##  acf:  0.0167 0 1.73e-18 -3.47e-18 0 6.94e-18 ...
T2
## Toeplitz matrix of size 3000 
##  acf:  0.0167 0 1.73e-18 -3.47e-18 0 6.94e-18 ...
loglik <- function(H) {
  T1$setAcf(acf = fbm.acf(H))
  dSnorm(X = dX, acf = T1, log = TRUE)
}

# affects both objects
loglik(H = .3)
## [1] -467.3969
T1
## Toeplitz matrix of size 3000 
##  acf:  0.0857 -0.0208 -0.00421 -0.00228 -0.0015 -0.00109 ...
T2
## Toeplitz matrix of size 3000 
##  acf:  0.0857 -0.0208 -0.00421 -0.00228 -0.0015 -0.00109 ...

Superfast Newton-Raphson

In addition to the superfast algorithm for Gaussian likelihood evaluations , SuperGauss provides such algorithms for the loglikelihood gradient and Hessian functions, leading to superfast versions of many inference algorithms such as Newton-Raphson and Hamiltonian Monte Carlo. An example of the former is given below using the two-parameter exponential autocorrelation model \[ \mathrm{\scriptsize ACF}_X(t \mid \lambda, \sigma) = \sigma^2 \exp(- |t/\lambda|). \]

# autocorrelation function
exp.acf <- function(t, lambda, sigma) sigma^2 * exp(-abs(t/lambda))
# gradient, returned as a 2-column matrix
exp.acf.grad <- function(t, lambda, sigma) {
  ea <- exp.acf(t, lambda, 1)
  cbind(abs(t)*(sigma/lambda)^2 * ea, # d_acf/d_lambda
        2*sigma * ea) # d_acf/d_sigma
}
# Hessian, returned as an array of size length(t) x 2 x 2
exp.acf.hess <- function(t, lambda, sigma) {
  ea <- exp.acf(t, lambda, 1)
  sl2 <- sigma/lambda^2
  hess <- array(NA, dim = c(length(t), 2, 2))
  hess[,1,1] <- sl2^2*(t^2 - 2*abs(t)*lambda) * ea # d2_acf/d_lambda^2
  hess[,1,2] <- 2*sl2 * abs(t) * ea # d2_acf/(d_lambda d_sigma)
  hess[,2,1] <- hess[,1,2] # d2_acf/(d_sigma d_lambda)
  hess[,2,2] <- 2 * ea # d2_acf/d_sigma^2
  hess
}

# simulate data
lambda <- runif(1, .5, 2)
sigma <- runif(1, .5, 2)
tseq <- (1:N-1)*dT
acf <- exp.acf(t = tseq, lambda = lambda, sigma = sigma)
Xt <- rSnorm(acf = acf)

Toep <- Toeplitz(n = N) # storage space

# negative loglikelihood function of theta = (lambda, sigma)
# include attributes for gradient and Hessian
exp.negloglik <- function(theta) {
  lambda <- theta[1]
  sigma <- theta[2]
  # acf, its gradient, and Hessian
  Toep$setAcf(acf = exp.acf(tseq, lambda, sigma)) # use the Toeplitz class
  dacf <- exp.acf.grad(tseq, lambda, sigma)
  d2acf <- exp.acf.hess(tseq, lambda, sigma)
  nll <- -1 * dSnorm(X = Xt, acf = Toep, log = TRUE)
  attr(nll, "gradient") <- -1 * Snorm.grad(X = Xt, acf = Toep, dacf = dacf)
  attr(nll, "hessian") <- -1 * Snorm.hess(X = Xt, acf = Toep,
                                          dacf = dacf, d2acf = d2acf)
  nll
}

# optimization
system.time({
  mle.fit <- nlm(f = exp.negloglik, p = c(1,1), hessian = TRUE)
})
##    user  system elapsed 
##   0.433   0.047   0.482
# display estimates with standard errors
rbind(true = c(lambda = lambda, sigma = sigma),
      est = mle.fit$estimate,
      se = sqrt(diag(solve(mle.fit$hessian))))
##        lambda    sigma
## true 1.448131 1.676776
## est  1.472559 1.690344
## se   0.363845 0.207625

References

Ammar, G.S. and Gragg, W.B., 1988. Superfast solution of real positive definite Toeplitz systems. SIAM Journal on Matrix Analysis and Applications, 9 (1), 61–76.

Brockwell, P.J. and Davis, R.A., 1991. Time series: Theory and methods. Springer: New York.

Frigo, M. and Johnson, S.G., 2005. The design and implementation of FFTW3. Proceedings of the IEEE, 93 (2), 216–231.

Wood, A.T. and Chan, G., 1994. Simulation of stationary gaussian processes in \([0, 1]^d\). Journal of Computational and Graphical Statistics, 3 (4), 409–432.