The goal of sgmcmc is to make it as easy as possible for users to run stochastic gradient MCMC (SGMCMC) algorithms. SGMCMC algorithms enable MCMC to scale more easily to large datasets, where traditional MCMC becomes very slow as the dataset size grows.
sgmcmc implements many of the popular stochastic gradient MCMC methods, including SGLD, SGHMC and SGNHT. The package uses automatic differentiation, so all the gradients needed by the methods are calculated automatically. Control variate methods can be used to improve the efficiency of the methods, as proposed in the recent publication. The package is designed to be user friendly: to run these algorithms, users only need to specify the data; the log-likelihood function and log-prior density; the parameter starting values; and a few tuning parameters.
To enable as much flexibility as possible, the data and parameter starting points fed to the functions in sgmcmc
are specified as lists. This allows users to specify multiple parameters and datasets. It also allows the user to easily reference these quantities in the log-likelihood function and log-prior density, and to set different stepsizes for different parameters (essential when parameters are on different scales).
As we mentioned earlier, the datasets you wish to use are specified as a list. Suppose we have already obtained or created two datasets X and Y; we would specify the whole dataset for our session as
dataset = list("X" = X, "Y" = Y)
You can specify as many datasets as you like; the most important thing is that your naming is consistent, so that you use the same names in your log-likelihood and log-prior functions, and in your list of stepsizes.
The functions assume that each observation is located along the first axis of the object. Suppose Y is a 2d matrix; then observation \(Y_i\) should be located at dataset$Y[i,]. Similarly, if X is a 3d array, observation \(X_i\) should be located at dataset$X[i,,].
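For example, here is a minimal sketch with made-up dimensions (100 observations, Y a matrix with 2 columns, X a 3d array of 3 x 3 matrices) showing where observation 5 lives in each case:
# Illustrative shapes only: 100 observations along the first axis
Y = matrix(rnorm(100 * 2), nrow = 100, ncol = 2)
X = array(rnorm(100 * 3 * 3), dim = c(100, 3, 3))
dataset = list("X" = X, "Y" = Y)
dataset$Y[5, ]    # observation Y_5, a vector of length 2
dataset$X[5, , ]  # observation X_5, a 3 x 3 matrix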
Parameters are again specified as a list: the names are what we will refer to them as in the logLik and logPrior functions introduced below, and the values are the desired starting points. Suppose our model depends on two parameter vectors theta1 and theta2, and we want to start both from 0. If both are of length 3, this could be specified as follows
params = list("theta1" = rep(0, 3), "theta2" = rep(0, 3))
The log-likelihood function is specified as a function of the dataset and params lists, which have the same names as the lists you just specified. The only difference is that the objects inside the lists will have been automatically converted to TensorFlow objects for you: the dataset list will contain TensorFlow placeholders, and the params list will contain TensorFlow variables. The logLik function should take these lists as input and return the value of the log-likelihood at the current parameter and data values, calculated using TensorFlow operations. More details about how TensorFlow works in R can be found here.
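As a minimal sketch (using hypothetical names y for the data and mu for the parameter), a univariate Normal log-likelihood with unit variance could be hand-coded with TensorFlow operations as
# Sketch only: Gaussian log-likelihood with unit variance, dropping additive constants
logLik = function(params, dataset) {
    return(-0.5 * tf$reduce_sum(tf$square(dataset$y - params$mu)))
}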
Specifying the logLik and logPrior functions often requires particular probability distributions. TensorFlow already has a number of distributions implemented in the TensorFlow Probability package. All of the distributions implemented in TensorFlow Probability are located in tf$distributions; a full list is given on the TensorFlow Probability website. More complex distributions can be specified by coding up the logLik and logPrior functions by hand. Examples of this, as well as of using various distribution functions, are given in the other tutorials.
The logPrior function is specified in exactly the same way, except that the function should only take params as input, as the prior should be independent of the dataset. You do not have to specify the prior at all, in which case the algorithm uses a uniform, uninformative prior.
Suppose we want to simulate from the mean of a multivariate Normal density, with each component of the mean having a Student-t prior. We would specify this as follows
# Load sgmcmc (which makes the tf object available) and MASS (for mvrnorm)
library(sgmcmc)
library(MASS)
# Simulate and declare dataset
dataset = list("X" = mvrnorm(10^4, c(0, 0), diag(2)))
# Simulate random starting point
params = list("theta" = rnorm(2))
# Declare log likelihood
logLik = function(params, dataset) {
    # Declare distribution, assuming Sigma known and constant
    SigmaDiag = c(1, 1)
    distn = tf$distributions$MultivariateNormalDiag(params$theta, SigmaDiag)
    # Return sum of log pdf over observations
    return(tf$reduce_sum(distn$log_prob(dataset$X)))
}
# Declare log prior
logPrior = function(params) {
    # Declare prior distribution: Student-t with 3 degrees of freedom
    distn = tf$distributions$StudentT(3, 0, 1)
    # Apply log prior componentwise and return sum
    return(tf$reduce_sum(distn$log_prob(params$theta)))
}
Most of the time, specifying constants in these functions, such as SigmaDiag, as plain R objects will be fine. But occasionally there are issues when these constants get automatically converted by TensorFlow to tf$float64 objects rather than tf$float32. If you run into errors involving tf$float64, force the constants to be input as tf$float32 by using SigmaDiag = tf$constant( c( 1, 1 ), dtype = tf$float32 ).
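For example, the logLik function above could be rewritten with this change as follows; only the SigmaDiag line differs from the original:
logLik = function(params, dataset) {
    # Force the constant to float32 to avoid tf$float64 conversion errors
    SigmaDiag = tf$constant(c(1, 1), dtype = tf$float32)
    distn = tf$distributions$MultivariateNormalDiag(params$theta, SigmaDiag)
    return(tf$reduce_sum(distn$log_prob(dataset$X)))
}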
The only other input that needs to be specified to get any of the standard stochastic gradient MCMC methods running is the stepsize. This is normally defined as a list with an entry for each parameter name, as follows
stepsize = list("theta" = 1e-5)
A shorthand is to just set stepsize = 1e-5, which sets the stepsize of every parameter to 1e-5. This shorthand can be used for any of the tuning constants.
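For example, for the theta1 and theta2 model above, parameters on different scales can be given different stepsizes (the values here are purely illustrative):
stepsize = list("theta1" = 1e-4, "theta2" = 1e-6)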
So to get sgld working for the multivariate Normal log-likelihood function we specified, with an uninformative uniform prior, we can simply run
sgld( logLik, dataset, params, stepsize )
The sghmc and sgnht functions are called in exactly the same way.
To use the Student-t prior we specified, and to set a minibatch size of 500, we'd use
sgld( logLik, dataset, params, stepsize, logPrior = logPrior, minibatchSize = 500 )
For more details of the other optional arguments, see the main documentation in the reference.
All of the control variate methods sgldcv, sghmccv and sgnhtcv require an extra input, optStepsize, which is the stepsize tuning constant for the initial optimization used to find the MAP estimates. This argument is simply a small numeric value, such as 0.1, and may require some tuning, which we talk about below. To run sgldcv on the multivariate Normal model, with the specified prior, we'd use
optStepsize = 0.1
sgldcv( logLik, dataset, params, stepsize, optStepsize, logPrior = logPrior, minibatchSize = 500 )
The sghmccv and sgnhtcv functions are called similarly.
Most of the time the tuning constants, such as the stepsize, will need some tuning; we suggest doing this using cross-validation. You can roughly check that the algorithm is converging by inspecting the Log-posterior estimate output by the algorithm and checking that it eventually settles down (it should decrease at first unless the chain converges very quickly).
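As an additional informal check, you can store the output and plot the trace of a component of the chain. A minimal sketch, assuming the call above is assigned to output and that the returned object is a list with one array of samples per parameter (iterations along the first axis), as described in the reference documentation:
output = sgld(logLik, dataset, params, stepsize, logPrior = logPrior, minibatchSize = 500)
# Trace plot of the first component of theta as an informal convergence check
plot(output$theta[, 1], type = "l", xlab = "Iteration", ylab = "theta[1]")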
For more details, we suggest reading the worked examples in the Articles section; these cover a variety of models and will be expanded as the package matures.
The SGMCMC algorithms can also be run step by step. This allows custom storage of parameters using test functions, or sequential estimates, which is useful if your chain is too large to fit into memory! Running the algorithms step by step requires a better knowledge of TensorFlow; an example is given in the neural network vignette.
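As a rough sketch of what running step by step looks like (the exact function names used here, sgldSetup, initSess, sgmcmcStep and getParams, should be checked against the neural network vignette and the API reference):
# Sketch only: run SGLD one step at a time and keep a sequential estimate
# of the posterior mean, without storing the full chain in memory
sgldObj = sgldSetup(logLik, dataset, params, stepsize, logPrior = logPrior, minibatchSize = 500)
sess = initSess(sgldObj)
nIters = 10^4
postMean = 0
for (i in 1:nIters) {
    sgmcmcStep(sgldObj, sess)
    currentState = getParams(sgldObj, sess)
    postMean = postMean + currentState$theta / nIters
}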
Full details of the API can be found here.