Overview

Spatial Prediction cares about making models based on spatially distributed data with respect to time. It includes data structures, and skeletons of models in order to make it easy for users to quickly prototype new models on data in a wide variety of formats.

Data

Suppose we want to use the following data to model rainfall over time in the various regions given below.

data("bomregions")
matrix_data = t(bomregions[1:8,c(10:17)])
colnames(matrix_data) = bomregions[1:8,1]
matrix_data
##             1900   1901   1902   1903   1904   1905   1906   1907
## eastRain  429.98 500.12 315.33 694.09 564.86 443.11 735.26 585.13
## seRain    603.39 510.89 420.77 628.07 550.98 583.05 677.96 509.02
## southRain 375.39 314.01 283.64 420.83 388.11 325.70 422.12 357.48
## swRain    738.28 558.98 541.85 729.44 711.39 717.54 634.15 709.77
## westRain  399.90 323.07 362.57 377.11 417.96 253.67 336.66 339.78
## northRain 360.29 475.92 344.86 601.27 603.84 312.80 538.66 530.85
## mdbRain   412.67 364.65 255.85 524.88 448.40 427.93 610.43 436.35
## auRain    368.73 401.72 317.18 518.59 504.65 320.35 485.28 451.01

and suppose that we want to model the rainfall in the different regions over time.

Example 1: Moving Average Models

A relatively simple prediction model for space time data, is to predict the result for a location at time t to be the average of the values of that location at k previous times. We will walk through the process of creating such a model.

Data

For this model, all the data we need fits nicely into a space-time Matrix.

SpaceTimeMatrix <- IncidenceMatrix$new(data=matrix_data)

The class IncidenceMatrix is an R6 class, and holds our data as well as several other pieces of information. For a more detailed look into R6 classes, and how they differ from other ways of using R, see vignette('R6 Classes',ForecastFramework) or the R6 package.

Model

For our model, we start by picking some model parameters. For example, we can choose the number of time steps to go back windowSize. Now lets walk through how to prototype this model:

MovingAverageModel <- R6Class(
    inherit = Model,

We start by defining our new modeling class, MovingAverageModel We want this to be recognized as a model by other code using ForecastFramework, so we so we have it inherit from Model, one of the package classes. The Model class defines some mandatory functions we need to extend. These can be found by referencing the help page ?Model, or by consulting the vignette('ClassDiagram','ForecastFramework').

  private = list(
    window = NA,
    windowSize = 0
  )

The private section of the class is for aspects of the class we do not want the user to access directly. These should be elements of the class which are called by other methods, and not directly. Both methods (functions), and variables can be stored here.

    public = list(

The public section of the class is for anything the user should have direct access to.

        #' @method initialize Create a new Moving Average Model
        #' @param windowSize The size of window over which to do the moving average
        initialize = function(windowSize = 1){
            private$window = as.integer(1:windowSize)
            private$windowSize = as.integer(windowSize)
      }

The first method we modify is the initialization method. This method gets called whenever we instantiate a MovingAverageModel with MovingAverageModel$new(). See that we have passed windowSize to the model. Also note the two commented lines right before the method. These are documentation strings, and will be used to automatically document the class’ methods. This function takes the window size, and assigns the two variables defined in the private section. Note that we access them through the private keyword. When accessing members of the class which are public, we will use the self keyword in the same way.

    #' @method fit This function does nothing.
    #' @param data This is included for compliance with Model.R
    fit = function(data){
      #The model is always fitted
      return()
    },

The fit method is supposed to take a data set, and use it to automatically configure model parameters. However, our model is simple enough that no parameter modification is necessary. So, we have the function do nothing.

        #' @method predict Predict as much as possible from the new data
        #' @param newdata The data to use when making predictions.
        predict = function(newdata){
          if(!('MatrixData' %in% class(newdata))){
            stop("This operation requires matrix-like data")
          }
          # for debugging: see AbstractClasses::Generic::debug for details.
          if('predict' %in% private$.debug){
            browser()
          }
          #This is an R6 specific thing.  We want to clone arguments before we use them, because
          #  otherwise we will also modify the original.
          newdata = newdata$clone(TRUE)
          #We'll add a column so the lagging works out correctly.
          #We can think of this as adding the prediction column to the data.
          newdata$addColumns(1)
          #We pre-allocate our return as another as a subclass of the Forecast class.
          #SimulatedIncidenceMatrix is the only one currently defined, so...
          #We're not doing anything stochastic, so we set the number of simulations to 1.
          rc = SimulatedIncidenceMatrix$new(newdata,nsim=1)
          #First we use the lag function of IncidenceMatrix to reogranize the data.
          #We want to use every row at every time step in the window.  The lag function allows us
          #to create new rows for these previous time steps.
          newdata$lag(private$window)
          #Since we added new rows, in particular for each old row we added private$windowSize new rows
          #  we divide.
          for(row in 1:newdata$nrow/private$windowSize){
            #The seq call is the index of all rows corresponding to the original row `row`
            #In other words, we take the mean of all rows corresponding to lagged versions of the original row
            #2 means take the mean of each column
            rc$mutate(rows=row,data=apply(newdata$mat[seq(row,newdata$nrow,private$windowSize),],2,mean))
          }
          return(IncidenceForecast$new(data=rc))
        }

predict is the main function of the class. We first introduce extra rows to cover the whole window. Then we take the average over the window for each row, and store that in the appropriate part of our output. Notice here that the IncidenceMatrix class does most of the work, and we had very little manual steps. Ideally, most of the prediction process is actions taken by the data on itself. This allows us to make as few assumptions about the data as possible.

MovingAverageModel <- R6Class(
    classname = "MovingAverageModel",
    inherit = Model,
    private = list(
    window = NA,
    windowSize = 0
  ),
    public = list(
        initialize = function(windowSize = 1){
            private$window = as.integer(1:windowSize)
            private$windowSize = as.integer(windowSize)
      },
        #' @method fit This function does nothing.
    #' @param data This is included for compliance with Model.R
        fit = function(data){
      #This model is always fitted.
      return()
    },
        #' @method predict Predict as much as possible from the new data
        #' @param newdata The data to use when making predictions.
        predict = function(newdata){
          if(!('MatrixData' %in% class(newdata))){
            stop("This operation requires matrix-like data")
          }
          # for debugging: see AbstractClasses::Generic::debug for details.
          if('predict' %in% private$.debug){
            browser()
          }
          #We'll add a column so the lagging works out correctly.
          #We can think of this as adding the prediction column to the data.
          newdata$addColumns(1)
          #We pre-allocate our return as another as a subclass of the Forecast class.
          #SimulatedIncidenceMatrix is the only one currently defined, so...
          #We're not doing anything stochastic, so we set the number of simulations to 1.
          rc = SimulatedIncidenceMatrix$new(newdata,nsim=1)
          #First we use the lag function of IncidenceMatrix to reogranize the data.
          #We want to use every row at every time step in the window.  The lag function allows us
          #to create new rows for these previous time steps.
          newdata$lag(private$window)
          #Since we added new rows, in particular for each old row we added private$windowSize new rows
          #  we divide.
          for(row in 1:newdata$nrow/private$windowSize){
            #The seq call is the index of all rows corresponding to the original row `row`
            #In other words, we take the mean of all rows corresponding to lagged versions of the original row
            #2 means take the mean of each column
            rc$mutate(rows=row,data=apply(newdata$mat[seq(row,newdata$nrow,private$windowSize),],2,mean))
          }
          return(IncidenceForecast$new(data=rc,forecastTimes=sapply(1:rc$ncol,function(x){return(TRUE)})))
        }
    )
)
TheModel <- MovingAverageModel$new(windowSize=3)
 TheModel$fit(data=SpaceTimeMatrix$subset(cols=-SpaceTimeMatrix$ncol,mutate=FALSE))
## NULL
 TheModel$predict(newdata=SpaceTimeMatrix$subset(cols=-SpaceTimeMatrix$ncol,mutate=FALSE))$mean()$mat
##      1900 1901 1902     1903     1904     1905     1906       NA
## 1900   NA   NA   NA 385.5487 479.8575 484.6150 510.5337 516.1075
## 1900   NA   NA   NA 433.9225 447.0025 451.0900 512.1788 482.8813
## 1900   NA   NA   NA 436.5057 424.1557 531.9571 481.1900 502.6143
## 1900   NA   NA   NA 395.5800 449.2529 473.1514 520.1657 484.8000
## 1900   NA   NA   NA 435.8014 421.1357 436.8200 502.0543 455.0129
## 1900   NA   NA   NA 461.9833 424.7100 555.9317 507.1050 516.0300
## 1900   NA   NA   NA 371.2017 402.5550 433.4450 487.2700 459.9083
## 1900   NA   NA   NA 448.0067 428.4733 439.9633 543.4517 474.7383

The final output is a matrix with 5 columns, and 3 rows. The rows correspond to the rows of our original matrix. Notice that we predicted on only the first 4 columns of SpaceTimeMatrix, however the output has 5 columns. The first 4 columns represent the same time steps as the first 4 columns of our original matrix, and the 5th column represents the time after the 4th column, which is the same time as the 5th column we chose not to predict on. The first three columns are NA, because in order to compute those values, we need values not present in the current matrix. To see how accurate we were, lets compare our results to the true values.

You may have also noticed that in order to extract the matrix, we use TheModel$predict(...)$mean()$mat. This is because the prediction is its own object. If we print it to screen, we see:

prediction <- TheModel$predict(newdata=SpaceTimeMatrix$subset(cols=-SpaceTimeMatrix$ncol,mutate=FALSE))
prediction
## <IncidenceForecast>
##   Inherits from: <SimulatedForecast>
##   Public:
##     binDist: function (cutoffs, include.lowest = FALSE, right = TRUE) 
##     clone: function (deep = FALSE) 
##     data: active binding
##     debug: function (string) 
##     forecastMadeTime: active binding
##     forecastTimes: active binding
##     initialize: function (data = SimulatedIncidenceMatrix$new(), forecastTimes = c()) 
##     mean: function (trim = 0, na.rm = FALSE) 
##     median: function (na.rm = FALSE) 
##     model: active binding
##     nsim: active binding
##     quantile: function (probs, na.rm = FALSE, names = TRUE, type = 7) 
##     sample: active binding
##     undebug: function (string) 
##   Private:
##     .data: SimulatedIncidenceMatrix, AbstractSimulatedIncidenceMatrix, ArrayData, MatrixData, DataContainer, Generic, R6
##     .debug: 
##     .forecastMadeTime: 2020-01-16 09:05:19
##     .forecastTimes: TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
##     .model: Generic, R6
##     .nsim: 0
##     checkType: function (name, val, type = "self") 
##     defaultAbstract: function (...) 
##     defaultActive: function (name, type, value)

We see that this object is another R6 class, IncidenceForecast containing information about when the prediction was created, which columns are predicted, and other information. We can use some of the features of our IncidenceForecast to calculate. For example, here is the absolute error of the mean prediction.

truth <- SpaceTimeMatrix
abs_error <- abs(prediction$mean()$mat-truth$mat)
abs_error
##      1900 1901 1902      1903       1904     1905      1906        NA
## 1900   NA   NA   NA 308.54125  85.002500  41.5050 224.72625  69.02250
## 1900   NA   NA   NA 194.14750 103.977500 131.9600 165.78125  26.13875
## 1900   NA   NA   NA  15.67571  36.045714 206.2571  59.07000 145.13429
## 1900   NA   NA   NA 333.86000 262.137143 244.3886 113.98429 224.97000
## 1900   NA   NA   NA  58.69143   3.175714 183.1500 165.39429 115.23286
## 1900   NA   NA   NA 139.28667 179.130000 243.1317  31.55500  14.82000
## 1900   NA   NA   NA 153.67833  45.845000   5.5150 123.16000  23.55833
## 1900   NA   NA   NA  70.58333  76.176667 119.6133  58.17167  23.72833