An important aspect of our efforts here is to allow models to automatically determine whether a given data object is appropriate for them. Ideally, we want our data to be as broadly applicable as is reasonable, regardless of how it is stored. We can accomplish this using polymorphism, a way for different objects to share the same behaviours. Consider the following two sources of data:
data("Orange")
listFormData = ObservationList$new(
data=Orange
)
and
data("bomregions")
matrix_data = t(bomregions[1:8,c(10:17)])
matrixFormData = IncidenceMatrix$new(
data=matrix_data,
colData = list(
yr=bomregions[1:8,1]
)
)
If we want, we can think of listFormData as a sparse matrix representation and matrixFormData as the corresponding dense representation. If so, we should be able to use both listFormData and matrixFormData in the same way as matrices.
listFormData$formArray(
row='Tree',
col='age',
val='circumference'
)
listFormData$mat
## 118 484 664 1004 1231 1372 1582
## 3 30 51 75 108 115 139 140
## 1 30 58 87 115 120 142 145
## 5 30 49 81 125 142 174 177
## 2 33 69 111 156 172 203 203
## 4 32 62 112 167 179 209 214
matrixFormData$mat
## 1 2 3 4 5 6 7 8
## eastRain 429.98 500.12 315.33 694.09 564.86 443.11 735.26 585.13
## seRain 603.39 510.89 420.77 628.07 550.98 583.05 677.96 509.02
## southRain 375.39 314.01 283.64 420.83 388.11 325.70 422.12 357.48
## swRain 738.28 558.98 541.85 729.44 711.39 717.54 634.15 709.77
## westRain 399.90 323.07 362.57 377.11 417.96 253.67 336.66 339.78
## northRain 360.29 475.92 344.86 601.27 603.84 312.80 538.66 530.85
## mdbRain 412.67 364.65 255.85 524.88 448.40 427.93 610.43 436.35
## auRain 368.73 401.72 317.18 518.59 504.65 320.35 485.28 451.01
Notice that they have the same matrix format. Moreover, because of the inheritance structure of the classes, we can use any model that takes MatrixData input on either one.
model = MoveAheadModel$new()
model$fit(listFormData)
model$fit(matrixFormData)
model$predict(matrixFormData)$mean()$mat
## 2 3 4 5 6 7 8
## eastRain 429.98 500.12 315.33 694.09 564.86 443.11 735.26 585.13
## seRain 603.39 510.89 420.77 628.07 550.98 583.05 677.96 509.02
## southRain 375.39 314.01 283.64 420.83 388.11 325.70 422.12 357.48
## swRain 738.28 558.98 541.85 729.44 711.39 717.54 634.15 709.77
## westRain 399.90 323.07 362.57 377.11 417.96 253.67 336.66 339.78
## northRain 360.29 475.92 344.86 601.27 603.84 312.80 538.66 530.85
## mdbRain 412.67 364.65 255.85 524.88 448.40 427.93 610.43 436.35
## auRain 368.73 401.72 317.18 518.59 504.65 320.35 485.28 451.01
model$predict(listFormData)$mean()$mat
## 484 664 1004 1231 1372 1582
## 3 30 51 75 108 115 139 140
## 1 30 58 87 115 120 142 145
## 5 30 49 81 125 142 174 177
## 2 33 69 111 156 172 203 203
## 4 32 62 112 167 179 209 214
While it is nice to be able to use any sort of data when we do our modeling, for it to work well, our models need to be flexible. MoveAheadModel takes a MatrixData object as input, which both IncidenceMatrix and ObservationList inherit from. When choosing the input class for your model, try to choose a class as high on the inheritance tree (see vignette('ClassDiagram','ForecastFramework')) as possible. The higher on the tree, the more kinds of data you will be able to pass to your model immediately. In a similar vein, when loading data, try to store it in a class as low on the tree as possible, so that as many models as possible will be able to use it.
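The advice above can be illustrated with a minimal sketch, assuming the R6 package is installed. Every class here (ToyMatrixData, ToyIncidenceMatrix, ToyObservationList, ToyModel) is a hypothetical stand-in invented for illustration; none of them is a ForecastFramework definition. The point is only that a model which checks for the parent class accepts every subclass.

```r
library(R6)

# Hypothetical stand-ins for illustration only; these are NOT the
# ForecastFramework class definitions.
ToyMatrixData <- R6Class("ToyMatrixData",
  public = list(
    mat = NULL,
    initialize = function(mat) {
      self$mat <- as.matrix(mat)
    }
  )
)

# Two subclasses lower on the tree; both inherit $mat from the parent.
ToyIncidenceMatrix <- R6Class("ToyIncidenceMatrix", inherit = ToyMatrixData)
ToyObservationList <- R6Class("ToyObservationList", inherit = ToyMatrixData)

# A model that asks only for the parent class, so it accepts both subclasses.
ToyModel <- R6Class("ToyModel",
  public = list(
    nSeries = NULL,
    fit = function(data) {
      if (!inherits(data, "ToyMatrixData")) {
        stop("data must inherit from ToyMatrixData")
      }
      # Record how many rows (series) the data holds.
      self$nSeries <- nrow(data$mat)
      invisible(self)
    }
  )
)

inc <- ToyIncidenceMatrix$new(matrix(1:6, nrow = 2))
obs <- ToyObservationList$new(matrix(1:12, nrow = 3))
model <- ToyModel$new()
model$fit(inc)
model$fit(obs)
```

Had ToyModel required ToyIncidenceMatrix instead, the call on obs would fail, which is why asking for the highest class that provides what the model needs keeps it broadly usable.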
There are also various methods for converting from one class to another, so even if your data isn’t in the right format, you can convert it.
anotherMatrixFormData = IncidenceMatrix$new(listFormData)
anotherMatrixFormData$mat
## 118 484 664 1004 1231 1372 1582
## 3 30 51 75 108 115 139 140
## 1 30 58 87 115 120 142 145
## 5 30 49 81 125 142 174 177
## 2 33 69 111 156 172 203 203
## 4 32 62 112 167 179 209 214
listFormData$mat
## 118 484 664 1004 1231 1372 1582
## 3 30 51 75 108 115 139 140
## 1 30 58 87 115 120 142 145
## 5 30 49 81 125 142 174 177
## 2 33 69 111 156 172 203 203
## 4 32 62 112 167 179 209 214
Though the two objects are of different classes, they hold the same data.
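To make the relationship between the two storage forms concrete, here is a minimal base R sketch of the same kind of reshaping on the Orange data. It uses only tapply, as.table and as.data.frame, and is an assumption about what such a conversion amounts to, not a description of how ForecastFramework implements it; the variable names long, wide and back are illustrative.

```r
# Long ("list") form: one row per observation.
long <- datasets::Orange

# Pivot to a dense matrix: rows are trees, columns are ages.
# Each (Tree, age) pair occurs once, so sum() just picks up that value.
wide <- tapply(long$circumference, list(long$Tree, long$age), sum)

# Going back: one row per matrix cell.
back <- as.data.frame(as.table(wide), responseName = "circumference")
names(back)[1:2] <- c("Tree", "age")
```

Both directions carry the same numbers; only the layout changes, which is what lets two differently stored objects present the same matrix.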