While the point of zoon is to run full workflows that are then reproducible, during module development it can be useful to run individual modules in the same way you would run normal R functions.
Doing this is not entirely straightforward, so this vignette clarifies how.
First, load the packages. You need to load dismo explicitly, as we are now going to use dismo functions outside of the zoon environment.
library(dismo)
library(zoon)
This is the workflow we will run. It is worth running it here first to make sure there are no problems.
w <- workflow(UKAnophelesPlumbeus, UKAir, OneHundredBackground, LogisticRegression, PrintMap)
## Occurrence data does not have a "crs" column, zoon will assume it is in the same projection as the covariate data
## There are fewer than 100 cells in the environmental raster.
## Using all available cells (81) instead
It’s worth noting that this is a simple workflow. Chaining modules is fairly easy, but how you do it depends on the module type. Workflows using list() are likely to be harder.
Get the modules from the zoon repository and load them into the working environment.
LoadModule('UKAnophelesPlumbeus')
## [1] "UKAnophelesPlumbeus"
LoadModule('UKAir')
## [1] "UKAir"
LoadModule('OneHundredBackground')
## [1] "OneHundredBackground"
LoadModule('LogisticRegression')
## [1] "LogisticRegression"
LoadModule('PrintMap')
## [1] "PrintMap"
Run the data modules. To chain occurrence modules, just rbind() the resulting data frames. To chain covariate modules, use raster::stack to combine the covariate data.
oc <- UKAnophelesPlumbeus()
cov <- UKAir()
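As a minimal sketch of chaining occurrence data with rbind(), the example below uses two small hand-made data frames in place of real module output. The names occ_a and occ_b and the column layout (longitude, latitude, value, type, fold) are assumptions for illustration only; check the columns your own modules return. The covariate equivalent would be a call to raster::stack on the two raster objects, which generally need matching extent and resolution.

```r
# Two hypothetical occurrence data frames standing in for the output of
# two occurrence modules (the column layout is an assumption).
occ_a <- data.frame(
  longitude = c(-1.2, 0.4),
  latitude  = c(51.5, 52.1),
  value     = c(1, 1),
  type      = c("presence", "presence"),
  fold      = c(1, 1)
)
occ_b <- data.frame(
  longitude = c(-2.8),
  latitude  = c(53.0),
  value     = c(1),
  type      = c("presence"),
  fold      = c(1)
)

# rbind() needs identical column names in both data frames.
stopifnot(identical(names(occ_a), names(occ_b)))

# Stack the rows of the two data sets into one occurrence data frame.
oc_combined <- rbind(occ_a, occ_b)
nrow(oc_combined)
```

The combined data frame can then be used wherever a single occurrence data frame is expected.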
We have to run ExtractAndCombData(). This combines the occurrence and raster data.
data <- ExtractAndCombData(oc, cov)
## Occurrence data does not have a "crs" column, zoon will assume it is in the same projection as the covariate data
Next, we run the process and model modules. To chain process modules, simply run each in turn, with the output of one going into the next. The simple way to run a model module is to call the module function directly, as below. If crossvalidation is important then you need to run the modules slightly differently (see below).
proc <- OneHundredBackground(data)
## There are fewer than 100 cells in the environmental raster.
## Using all available cells (81) instead
mod <- LogisticRegression(proc$df)
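The chaining pattern for process modules can be sketched with plain R functions. The two functions below, drop_missing and add_row_id, are hypothetical stand-ins rather than zoon modules; the only assumption is that each step takes a list with a df element and returns an object of the same shape.

```r
# Hypothetical stand-ins for process modules: each takes the data object
# and returns it in the same shape (a list with a `df` element).
drop_missing <- function(data) {
  data$df <- data$df[complete.cases(data$df), , drop = FALSE]
  data
}
add_row_id <- function(data) {
  data$df$id <- seq_len(nrow(data$df))
  data
}

toy <- list(df = data.frame(value = c(1, 0, 1), layer = c(0.2, NA, 0.7)))

# Explicit chaining: the output of one step is the input of the next.
step1 <- drop_missing(toy)
step2 <- add_row_id(step1)

# The same chain written with Reduce(), handy for longer lists of steps.
chained <- Reduce(function(d, f) f(d), list(drop_missing, add_row_id), toy)
identical(step2, chained)
```

Writing the steps out one by one is usually easier to debug during module development; the Reduce() form is just a compact alternative.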
Finally, combine some output into a list and run the output modules.
model <- list(model = mod, data = proc$df)
out <- PrintMap(model, cov)
Crossvalidation requires the modules to be run using the function RunModels(), which fits the model with each fold of the crossvalidation data held out in turn and predicts the held-out data. It also fits a full model and predicts any external validation data.
modCrossvalid <- RunModels(proc$df, 'LogisticRegression', list(), environment())
modelCrossvalid <- list(model = modCrossvalid$model, data = proc$df)
out <- PrintMap(modelCrossvalid, proc$ras)
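To make the fold logic concrete, here is a base R sketch of k-fold crossvalidation on synthetic data. It illustrates the general idea only and is not the code inside RunModels(); the column names value, layer and fold, and the use of glm, are assumptions chosen for the example.

```r
set.seed(1)

# Synthetic presence/background data; `fold` assigns each row to one of
# k folds (column names here are illustrative assumptions).
n <- 100
k <- 5
df <- data.frame(
  value = rbinom(n, 1, 0.5),
  layer = rnorm(n),
  fold  = sample(rep(seq_len(k), length.out = n))
)
df$predictions <- NA_real_

for (i in seq_len(k)) {
  train <- df[df$fold != i, ]
  test  <- df[df$fold == i, ]
  # Fit on the other folds, then predict the held-out fold.
  fit <- glm(value ~ layer, data = train, family = binomial)
  df$predictions[df$fold == i] <- predict(fit, newdata = test, type = "response")
}

# A final model fitted to all of the data, as would be used for mapping.
full_fit <- glm(value ~ layer, data = df, family = binomial)
```

After the loop every row has an out-of-fold prediction, which is what makes the predictions usable for validation.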
As mentioned above, workflows using list() are likely to be harder to run this way, but they aren’t particularly required while developing a package. To run workflows using list, it would be best to use LoadModule as above and then run through the workflow source code interactively.