ENMeval is an R package that performs automated runs and evaluations of ecological niche models; it currently implements Maxent only. ENMeval was made for those who want to “tune” their models to maximize predictive ability and avoid overfitting, or in other words, optimize model complexity to balance goodness-of-fit and predictive ability. The primary function, ENMevaluate, does all the heavy lifting and returns several items, including a table of evaluation statistics and, for each settings combination (here, colloquially: runs), a model object and a raster layer showing the model prediction across the study extent. There are also options for calculating niche overlap between predictions, running in parallel to speed up computation, and more. For a more detailed description of the package, check out the open-access publication (Muscarella et al. 2014, Methods in Ecology and Evolution).
In this vignette, we briefly demonstrate acquisition and pre-processing of input data for ENMeval. There are a number of other excellent tutorials on these steps, some of which we compiled in the Resources section.
We’ll start by downloading an occurrence dataset for Bradypus variegatus, the Brown-throated sloth. We’ll go ahead and load the ENMeval and spocc packages (we are using spocc to download occurrence records).
if (!require('spocc')) install.packages('spocc', repos="http://cran.us.r-project.org")
if (!require('ENMeval')) install.packages('ENMeval', repos="http://cran.us.r-project.org")
library(spocc)
library(ENMeval)
# Search GBIF for occurrence data.
bv <- occ('Bradypus variegatus', 'gbif', limit=300, has_coords=TRUE)
# Get the longitude/latitude coordinates for each locality. Also convert the tibble that occ() outputs
# to a data frame for compatibility with ENMeval functions.
occs <- as.data.frame(bv$gbif$data$Bradypus_variegatus[,2:3])
# Remove duplicate rows (Note that you may or may not want to do this).
occs <- occs[!duplicated(occs),]
We are going to model the climatic niche suitability for our focal species using climate data from WorldClim. WorldClim has a range of variables available at various resolutions; for simplicity, here we’ll use the example predictors included in the dismo package: eight bioclimatic variables plus a categorical biome layer, at 0.5 degree resolution (roughly 55 km across at the equator). These climatic data are based on averages from 1950-2000. Now’s also a good time to load the raster package (loading ENMeval already attaches the other downstream dependencies, such as dismo).
library(raster)
# First, load some predictor rasters from the dismo folder:
files <- list.files(path=paste(system.file(package='dismo'), '/ex', sep=''), pattern='grd', full.names=TRUE)
# Put the rasters into a RasterStack:
envs <- stack(files)
# Plot first raster in the stack, bio1
plot(envs[[1]], main=names(envs)[1])
# Add points for all the occurrence points onto the raster
points(occs)
# There are some points far to the south-east of all the others. Let's say we know that these
# represent a subpopulation that we don't want to include, and we want to remove them from
# the analysis. We can find them by first sorting the occs table by latitude.
head(occs[order(occs$latitude),])
#> longitude latitude
#> 208 -43.40630 -23.00829
#> 228 -42.99930 -22.49430
#> 230 -43.00060 -22.49420
#> 226 -43.00000 -22.49400
#> 225 -42.73640 -22.40630
#> 295 -40.06526 -19.39466
# We see there are five such points, and we can find them by specifying a logical statement that
# says to find all records with latitude less than -20.
index <- which(occs$latitude < (-20))
# Next, let's subset our dataset to remove them by using the negative assignment on the index vector.
occs <- occs[-index,]
# Let's plot our new points over the old ones to see what a good job we did.
points(occs, col='red')
Next, we will specify the background extent by cropping (or “clipping” in ArcGIS terms) our global predictor variable rasters to a smaller region. Since our models will compare the environment at occurrence (or, presence) localities to the environment at background localities, we need to sample random points from a background extent. To help ensure we don’t include areas that are suitable for our species but are unoccupied due to limitations like dispersal constraints, we will conservatively define the background extent as an area surrounding our occurrence localities. We will do this by buffering a bounding box that includes all occurrence localities. Some other methods of background extent delineation (e.g., minimum convex hulls) are more conservative because they better characterize the geographic space holding the points. In any case, this is one of the many things that you will need to carefully consider for your own study.
library(sp)
# Make a SpatialPoints object
occs.sp <- SpatialPoints(occs)
# Get the bounding box of the points
bb <- bbox(occs.sp)
# Buffer the bounding box by 10 degrees on each side to define the study extent.
bb.buf <- extent(bb[1]-10, bb[3]+10, bb[2]-10, bb[4]+10)
# Crop environmental layers to match the study extent
envs.backg <- crop(envs, bb.buf)
We may also, however, want to remove the Caribbean islands (for example) from our background extent. For this, we can use tools from the maptools package, which is not automatically loaded with ENMeval.
if (!require('maptools')) install.packages('maptools', repos="http://cran.us.r-project.org")
if (!require('rgeos')) install.packages('rgeos', repos="http://cran.us.r-project.org")
library(maptools)
library(rgeos)
# Get a simple world countries polygon
data(wrld_simpl)
# Get polygons for Central and South America
ca.sa <- wrld_simpl[wrld_simpl@data$SUBREGION==5 | wrld_simpl@data$SUBREGION==13,]
# Both spatial objects use the same geographic coordinate system with slightly different
# specifications, so set the coordinate reference system (crs) of envs.backg to that of ca.sa
# to ensure smooth geoprocessing.
crs(envs.backg) <- crs(ca.sa)
# Mask envs by this polygon after buffering a bit to make sure not to lose coastline.
ca.sa <- gBuffer(ca.sa, width=1)
envs.backg <- mask(envs.backg, ca.sa)
# Let's check our work. We should see Central and South America without the Caribbean.
plot(envs.backg[[1]], main=names(envs.backg)[1])
points(occs)
In the next step, we’ll sample 10,000 random points from the background (note that the number of background points is also a consideration you should make with respect to your own study).
library(dismo)
# Randomly sample 10,000 background points from one background extent raster (only one point
# per cell, without replacement). Note: since the raster has fewer than 10,000 pixels, you'll
# get a warning and all pixels will be used as background. We sample from the biome layer
# because it is missing some grid cells, and we want to avoid background points that fall on NA.
bg <- randomPoints(envs.backg[[9]], n=10000)
bg <- as.data.frame(bg)
# Notice how we have pretty good coverage (every cell).
plot(envs.backg[[1]], legend=FALSE)
points(bg, col='red')
A run of ENMevaluate begins by using one of six methods to partition occurrence localities into testing and training bins (folds) for k-fold cross-validation (Fielding and Bell 1997; Peterson et al. 2011). Generally, the data partitioning step is done within the main ‘ENMevaluate’ function call. In this section, we illustrate the different options.
The first three partitioning methods are variations of what Radosavljevic and Anderson (2014) referred to as ‘masked geographically structured’ data partitioning. Basically, these methods partition both occurrence records and background points into evaluation bins based on spatial rules. The intention is to reduce spatial autocorrelation between points in the testing and training bins, which can inflate estimates of model performance, at least for data sets that result from biased sampling (Veloz 2009; Hijmans 2012; Wenger and Olden 2012).
First, the ‘block’ method partitions data according to the latitude and longitude lines that divide the occurrence localities into four bins of (insofar as possible) equal numbers. Both occurrence and background localities are assigned to each of the four bins based on their position with respect to these lines. The resulting object is a list of two vectors that supply the bin designation for each occurrence and background point.
blocks <- get.block(occs, bg)
str(blocks)
#> List of 2
#> $ occ.grp: num [1:264] 3 3 2 1 2 1 1 4 1 3 ...
#> $ bg.grp : num [1:5600] 2 2 2 2 2 2 2 2 2 2 ...
plot(envs.backg[[1]], col='gray', legend=FALSE)
points(occs, pch=21, bg=blocks$occ.grp)
The next two partitioning methods are variants of a ‘checkerboard’ approach to partition occurrence localities. These generate checkerboard grids across the study extent and partition the localities into bins based on where they fall on the checkerboard. In contrast to the block method, both checkerboard methods subdivide geographic space equally but do not ensure a balanced number of occurrence localities in each bin. For these methods, the user needs to provide a raster layer on which to base the underlying checkerboard pattern; here we simply use the predictor variable RasterStack. Additionally, the user needs to define an aggregation.factor, which determines how many grid cells to aggregate when making the underlying checkerboard pattern.
The Checkerboard1 method partitions the points into k=2 bins using a simple checkerboard pattern.
check1 <- get.checkerboard1(occs, envs, bg, aggregation.factor=5)
plot(envs.backg[[1]], col='gray', legend=FALSE)
points(occs, pch=21, bg=check1$occ.grp)
# The partitioning method is more clearly illustrated by looking at the background points:
points(bg, pch=21, bg=check1$bg.grp)
# We can change the aggregation factor to better illustrate how this partitioning method works:
check1.large <- get.checkerboard1(occs, envs, bg, aggregation.factor=30)
plot(envs.backg[[1]], col='gray', legend=FALSE)
points(bg, pch=21, bg=check1.large$bg.grp)
points(occs, pch=21, bg=check1.large$occ.grp, col='white', cex=1.5)
The Checkerboard2 method partitions the data into k=4 bins. This is done by aggregating the input raster at two scales. Presence and background points are assigned to a bin with respect to where they fall in checkerboards of both scales.
check2 <- get.checkerboard2(occs, envs, bg, aggregation.factor=c(5,5))
plot(envs.backg[[1]], col='gray', legend=FALSE)
points(bg, pch=21, bg=check2$bg.grp)
points(occs, pch=21, bg=check2$occ.grp, col='white', cex=1.5)
The next two methods differ from the first three in that (i) they do not partition the background points into different groups, and (ii) they do not account for spatial autocorrelation between testing and training localities. When working with relatively small data sets (e.g., fewer than ca. 25 presence localities), users may choose a special case of k-fold cross-validation where the number of bins (k) equals the number of occurrence localities (n) in the data set (Pearson et al. 2007; Shcheglovitova and Anderson 2013). This is referred to as the n-1 jackknife (leave-one-out cross-validation). Computation can become prohibitively slow when the number of presence localities is moderate to large.
jack <- get.jackknife(occs, bg)
plot(envs.backg[[1]], col='gray', legend=FALSE)
points(occs, pch=21, bg=jack$occ.grp) # note that colors are repeated here
The ‘random k-fold’ method partitions occurrence localities randomly into a user-specified number (k) of bins. This method is equivalent to the ‘cross-validate’ partitioning scheme available in the current version of the Maxent software GUI.
For instance, let’s partition the data into five evaluation bins:
random <- get.randomkfold(occs, bg, k=5)
plot(envs.backg[[1]], col='gray', legend=FALSE)
points(occs, pch=21, bg=random$occ.grp)
For maximum flexibility, the last partitioning method is designed so that users can define a priori partitions. This provides a flexible way to conduct spatially-independent cross-validation with background masking. For example, perhaps we would like to partition points based on a k-means clustering routine.
ngrps <- 10
kmeans <- kmeans(occs, ngrps)
occ.grp <- kmeans$cluster
plot(envs.backg[[1]], col='gray', legend=FALSE)
points(occs, pch=21, bg=occ.grp)
When using the user-defined partitioning method, we need to supply ENMevaluate with group identifiers for both occurrence points AND background points. If we want to use all background points for each group, we can assign every background point to group zero.
bg.grp <- rep(0, nrow(bg))
plot(envs.backg[[1]], col='gray', legend=FALSE)
points(bg, pch=16, bg=bg.grp)
Alternatively, we may think of various ways to partition background data. This depends on the goals of the study but we might, for example, find it reasonable to partition background by clustering around the centroids of the occurrence clusters.
centers <- kmeans$centers
d <- pointDistance(bg, centers, lonlat=TRUE)
bg.grp <- apply(d, 1, which.min)
plot(envs.backg[[1]], col='gray', legend=FALSE)
points(bg, pch=21, bg=bg.grp)
Choosing among these data partitioning methods depends on the research objectives and the characteristics of the study system. Refer to the Resources section for additional considerations on appropriate partitioning for evaluation.
Once you decide which method of data partitioning you would like to use, you are ready to start building models. We now move on to the main function in ENMeval: ENMevaluate.
The two main parameters to define when calling ENMevaluate are (1) the range of regularization multiplier values and (2) the combinations of feature classes to consider. The regularization multiplier (RM) determines the penalty for adding parameters to the model. Higher RM values impose a stronger penalty on model complexity and thus result in simpler (flatter) model predictions. The feature classes determine the potential shape of the response curves. A model that is only allowed to include linear feature classes will most likely be simpler than a model that is allowed to include all possible feature classes. Much more description of these parameters is available in the Resources section. For the purposes of this vignette, we demonstrate simply how to adjust these parameters. The following section deals with comparing the outputs of each model.
Unless you supply the function with background points (which is recommended in many cases), you will need to define how many background points should be used with the ‘n.bg’ argument. If any of your predictor variables are categorical (e.g., biomes), you will need to define which layer(s) these are using the ‘categoricals’ argument.
ENMevaluate builds a separate model for each unique combination of RM values and feature class combinations. For example, the following call will build and evaluate two models: one with RM=1 and another with RM=2, both allowing only linear features.
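A sketch of such a call, using argument names from ENMeval's 0.3-era interface (RMvalues sets the regularization multipliers and fc the feature classes; the partition method and algorithm here are illustrative choices):

# Two models: fc='L' with RM=1 and with RM=2
eval1 <- ENMevaluate(occs, envs, bg.coords=bg, method='checkerboard2',
                     RMvalues=c(1, 2), fc='L', algorithm='maxent.jar')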
We may, however, want to compare a broader range of models that are allowed a wider variety of feature classes:
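The following sketch reproduces the settings behind the eval2 object examined throughout the rest of this vignette (checkerboard2 partitions with the default aggregation factors, RM values 1 and 2, and three feature class combinations):

eval2 <- ENMevaluate(occs, envs, bg.coords=bg, method='checkerboard2',
                     RMvalues=c(1, 2), fc=c('L', 'LQ', 'LQP'),
                     algorithm='maxent.jar')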
When building many models, the command may take a long time to run. Of course this depends on the size of your dataset and the computer you are using. When working on big projects, running the command in parallel can be faster.
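For example (a sketch; the parallel and numCores arguments follow the 0.3-era interface, and the number of cores should match your machine):

# Same analysis, distributed across 4 cores
eval2.par <- ENMevaluate(occs, envs, bg.coords=bg, method='checkerboard2',
                         RMvalues=c(1, 2), fc=c('L', 'LQ', 'LQP'),
                         algorithm='maxent.jar', parallel=TRUE, numCores=4)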
Another way to save time at this stage is to turn off the option that generates model predictions across the full study extent (rasterPreds). Note, however, that the full model predictions are needed for calculating AICc values, so those are returned as NA in the results table when the rasterPreds argument is set to FALSE.
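For example (a sketch; the AICc-related columns will come back as NA):

# Skip generating prediction rasters to save time
eval2.fast <- ENMevaluate(occs, envs, bg.coords=bg, method='checkerboard2',
                          RMvalues=c(1, 2), fc=c('L', 'LQ', 'LQP'),
                          algorithm='maxent.jar', rasterPreds=FALSE)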
We can also calculate pairwise niche overlap (Schoener’s D statistic) between the model predictions while running ENMevaluate by setting the overlap argument to TRUE. Note that you can also calculate this value at a later stage using the separate calc.niche.overlap function.
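A sketch of the post-hoc route (the overlapStat argument name in calc.niche.overlap is assumed from the 0.3-era interface):

# Schoener's D for each pair of model predictions
# (alternatively, pass overlap=TRUE to ENMevaluate)
overlap <- calc.niche.overlap(eval2@predictions, overlapStat="D")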
overlap
#> L_1 LQ_1 LQP_1 L_2 LQ_2 LQP_2
#> L_1 NA NA NA NA NA NA
#> LQ_1 0.7867513 NA NA NA NA NA
#> LQP_1 0.7180430 0.8345944 NA NA NA NA
#> L_2 0.9991292 0.7866152 0.7178221 NA NA NA
#> LQ_2 0.7896247 0.9857127 0.8354486 0.7894947 NA NA
#> LQP_2 0.7185629 0.8367470 0.9609317 0.7183484 0.8387686 NA
The bin.output argument determines whether separate evaluation statistics for each testing bin are included in the results table. If bin.output=FALSE, only the mean and variance of the evaluation statistics across the k bins are returned.
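For example (a sketch reusing the settings from above):

# Return per-bin evaluation statistics in the results table
eval2.bins <- ENMevaluate(occs, envs, bg.coords=bg, method='checkerboard2',
                          RMvalues=c(1, 2), fc=c('L', 'LQ', 'LQP'),
                          algorithm='maxent.jar', bin.output=TRUE)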
Now let’s take a look at the ENMevaluation object in more detail. It contains the following slots:

- @algorithm: the algorithm (and version) used
- @results: a data.frame of evaluation statistics
- @predictions: a RasterStack of model predictions
- @models: a list of model objects
- @partition.method: the partition method used
- @occ.pts: a data.frame of occurrence coordinates
- @occ.grp: a vector of bins for the occurrence points
- @bg.pts: a data.frame of background coordinates
- @bg.grp: a vector of bins for the background points
- @overlap: (if overlap=TRUE) a matrix of the pairwise niche overlap metric

Let’s first examine the structure of the object:
eval2
#> An object of class: ENMevaluation
#> Occurrence/background points: 261/5600
#> Partition method: checkerboard2 2 2
#> Feature classes: L, LQ, LQP
#> Regularization multipliers: 1, 2
#> @algorithm : character of algorithm used
#> @results : data.frame of evaluation results
#> @predictions : RasterStack of model predictions
#> @models : list of model objects
#> @partition.method : character of partition method used
#> @occ.pts : data.frame of occurrence coordinates
#> @occ.grp : vector of bins for occurrence points
#> @bg.pts : data.frame of background coordinates
#> @bg.grp : vector of bins for background points
#>
str(eval2, max.level=3)
#> Formal class 'ENMevaluation' [package "ENMeval"] with 10 slots
#> ..@ algorithm : chr "Maxent 3.3.3k via dismo 1.1.4"
#> ..@ results :'data.frame': 6 obs. of 16 variables:
#> .. ..$ settings : Factor w/ 6 levels "L_1","L_2","LQ_1",..: 1 3 5 2 4 6
#> .. ..$ features : Factor w/ 3 levels "L","LQ","LQP": 1 2 3 1 2 3
#> .. ..$ rm : num [1:6] 1 1 1 2 2 2
#> .. ..$ full.AUC : num [1:6] 0.868 0.891 0.899 0.868 0.892 ...
#> .. ..$ Mean.AUC : num [1:6] 0.792 0.813 0.832 0.792 0.812 ...
#> .. ..$ Var.AUC : num [1:6] 0.0401 0.0276 0.0143 0.0401 0.0273 ...
#> .. ..$ Mean.AUC.DIFF: num [1:6] 0.0943 0.0909 0.0739 0.0944 0.0917 ...
#> .. ..$ Var.AUC.DIFF : num [1:6] 0.0503 0.0335 0.0181 0.0503 0.0332 ...
#> .. ..$ Mean.OR10 : num [1:6] 0.276 0.244 0.232 0.276 0.24 ...
#> .. ..$ Var.OR10 : num [1:6] 0.0284 0.038 0.0448 0.0284 0.0391 ...
#> .. ..$ Mean.ORmin : num [1:6] 0 0.0651 0.013 0 0.0651 ...
#> .. ..$ Var.ORmin : num [1:6] 0 0.016954 0.000678 0 0.016954 ...
#> .. ..$ AICc : num [1:6] 3997 3876 3798 3995 3870 ...
#> .. ..$ delta.AICc : num [1:6] 204.46 83.27 5.24 202.43 77.98 ...
#> .. ..$ w.AIC : num [1:6] 3.73e-45 7.73e-19 6.77e-02 1.03e-44 1.09e-17 ...
#> .. ..$ nparam : num [1:6] 7 14 32 6 13 27
#> ..@ predictions :Formal class 'RasterStack' [package "raster"] with 11 slots
#> ..@ models :List of 6
#> .. ..$ :Formal class 'MaxEnt' [package "dismo"] with 7 slots
#> .. ..$ :Formal class 'MaxEnt' [package "dismo"] with 7 slots
#> .. ..$ :Formal class 'MaxEnt' [package "dismo"] with 7 slots
#> .. ..$ :Formal class 'MaxEnt' [package "dismo"] with 7 slots
#> .. ..$ :Formal class 'MaxEnt' [package "dismo"] with 7 slots
#> .. ..$ :Formal class 'MaxEnt' [package "dismo"] with 7 slots
#> ..@ partition.method: Named chr [1:3] "checkerboard2" "2" "2"
#> .. ..- attr(*, "names")= chr [1:3] "method" "aggregation.factor1" "aggregation.factor2"
#> ..@ occ.pts :'data.frame': 261 obs. of 2 variables:
#> .. ..$ LON: num [1:261] -58.9 -79.7 -63.1 -84.6 -85.1 ...
#> .. ..$ LAT: num [1:261] 4.24 9.08 -17.77 9.79 10.72 ...
#> ..@ occ.grp : num [1:261] 3 4 1 2 2 3 1 1 4 4 ...
#> ..@ bg.pts :'data.frame': 5600 obs. of 2 variables:
#> .. ..$ LON: num [1:5600] -56.8 -59.2 -50.8 -64.8 -79.2 ...
#> .. ..$ LAT: num [1:5600] -22.75 5.25 -26.75 -13.25 -6.25 ...
#> ..@ bg.grp : num [1:5600] 2 4 1 3 4 2 2 3 2 1 ...
#> ..@ overlap : num[0 , 0 ]
The first slot tells which algorithm was used (maxent.jar or maxnet) and which version of the software.
The next slot holds the table of evaluation metrics. We can use this to, for example, select the ‘best’ model based on one or more of our evaluation criteria. Let’s find the model settings that resulted in the lowest AICc.
eval2@results
#> settings features rm full.AUC Mean.AUC Var.AUC Mean.AUC.DIFF
#> 1 L_1 L 1 0.8681 0.7918599 0.04006806 0.09433686
#> 2 LQ_1 LQ 1 0.8912 0.8127751 0.02755950 0.09089115
#> 3 LQP_1 LQP 1 0.8995 0.8316293 0.01432006 0.07393175
#> 4 L_2 L 2 0.8681 0.7918220 0.04006170 0.09435098
#> 5 LQ_2 LQ 2 0.8918 0.8116690 0.02728863 0.09168146
#> 6 LQP_2 LQP 2 0.8976 0.8252418 0.01532929 0.07752914
#> Var.AUC.DIFF Mean.OR10 Var.OR10 Mean.ORmin Var.ORmin AICc
#> 1 0.05033043 0.2762004 0.02842037 0.00000000 0.0000000000 3996.929
#> 2 0.03350653 0.2442442 0.03796164 0.06510417 0.0169542101 3875.739
#> 3 0.01808907 0.2315323 0.04478080 0.01302083 0.0006781684 3797.716
#> 4 0.05034134 0.2762004 0.02842037 0.00000000 0.0000000000 3994.900
#> 5 0.03316246 0.2400069 0.03906964 0.06510417 0.0169542101 3870.452
#> 6 0.01974627 0.2538006 0.03572751 0.01302083 0.0006781684 3792.472
#> delta.AICc w.AIC nparam
#> 1 204.457703 3.733541e-45 7
#> 2 83.267416 7.731389e-19 14
#> 3 5.244554 6.771838e-02 32
#> 4 202.428025 1.030054e-44 6
#> 5 77.979958 1.087463e-17 13
#> 6 0.000000 9.322816e-01 27
eval2@results[which(eval2@results$delta.AICc==0),]
#> settings features rm full.AUC Mean.AUC Var.AUC Mean.AUC.DIFF
#> 6 LQP_2 LQP 2 0.8976 0.8252418 0.01532929 0.07752914
#> Var.AUC.DIFF Mean.OR10 Var.OR10 Mean.ORmin Var.ORmin AICc
#> 6 0.01974627 0.2538006 0.03572751 0.01302083 0.0006781684 3792.472
#> delta.AICc w.AIC nparam
#> 6 0 0.9322816 27
Now let’s access the RasterStack of the model predictions. Note that these predictions are in the ‘raw’ output format.
eval2@predictions
#> class : RasterStack
#> dimensions : 192, 186, 35712, 6 (nrow, ncol, ncell, nlayers)
#> resolution : 0.5, 0.5 (x, y)
#> extent : -125, -32, -56, 40 (xmin, xmax, ymin, ymax)
#> coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
#> names : L_1, LQ_1, LQP_1, L_2, LQ_2, LQP_2
#> min values : 4.416935e-08, 2.014694e-18, 2.082106e-19, 4.478873e-08, 2.552360e-18, 3.260426e-18
#> max values : 0.01224537, 0.01745164, 0.01414898, 0.01223516, 0.01840422, 0.01585029
Now plot the model with the lowest AICc:
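One way to do this (a sketch) is to index the prediction RasterStack by the row of the results table with delta.AICc equal to zero:

plot(eval2@predictions[[which(eval2@results$delta.AICc==0)]],
     main='Prediction with lowest AICc')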
If we used the ‘maxent.jar’ algorithm, we can also access a list of Maxent model objects, which (like all lists) can be subset with double brackets (e.g., eval2@models[[1]]). The Maxent model objects provide access to various elements of the model (including the lambda file). The model objects can also be used for predicting models into other time periods or geographic areas. Note that the html file that is created when Maxent is run is not kept.
(Stay tuned for an update on the vignette focusing on the output from the ‘maxnet’ algorithm)
Let’s look at the model object for our “AICc optimal” model:
aic.opt <- eval2@models[[which(eval2@results$delta.AICc==0)]]
aic.opt
#> class : MaxEnt
#> variables: bio1 bio12 bio16 bio17 bio5 bio6 bio7 bio8 biome
#> output html file no longer exists
The “results” slot shows the Maxent model statistics:
aic.opt@results
#> [,1]
#> X.Training.samples 261.0000
#> Regularized.training.gain 1.3466
#> Unregularized.training.gain 1.5109
#> Iterations 500.0000
#> Training.AUC 0.8976
#> X.Background.points 5600.0000
#> bio1.contribution 5.8445
#> bio12.contribution 8.7935
#> bio16.contribution 13.6053
#> bio17.contribution 9.3897
#> bio5.contribution 8.3884
#> bio6.contribution 10.9994
#> bio7.contribution 31.4864
#> bio8.contribution 1.2791
#> biome.contribution 10.2137
#> bio1.permutation.importance 13.1881
#> bio12.permutation.importance 35.7216
#> bio16.permutation.importance 6.0448
#> bio17.permutation.importance 6.2319
#> bio5.permutation.importance 1.3011
#> bio6.permutation.importance 7.6112
#> bio7.permutation.importance 20.7751
#> bio8.permutation.importance 2.3983
#> biome.permutation.importance 6.7278
#> Entropy 7.3034
#> Prevalence..average.of.logistic.output.over.background.sites. 0.1331
#> Fixed.cumulative.value.1.cumulative.threshold 1.0000
#> Fixed.cumulative.value.1.logistic.threshold 0.0295
#> Fixed.cumulative.value.1.area 0.5893
#> Fixed.cumulative.value.1.training.omission 0.0307
#> Fixed.cumulative.value.5.cumulative.threshold 5.0000
#> Fixed.cumulative.value.5.logistic.threshold 0.0994
#> Fixed.cumulative.value.5.area 0.4187
#> Fixed.cumulative.value.5.training.omission 0.0651
#> Fixed.cumulative.value.10.cumulative.threshold 10.0000
#> Fixed.cumulative.value.10.logistic.threshold 0.1530
#> Fixed.cumulative.value.10.area 0.3289
#> Fixed.cumulative.value.10.training.omission 0.0766
#> Minimum.training.presence.cumulative.threshold 0.0570
#> Minimum.training.presence.logistic.threshold 0.0036
#> Minimum.training.presence.area 0.7680
#> Minimum.training.presence.training.omission 0.0000
#> X10.percentile.training.presence.cumulative.threshold 18.9935
#> X10.percentile.training.presence.logistic.threshold 0.2227
#> X10.percentile.training.presence.area 0.2241
#> X10.percentile.training.presence.training.omission 0.0996
#> Equal.training.sensitivity.and.specificity.cumulative.threshold 23.3502
#> Equal.training.sensitivity.and.specificity.logistic.threshold 0.2615
#> Equal.training.sensitivity.and.specificity.area 0.1877
#> Equal.training.sensitivity.and.specificity.training.omission 0.1877
#> Maximum.training.sensitivity.plus.specificity.cumulative.threshold 19.5007
#> Maximum.training.sensitivity.plus.specificity.logistic.threshold 0.2264
#> Maximum.training.sensitivity.plus.specificity.area 0.2195
#> Maximum.training.sensitivity.plus.specificity.training.omission 0.1034
#> Balance.training.omission..predicted.area.and.threshold.value.cumulative.threshold 3.7561
#> Balance.training.omission..predicted.area.and.threshold.value.logistic.threshold 0.0782
#> Balance.training.omission..predicted.area.and.threshold.value.area 0.4527
#> Balance.training.omission..predicted.area.and.threshold.value.training.omission 0.0383
#> Equate.entropy.of.thresholded.and.original.distributions.cumulative.threshold 14.9767
#> Equate.entropy.of.thresholded.and.original.distributions.logistic.threshold 0.1905
#> Equate.entropy.of.thresholded.and.original.distributions.area 0.2652
#> Equate.entropy.of.thresholded.and.original.distributions.training.omission 0.0958
You can use the var.importance function to get a data.frame of two variable importance metrics: percent contribution and permutation importance. See the function help file for more information on these metrics.
var.importance(aic.opt)
#> variable percent.contribution permutation.importance
#> 1 bio1 5.8445 13.1881
#> 2 bio12 8.7935 35.7216
#> 3 bio16 13.6053 6.0448
#> 4 bio17 9.3897 6.2319
#> 5 bio5 8.3884 1.3011
#> 6 bio6 10.9994 7.6112
#> 7 bio7 31.4864 20.7751
#> 8 bio8 1.2791 2.3983
#> 9 biome 10.2137 6.7278
The “lambdas” slot shows which variables were included in the model. After the variable name, the next number is the variable coefficient, followed by the min and max of that variable in the input data. If the coefficient is 0, that variable was not included in the model. You will likely find the syntax cryptic, as the information is not stored in a very user-friendly way. Fortunately, John Baumgartner has developed a useful function to parse this file into a more user-friendly data.frame. See the parse_lambdas function in his rmaxent package for more details.
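A sketch of how that might look, assuming rmaxent is installed (it is not on CRAN and is typically installed from John Baumgartner's GitHub repository) and that parse_lambdas accepts a fitted MaxEnt model object:

# devtools::install_github('johnbaums/rmaxent')  # assumed install route
rmaxent::parse_lambdas(aic.opt)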
aic.opt@lambdas
#> [1] "bio1, 15.66661717129279, -1.0, 289.0"
#> [2] "bio12, 0.0, 0.0, 7682.0"
#> [3] "bio16, -6.411128107833744, 0.0, 2458.0"
#> [4] "bio17, 0.0, 0.0, 1496.0"
#> [5] "bio5, 0.0, 97.0, 364.0"
#> [6] "bio6, 0.4315893299759933, -128.0, 240.0"
#> [7] "bio7, -24.46825761752212, 62.0, 330.0"
#> [8] "bio8, 0.0, -25.0, 290.0"
#> [9] "biome, -3.6830923841996883, 1.0, 14.0"
#> [10] "bio16^2, 1.4299111936955575, 0.0, 6041764.0"
#> [11] "bio17^2, 0.6126479011546977, 0.0, 2238016.0"
#> [12] "bio5^2, -2.5057534913913284, 9409.0, 132496.0"
#> [13] "bio6^2, -7.623902054808719, 0.0, 57600.0"
#> [14] "bio7^2, -0.7467190393583492, 3844.0, 108900.0"
#> [15] "bio8^2, -1.9341955556154555, 16.0, 84100.0"
#> [16] "biome^2, 2.2735844264675706, 1.0, 196.0"
#> [17] "bio1*bio12, 7.368911770796855, -117.0, 2028048.0"
#> [18] "bio1*bio16, -1.1888262769365994, -51.0, 641538.0"
#> [19] "bio1*bio7, -1.3886543019097317, -198.0, 65205.0"
#> [20] "bio1*biome, 6.2325764893648525, -10.0, 3794.0"
#> [21] "bio12*bio5, 1.6056636848833303, 0.0, 2396784.0"
#> [22] "bio12*bio7, 18.669329322871263, 0.0, 737464.0"
#> [23] "bio16*bio17, -8.760783487306433, 0.0, 3629296.0"
#> [24] "bio16*bio8, -4.778708055853241, -1150.0, 638038.0"
#> [25] "bio16*biome, -6.0443865992213635, 0.0, 25900.0"
#> [26] "bio17*bio7, -7.320134172296795, 0.0, 145705.0"
#> [27] "bio17*biome, -22.075731094885455, 0.0, 16436.0"
#> [28] "bio5*bio7, -1.0131674372072306, 14580.0, 113505.0"
#> [29] "bio5*bio8, -0.8296516600822442, -2675.0, 104110.0"
#> [30] "bio6*biome, 0.926757275688733, -1280.0, 3234.0"
#> [31] "bio7*biome, -0.9895691775865285, 62.0, 3250.0"
#> [32] "linearPredictorNormalizer, 6.493603525339264"
#> [33] "densityNormalizer, 63.09031541146724"
#> [34] "numBackgroundPoints, 5600"
#> [35] "entropy, 7.303428234120329"
Finally, the ENMevaluation object also remembers which occurrence partitioning method you used:
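Accessing the slot directly:

eval2@partition.method
# a named character vector: the method ("checkerboard2") plus its two aggregation factors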
Plotting options in R are extremely flexible, and here we demonstrate some key tools to explore the results of an ENMevaluation object graphically.
ENMeval has a built-in plotting function (eval.plot) to visualize the results of different models. It requires the results table of the ENMevaluation object. By default, it plots delta.AICc values.
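A sketch using the 0.3-era interface (the results table goes in as the first argument):

eval.plot(eval2@results)  # plots delta.AICc by default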
You can choose which evaluation metric to plot, and you can also include error bars, if relevant.
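For example, plotting mean test AUC with its variance as error bars (the value and variance argument names are assumed from the 0.3-era interface):

eval.plot(eval2@results, value='Mean.AUC', variance='Var.AUC')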
You can also plot the permutation importance or percent contribution.
df <- var.importance(aic.opt)
barplot(df$permutation.importance, names.arg=df$variable, las=2, ylab="Permutation Importance")
If you generated raster predictions of the models (i.e., rasterPreds=TRUE), you can easily plot them. For example, let’s look at the first model included in our analysis - remember that the output values are in Maxent’s ‘raw’ units.
plot(eval2@predictions[[1]], legend=F)
# Now add the occurrence and background points, colored by evaluation bins:
points(eval2@bg.pts, pch=3, col=eval2@bg.grp, cex=0.5)
points(eval2@occ.pts, pch=21, bg=eval2@occ.grp)
Let’s see how model complexity changes the predictions in our example. We’ll compare the prediction of the model with only linear feature classes and the highest regularization multiplier value we used (i.e., fc=‘L’, RM=2) versus that of the model with the most feature classes allowed and the lowest regularization multiplier value we used (i.e., fc=‘LQP’, RM=1).
# bisect the plotting area to make two columns
par(mfrow=c(1,2), mar=c(2,2,1,0))
plot(eval2@predictions[['L_2']], ylim=c(-30,20), xlim=c(-90,-40), legend=F, main='L_2 prediction')
plot(eval2@predictions[['LQP_1']], ylim=c(-30,20), xlim=c(-90,-40), legend=F, main='LQP_1 prediction')
We can also plot the response curves of our model to see how different input variables influence our model predictions. (Note that, as with the dismo::maxent function, using this function requires that the maxent.jar file be installed in the dismo package’s java folder.)
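For example, using dismo's response function on our AICc-optimal model object:

# Plot the marginal response curve for each predictor variable
response(aic.opt)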
Below is a running list of other things we plan to add to this vignette. Feel free to let us know if there are particular things you would like to see added.
- Multivariate environmental similarity surfaces (the mess() function in the dismo package)