ENMeval is an R package that performs automated runs and evaluations of ecological niche models; it currently implements Maxent only. ENMeval was made for those who want to “tune” their models to maximize predictive ability and avoid overfitting, or in other words, optimize model complexity to balance goodness-of-fit and predictive ability. The primary function, ENMevaluate, does all the heavy lifting and returns several items, including a table of evaluation statistics and, for each settings combination (here, colloquially: runs), a model object and a raster layer showing the model prediction across the study extent. There are also options for calculating niche overlap between predictions, running in parallel to speed up computation, and more. For a more detailed description of the package, check out the open-access publication (Muscarella et al. 2014, Methods in Ecology and Evolution).
In this vignette, we briefly demonstrate acquisition and pre-processing of input data for ENMeval. There are a number of other excellent tutorials on these steps, some of which we compiled in the Resources section.
We’ll start by downloading an occurrence dataset for Bradypus variegatus, the Brown-throated sloth. We’ll go ahead and load the ENMeval and spocc packages (we are using spocc to download occurrence records).
if (!require('spocc')) install.packages('spocc', repos="http://cran.us.r-project.org")
if (!require('ENMeval')) install.packages('ENMeval', repos="http://cran.us.r-project.org")
library(spocc)
library(ENMeval)
# Search GBIF for occurrence data.
bv <- occ('Bradypus variegatus', 'gbif', limit=300, has_coords=TRUE)
# Get the longitude/latitude coordinates for each locality. Also convert the tibble that occ() outputs
# to a data frame for compatibility with ENMeval functions.
occs <- as.data.frame(bv$gbif$data$Bradypus_variegatus[,2:3])
# Remove duplicate rows (Note that you may or may not want to do this).
occs <- occs[!duplicated(occs),]
We are going to model the climatic niche suitability for our focal species using climate data from WorldClim. WorldClim has a range of variables available at various resolutions; for simplicity, here we’ll use the example predictors included in the dismo package: eight bioclimatic variables plus a categorical biome layer, at 0.5 degree resolution (roughly 55 km across at the equator). These climatic data are based on averages from 1950-2000. Now’s also a good time to load the raster package (loading ENMeval already attaches the other downstream dependencies, such as dismo).
library(raster)
# First, load some predictor rasters from the dismo folder:
files <- list.files(path=paste(system.file(package='dismo'), '/ex', sep=''), pattern='grd', full.names=TRUE)
# Put the rasters into a RasterStack:
envs <- stack(files)
# Plot first raster in the stack, bio1
plot(envs[[1]], main=names(envs)[1])
# Add points for all the occurrence points onto the raster
points(occs)
# There are some points far to the south-east of all the others. Let's say we know that these
# represent a subpopulation that we don't want to include, and we want to remove them from
# the analysis. We can find them by first sorting the occs table by latitude.
head(occs[order(occs$latitude),])
#> longitude latitude
#> 208 -43.40630 -23.00829
#> 228 -42.99930 -22.49430
#> 230 -43.00060 -22.49420
#> 226 -43.00000 -22.49400
#> 225 -42.73640 -22.40630
#> 295 -40.06526 -19.39466
# We see there are five such points, and we can find them by specifying a logical statement that
# says to find all records with latitude less than -20.
index <- which(occs$latitude < (-20))
# Next, let's subset our dataset to remove them by using the negative assignment on the index vector.
occs <- occs[-index,]
# Let's plot our new points over the old ones to see what a good job we did.
points(occs, col='red')
Next, we will specify the background extent by cropping (or “clipping” in ArcGIS terms) our global predictor variable rasters to a smaller region. Since our models will compare the environment at occurrence (or, presence) localities to the environment at background localities, we need to sample random points from a background extent. To help ensure we don’t include areas that are suitable for our species but are unoccupied due to limitations like dispersal constraints, we will conservatively define the background extent as an area surrounding our occurrence localities. We will do this by buffering a bounding box that includes all occurrence localities. Some other methods of background extent delineation (e.g., minimum convex hulls) are more conservative because they better characterize the geographic space holding the points. In any case, this is one of the many things that you will need to carefully consider for your own study.
library(sp)
# Make a SpatialPoints object
occs.sp <- SpatialPoints(occs)
# Get the bounding box of the points
bb <- bbox(occs.sp)
# Buffer the bounding box by 10 degrees on each side to define the study extent.
bb.buf <- extent(bb[1]-10, bb[3]+10, bb[2]-10, bb[4]+10)
# Crop environmental layers to match the study extent
envs.backg <- crop(envs, bb.buf)
We may also, however, want to remove the Caribbean islands (for example) from our background extent. For this, we can use tools from the maptools package, which is not automatically loaded with ENMeval.
if (!require('maptools')) install.packages('maptools', repos="http://cran.us.r-project.org")
if (!require('rgeos')) install.packages('rgeos', repos="http://cran.us.r-project.org")
library(maptools)
library(rgeos)
# Get a simple world countries polygon
data(wrld_simpl)
# Get polygons for Central and South America
ca.sa <- wrld_simpl[wrld_simpl@data$SUBREGION==5 | wrld_simpl@data$SUBREGION==13,]
# Both spatial objects use the same geographic coordinate system with slightly different
# specifications, so set the coordinate reference system (crs) of envs.backg to that of ca.sa
# to ensure smooth geoprocessing.
crs(envs.backg) <- crs(ca.sa)
# Mask envs by this polygon after buffering a bit to make sure not to lose coastline.
ca.sa <- gBuffer(ca.sa, width=1)
envs.backg <- mask(envs.backg, ca.sa)
# Let's check our work. We should see Central and South America without the Caribbean.
plot(envs.backg[[1]], main=names(envs.backg)[1])
points(occs)
In the next step, we’ll sample 10,000 random points from the background (note that the number of background points is also a consideration you should make with respect to your own study).
library(dismo)
# Randomly sample 10,000 background points from one background extent raster (only one point
# per cell, without replacement). Note: since the raster has fewer than 10,000 pixels, you'll
# get a warning and all pixels will be used as background. We sample from the biome layer
# because it is missing some grid cells, and we want to avoid background points that fall on NA.
bg <- randomPoints(envs.backg[[9]], n=10000)
bg <- as.data.frame(bg)
# Notice how we have pretty good coverage (every cell).
plot(envs.backg[[1]], legend=FALSE)
points(bg, col='red')
A run of ENMevaluate begins by using one of six methods to partition occurrence localities into testing and training bins (folds) for k-fold cross-validation (Fielding and Bell 1997; Peterson et al. 2011). Generally, the data partitioning step is done within the main ‘ENMevaluate’ function call. In this section, we illustrate the different options.
The first three partitioning methods are variations of what Radosavljevic and Anderson (2014) referred to as ‘masked geographically structured’ data partitioning. Basically, these methods partition both occurrence records and background points into evaluation bins based on spatial rules. The intention is to reduce spatial autocorrelation between points in the testing and training bins, which can inflate estimates of model performance, at least for data sets that result from biased sampling (Veloz 2009; Hijmans 2012; Wenger and Olden 2012).
First, the ‘block’ method partitions data according to the latitude and longitude lines that divide the occurrence localities into four bins of (insofar as possible) equal numbers. Both occurrence and background localities are assigned to each of the four bins based on their position with respect to these lines. The resulting object is a list of two vectors that supply the bin designation for each occurrence and background point.
blocks <- get.block(occs, bg)
str(blocks)
#> List of 2
#> $ occ.grp: num [1:264] 3 3 2 1 2 1 1 4 1 3 ...
#> $ bg.grp : num [1:5600] 2 2 2 2 2 2 2 2 2 2 ...
plot(envs.backg[[1]], col='gray', legend=FALSE)
points(occs, pch=21, bg=blocks$occ.grp)
The next two partitioning methods are variants of a ‘checkerboard’ approach to partition occurrence localities. These generate checkerboard grids across the study extent and partition the localities into bins based on where they fall on the checkerboard. In contrast to the block method, both checkerboard methods subdivide geographic space equally but do not ensure a balanced number of occurrence localities in each bin. For these methods, the user needs to provide a raster layer on which to base the underlying checkerboard pattern; here we simply use the predictor variable RasterStack. Additionally, the user needs to define an aggregation.factor, which determines how many grid cells to aggregate when making the underlying checkerboard pattern.
The Checkerboard1 method partitions the points into k=2 bins using a simple checkerboard pattern.
check1 <- get.checkerboard1(occs, envs, bg, aggregation.factor=5)
plot(envs.backg[[1]], col='gray', legend=FALSE)
points(occs, pch=21, bg=check1$occ.grp)
# The partitioning method is more clearly illustrated by looking at the background points:
points(bg, pch=21, bg=check1$bg.grp)
# We can change the aggregation factor to better illustrate how this partitioning method works:
check1.large <- get.checkerboard1(occs, envs, bg, aggregation.factor=30)
plot(envs.backg[[1]], col='gray', legend=FALSE)
points(bg, pch=21, bg=check1.large$bg.grp)
points(occs, pch=21, bg=check1.large$occ.grp, col='white', cex=1.5)
The Checkerboard2 method partitions the data into k=4 bins. This is done by aggregating the input raster at two scales. Presence and background points are assigned to a bin with respect to where they fall in checkerboards of both scales.
check2 <- get.checkerboard2(occs, envs, bg, aggregation.factor=c(5,5))
plot(envs.backg[[1]], col='gray', legend=FALSE)
points(bg, pch=21, bg=check2$bg.grp)
points(occs, pch=21, bg=check2$occ.grp, col='white', cex=1.5)
The next two methods differ from the first three in that (i) they do not partition the background points into different groups, and (ii) they do not account for spatial autocorrelation between testing and training localities. When working with relatively small data sets (e.g., fewer than ca. 25 presence localities), users may choose a special case of k-fold cross-validation where the number of bins (k) equals the number of occurrence localities (n) in the data set (Pearson et al. 2007; Shcheglovitova and Anderson 2013). This is referred to as the n-1 jackknife (leave-one-out cross-validation). Computation can become prohibitively slow when the number of presence localities is moderate to large.
jack <- get.jackknife(occs, bg)
plot(envs.backg[[1]], col='gray', legend=FALSE)
points(occs, pch=21, bg=jack$occ.grp) # note that colors are repeated here
The ‘random k-fold’ method partitions occurrence localities randomly into a user-specified number (k) of bins. This method is equivalent to the ‘cross-validate’ partitioning scheme available in the current version of the Maxent software GUI.
For instance, let’s partition the data into five evaluation bins:
random <- get.randomkfold(occs, bg, k=5)
plot(envs.backg[[1]], col='gray', legend=FALSE)
points(occs, pch=21, bg=random$occ.grp)
For maximum flexibility, the last partitioning method is designed so that users can define a priori partitions. This provides a flexible way to conduct spatially-independent cross-validation with background masking. For example, perhaps we would like to partition points based on a k-means clustering routine.
ngrps <- 10
kmeans <- kmeans(occs, ngrps)
occ.grp <- kmeans$cluster
plot(envs.backg[[1]], col='gray', legend=FALSE)
points(occs, pch=21, bg=occ.grp)
When using the user-defined partitioning method, we need to supply ENMevaluate with group identifiers for both occurrence points AND background points. If we want to use all background points for each group, we can assign every background point to group zero.
bg.grp <- rep(0, nrow(bg))
plot(envs.backg[[1]], col='gray', legend=FALSE)
points(bg, pch=16, bg=bg.grp)
Alternatively, we may think of various ways to partition background data. This depends on the goals of the study but we might, for example, find it reasonable to partition background by clustering around the centroids of the occurrence clusters.
centers <- kmeans$centers
d <- pointDistance(bg, centers, lonlat=TRUE)
bg.grp <- apply(d, 1, which.min)
plot(envs.backg[[1]], col='gray', legend=FALSE)
points(bg, pch=21, bg=bg.grp)
Choosing among these data partitioning methods depends on the research objectives and the characteristics of the study system. Refer to the Resources section for additional considerations on appropriate partitioning for evaluation.
Once you decide which method of data partitioning you would like to use, you are ready to start building models. We now move on to the main function in ENMeval: ENMevaluate.
The two main parameters to define when calling ENMevaluate are (1) the range of regularization multiplier values and (2) the combinations of feature classes to consider. The regularization multiplier (RM) determines the penalty for adding parameters to the model. Higher RM values impose a stronger penalty on model complexity and thus result in simpler (flatter) model predictions. The feature classes determine the potential shape of the response curves. A model that is only allowed to include linear feature classes will most likely be simpler than a model that is allowed to include all possible feature classes. Much more description of these parameters is available in the Resources section. For the purposes of this vignette, we demonstrate simply how to adjust these parameters. The following section deals with comparing the outputs of each model.
Unless you supply the function with background points (which is recommended in many cases), you will need to define how many background points should be used with the ‘n.bg’ argument. If any of your predictor variables are categorical (e.g., biomes), you will need to define which layer(s) these are using the ‘categoricals’ argument.
ENMevaluate builds a separate model for each unique combination of RM values and feature class combinations. For example, the following call will build and evaluate two models: one with RM=1 and another with RM=2, both allowing only linear features.
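A sketch of such a call, using argument names from ENMeval's 0.3-era interface (RMvalues sets the regularization multipliers and fc the feature classes; the partition method and algorithm here are illustrative choices):

# Two models: fc='L' with RM=1 and with RM=2
eval1 <- ENMevaluate(occs, envs, bg.coords=bg, method='checkerboard2',
                     RMvalues=c(1, 2), fc='L', algorithm='maxent.jar')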
We may, however, want to compare a broader range of models that are allowed a wider variety of feature classes:
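The following sketch reproduces the settings behind the eval2 object examined throughout the rest of this vignette (checkerboard2 partitions with the default aggregation factors, RM values 1 and 2, and three feature class combinations):

eval2 <- ENMevaluate(occs, envs, bg.coords=bg, method='checkerboard2',
                     RMvalues=c(1, 2), fc=c('L', 'LQ', 'LQP'),
                     algorithm='maxent.jar')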
When building many models, the command may take a long time to run. Of course this depends on the size of your dataset and the computer you are using. When working on big projects, running the command in parallel can be faster.
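For example (a sketch; the parallel and numCores arguments follow the 0.3-era interface, and the number of cores should match your machine):

# Same analysis, distributed across 4 cores
eval2.par <- ENMevaluate(occs, envs, bg.coords=bg, method='checkerboard2',
                         RMvalues=c(1, 2), fc=c('L', 'LQ', 'LQP'),
                         algorithm='maxent.jar', parallel=TRUE, numCores=4)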
Another way to save time at this stage is to turn off the option that generates model predictions across the full study extent (rasterPreds). Note, however, that the full model predictions are needed for calculating AICc values, so those are returned as NA in the results table when the rasterPreds argument is set to FALSE.
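For example (a sketch; the AICc-related columns will come back as NA):

# Skip generating prediction rasters to save time
eval2.fast <- ENMevaluate(occs, envs, bg.coords=bg, method='checkerboard2',
                          RMvalues=c(1, 2), fc=c('L', 'LQ', 'LQP'),
                          algorithm='maxent.jar', rasterPreds=FALSE)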
We can also calculate pairwise niche overlap (Schoener’s D statistic) between the model predictions while running ENMevaluate by setting the overlap argument to TRUE. Note that you can also calculate this value at a later stage using the separate calc.niche.overlap function.
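A sketch of the post-hoc route (the overlapStat argument name in calc.niche.overlap is assumed from the 0.3-era interface):

# Schoener's D for each pair of model predictions
# (alternatively, pass overlap=TRUE to ENMevaluate)
overlap <- calc.niche.overlap(eval2@predictions, overlapStat="D")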
overlap
#> L_1 LQ_1 LQP_1 L_2 LQ_2 LQP_2
#> L_1 NA NA NA NA NA NA
#> LQ_1 0.7867513 NA NA NA NA NA
#> LQP_1 0.7180430 0.8345944 NA NA NA NA
#> L_2 0.9991292 0.7866152 0.7178221 NA NA NA
#> LQ_2 0.7896247 0.9857127 0.8354486 0.7894947 NA NA
#> LQP_2 0.7185629 0.8367470 0.9609317 0.7183484 0.8387686 NA
The bin.output argument determines whether separate evaluation statistics for each testing bin are included in the results table. If bin.output=FALSE, only the mean and variance of the evaluation statistics across the k bins are returned.
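For example (a sketch reusing the settings from above):

# Return per-bin evaluation statistics in the results table
eval2.bins <- ENMevaluate(occs, envs, bg.coords=bg, method='checkerboard2',
                          RMvalues=c(1, 2), fc=c('L', 'LQ', 'LQP'),
                          algorithm='maxent.jar', bin.output=TRUE)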
Now let’s take a look at the ENMevaluation object in more detail. It contains the following slots:

- @algorithm: the algorithm (and version) used
- @results: a data.frame of evaluation statistics
- @predictions: a RasterStack of model predictions
- @models: a list of model objects
- @partition.method: the partition method used
- @occ.pts: a data.frame of occurrence coordinates
- @occ.grp: a vector of bins for the occurrence points
- @bg.pts: a data.frame of background coordinates
- @bg.grp: a vector of bins for the background points
- @overlap: (if overlap=TRUE) a matrix of the pairwise niche overlap metric

Let’s first examine the structure of the object:
eval2
#> An object of class: ENMevaluation
#> Occurrence/background points: 261/5600
#> Partition method: checkerboard2 2 2
#> Feature classes: L, LQ, LQP
#> Regularization multipliers: 1, 2
#> @algorithm : character of algorithm used
#> @results : data.frame of evaluation results
#> @predictions : RasterStack of model predictions
#> @models : list of model objects
#> @partition.method : character of partition method used
#> @occ.pts : data.frame of occurrence coordinates
#> @occ.grp : vector of bins for occurrence points
#> @bg.pts : data.frame of background coordinates
#> @bg.grp : vector of bins for background points
#>
str(eval2, max.level=3)
#> Formal class 'ENMevaluation' [package "ENMeval"] with 10 slots
#> ..@ algorithm : chr "Maxent 3.3.3k via dismo 1.1.4"
#> ..@ results :'data.frame': 6 obs. of 16 variables:
#> .. ..$ settings : Factor w/ 6 levels "L_1","L_2","LQ_1",..: 1 3 5 2 4 6
#> .. ..$ features : Factor w/ 3 levels "L","LQ","LQP": 1 2 3 1 2 3
#> .. ..$ rm : num [1:6] 1 1 1 2 2 2
#> .. ..$ full.AUC : num [1:6] 0.868 0.891 0.899 0.868 0.892 ...
#> .. ..$ Mean.AUC : num [1:6] 0.792 0.813 0.832 0.792 0.812 ...
#> .. ..$ Var.AUC : num [1:6] 0.0401 0.0276 0.0143 0.0401 0.0273 ...
#> .. ..$ Mean.AUC.DIFF: num [1:6] 0.0943 0.0909 0.0739 0.0944 0.0917 ...
#> .. ..$ Var.AUC.DIFF : num [1:6] 0.0503 0.0335 0.0181 0.0503 0.0332 ...
#> .. ..$ Mean.OR10 : num [1:6] 0.276 0.244 0.232 0.276 0.24 ...
#> .. ..$ Var.OR10 : num [1:6] 0.0284 0.038 0.0448 0.0284 0.0391 ...
#> .. ..$ Mean.ORmin : num [1:6] 0 0.0651 0.013 0 0.0651 ...
#> .. ..$ Var.ORmin : num [1:6] 0 0.016954 0.000678 0 0.016954 ...
#> .. ..$ AICc : num [1:6] 3997 3876 3798 3995 3870 ...
#> .. ..$ delta.AICc : num [1:6] 204.46 83.27 5.24 202.43 77.98 ...
#> .. ..$ w.AIC : num [1:6] 3.73e-45 7.73e-19 6.77e-02 1.03e-44 1.09e-17 ...
#> .. ..$ nparam : num [1:6] 7 14 32 6 13 27
#> ..@ predictions :Formal class 'RasterStack' [package "raster"] with 11 slots
#> ..@ models :List of 6
#> .. ..$ :Formal class 'MaxEnt' [package "dismo"] with 7 slots
#> .. ..$ :Formal class 'MaxEnt' [package "dismo"] with 7 slots
#> .. ..$ :Formal class 'MaxEnt' [package "dismo"] with 7 slots
#> .. ..$ :Formal class 'MaxEnt' [package "dismo"] with 7 slots
#> .. ..$ :Formal class 'MaxEnt' [package "dismo"] with 7 slots
#> .. ..$ :Formal class 'MaxEnt' [package "dismo"] with 7 slots
#> ..@ partition.method: Named chr [1:3] "checkerboard2" "2" "2"
#> .. ..- attr(*, "names")= chr [1:3] "method" "aggregation.factor1" "aggregation.factor2"
#> ..@ occ.pts :'data.frame': 261 obs. of 2 variables:
#> .. ..$ LON: num [1:261] -58.9 -79.7 -63.1 -84.6 -85.1 ...
#> .. ..$ LAT: num [1:261] 4.24 9.08 -17.77 9.79 10.72 ...
#> ..@ occ.grp : num [1:261] 3 4 1 2 2 3 1 1 4 4 ...
#> ..@ bg.pts :'data.frame': 5600 obs. of 2 variables:
#> .. ..$ LON: num [1:5600] -56.8 -59.2 -50.8 -64.8 -79.2 ...
#> .. ..$ LAT: num [1:5600] -22.75 5.25 -26.75 -13.25 -6.25 ...
#> ..@ bg.grp : num [1:5600] 2 4 1 3 4 2 2 3 2 1 ...
#> ..@ overlap : num[0 , 0 ]
The first slot tells which algorithm was used (maxent.jar or maxnet) and which version of the software.
The next slot holds the table of evaluation metrics. We can use this to, for example, select the ‘best’ model based on one or more of our evaluation criteria. Let’s find the model settings that resulted in the lowest AICc.
eval2@results
#> settings features rm full.AUC Mean.AUC Var.AUC Mean.AUC.DIFF
#> 1 L_1 L 1 0.8681 0.7918599 0.04006806 0.09433686
#> 2 LQ_1 LQ 1 0.8912 0.8127751 0.02755950 0.09089115
#> 3 LQP_1 LQP 1 0.8995 0.8316293 0.01432006 0.07393175
#> 4 L_2 L 2 0.8681 0.7918220 0.04006170 0.09435098
#> 5 LQ_2 LQ 2 0.8918 0.8116690 0.02728863 0.09168146
#> 6 LQP_2 LQP 2 0.8976 0.8252418 0.01532929 0.07752914
#> Var.AUC.DIFF Mean.OR10 Var.OR10 Mean.ORmin Var.ORmin AICc
#> 1 0.05033043 0.2762004 0.02842037 0.00000000 0.0000000000 3996.929
#> 2 0.03350653 0.2442442 0.03796164 0.06510417 0.0169542101 3875.739
#> 3 0.01808907 0.2315323 0.04478080 0.01302083 0.0006781684 3797.716
#> 4 0.05034134 0.2762004 0.02842037 0.00000000 0.0000000000 3994.900
#> 5 0.03316246 0.2400069 0.03906964 0.06510417 0.0169542101 3870.452
#> 6 0.01974627 0.2538006 0.03572751 0.01302083 0.0006781684 3792.472
#> delta.AICc w.AIC nparam
#> 1 204.457703 3.733541e-45 7
#> 2 83.267416 7.731389e-19 14
#> 3 5.244554 6.771838e-02 32
#> 4 202.428025 1.030054e-44 6
#> 5 77.979958 1.087463e-17 13
#> 6 0.000000 9.322816e-01 27
eval2@results[which(eval2@results$delta.AICc==0),]
#> settings features rm full.AUC Mean.AUC Var.AUC Mean.AUC.DIFF
#> 6 LQP_2 LQP 2 0.8976 0.8252418 0.01532929 0.07752914
#> Var.AUC.DIFF Mean.OR10 Var.OR10 Mean.ORmin Var.ORmin AICc
#> 6 0.01974627 0.2538006 0.03572751 0.01302083 0.0006781684 3792.472
#> delta.AICc w.AIC nparam
#> 6 0 0.9322816 27
Now let’s access the RasterStack of the model predictions. Note that these predictions are in the ‘raw’ output format.
eval2@predictions
#> class : RasterStack
#> dimensions : 192, 186, 35712, 6 (nrow, ncol, ncell, nlayers)
#> resolution : 0.5, 0.5 (x, y)
#> extent : -125, -32, -56, 40 (xmin, xmax, ymin, ymax)
#> coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
#> names : L_1, LQ_1, LQP_1, L_2, LQ_2, LQP_2
#> min values : 4.416935e-08, 2.014694e-18, 2.082106e-19, 4.478873e-08, 2.552360e-18, 3.260426e-18
#> max values : 0.01224537, 0.01745164, 0.01414898, 0.01223516, 0.01840422, 0.01585029
Now plot the model with the lowest AICc:
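One way to do this (a sketch) is to index the prediction RasterStack by the row of the results table with delta.AICc equal to zero:

plot(eval2@predictions[[which(eval2@results$delta.AICc==0)]],
     main='Prediction with lowest AICc')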
If we used the ‘maxent.jar’ algorithm, we can also access a list of Maxent model objects, which (like all lists) can be subset with double brackets (e.g., eval2@models[[1]]). The Maxent model objects provide access to various elements of the model (including the lambda file). The model objects can also be used for predicting models into other time periods or geographic areas. Note that the html file that is created when Maxent is run is not kept.
(Stay tuned for an update on the vignette focusing on the output from the ‘maxnet’ algorithm)
Let’s look at the model object for our “AICc optimal” model:
aic.opt <- eval2@models[[which(eval2@results$delta.AICc==0)]]
aic.opt
#> class : MaxEnt
#> variables: bio1 bio12 bio16 bio17 bio5 bio6 bio7 bio8 biome
#> output html file no longer exists
The “results” slot shows the Maxent model statistics:
aic.opt@results
#> [,1]
#> X.Training.samples 261.0000
#> Regularized.training.gain 1.3466
#> Unregularized.training.gain 1.5109
#> Iterations 500.0000
#> Training.AUC 0.8976
#> X.Background.points 5600.0000
#> bio1.contribution 5.8445
#> bio12.contribution 8.7935
#> bio16.contribution 13.6053
#> bio17.contribution 9.3897
#> bio5.contribution 8.3884
#> bio6.contribution 10.9994
#> bio7.contribution 31.4864
#> bio8.contribution 1.2791
#> biome.contribution 10.2137
#> bio1.permutation.importance 13.1881
#> bio12.permutation.importance 35.7216
#> bio16.permutation.importance 6.0448
#> bio17.permutation.importance 6.2319
#> bio5.permutation.importance 1.3011
#> bio6.permutation.importance 7.6112
#> bio7.permutation.importance 20.7751
#> bio8.permutation.importance 2.3983
#> biome.permutation.importance 6.7278
#> Entropy 7.3034
#> Prevalence..average.of.logistic.output.over.background.sites. 0.1331
#> Fixed.cumulative.value.1.cumulative.threshold 1.0000
#> Fixed.cumulative.value.1.logistic.threshold 0.0295
#> Fixed.cumulative.value.1.area 0.5893
#> Fixed.cumulative.value.1.training.omission 0.0307
#> Fixed.cumulative.value.5.cumulative.threshold 5.0000
#> Fixed.cumulative.value.5.logistic.threshold 0.0994
#> Fixed.cumulative.value.5.area 0.4187
#> Fixed.cumulative.value.5.training.omission 0.0651
#> Fixed.cumulative.value.10.cumulative.threshold 10.0000
#> Fixed.cumulative.value.10.logistic.threshold 0.1530
#> Fixed.cumulative.value.10.area 0.3289
#> Fixed.cumulative.value.10.training.omission 0.0766
#> Minimum.training.presence.cumulative.threshold 0.0570
#> Minimum.training.presence.logistic.threshold 0.0036
#> Minimum.training.presence.area 0.7680
#> Minimum.training.presence.training.omission 0.0000
#> X10.percentile.training.presence.cumulative.threshold 18.9935
#> X10.percentile.training.presence.logistic.threshold 0.2227
#> X10.percentile.training.presence.area 0.2241
#> X10.percentile.training.presence.training.omission 0.0996
#> Equal.training.sensitivity.and.specificity.cumulative.threshold 23.3502
#> Equal.training.sensitivity.and.specificity.logistic.threshold 0.2615
#> Equal.training.sensitivity.and.specificity.area 0.1877
#> Equal.training.sensitivity.and.specificity.training.omission 0.1877
#> Maximum.training.sensitivity.plus.specificity.cumulative.threshold 19.5007
#> Maximum.training.sensitivity.plus.specificity.logistic.threshold 0.2264
#> Maximum.training.sensitivity.plus.specificity.area 0.2195
#> Maximum.training.sensitivity.plus.specificity.training.omission 0.1034
#> Balance.training.omission..predicted.area.and.threshold.value.cumulative.threshold 3.7561
#> Balance.training.omission..predicted.area.and.threshold.value.logistic.threshold 0.0782
#> Balance.training.omission..predicted.area.and.threshold.value.area 0.4527
#> Balance.training.omission..predicted.area.and.threshold.value.training.omission 0.0383
#> Equate.entropy.of.thresholded.and.original.distributions.cumulative.threshold 14.9767
#> Equate.entropy.of.thresholded.and.original.distributions.logistic.threshold 0.1905
#> Equate.entropy.of.thresholded.and.original.distributions.area 0.2652
#> Equate.entropy.of.thresholded.and.original.distributions.training.omission 0.0958
You can use the var.importance function to get a data.frame of two variable importance metrics: percent contribution and permutation importance. See the function help file for more information on these metrics.
var.importance(aic.opt)
#> variable percent.contribution permutation.importance
#> 1 bio1 5.8445 13.1881
#> 2 bio12 8.7935 35.7216
#> 3 bio16 13.6053 6.0448
#> 4 bio17 9.3897 6.2319
#> 5 bio5 8.3884 1.3011
#> 6 bio6 10.9994 7.6112
#> 7 bio7 31.4864 20.7751
#> 8 bio8 1.2791 2.3983
#> 9 biome 10.2137 6.7278
The “lambdas” slot shows which variables were included in the model. After the variable name, the next number is the variable coefficient, followed by the min and max of that variable in the input data. If the coefficient is 0, that variable was not included in the model. You will likely find the syntax cryptic, as the information is not stored in a very user-friendly way. Fortunately, John Baumgartner has developed a useful function to parse this file into a more user-friendly data.frame. See the parse_lambdas function in his rmaxent package for more details.
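A sketch of how that might look, assuming rmaxent is installed (it is not on CRAN and is typically installed from John Baumgartner's GitHub repository) and that parse_lambdas accepts a fitted MaxEnt model object:

# devtools::install_github('johnbaums/rmaxent')  # assumed install route
rmaxent::parse_lambdas(aic.opt)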
aic.opt@lambdas
#> [1] "bio1, 15.66661717129279, -1.0, 289.0"
#> [2] "bio12, 0.0, 0.0, 7682.0"
#> [3] "bio16, -6.411128107833744, 0.0, 2458.0"
#> [4] "bio17, 0.0, 0.0, 1496.0"
#> [5] "bio5, 0.0, 97.0, 364.0"
#> [6] "bio6, 0.4315893299759933, -128.0, 240.0"
#> [7] "bio7, -24.46825761752212, 62.0, 330.0"
#> [8] "bio8, 0.0, -25.0, 290.0"
#> [9] "biome, -3.6830923841996883, 1.0, 14.0"
#> [10] "bio16^2, 1.4299111936955575, 0.0, 6041764.0"
#> [11] "bio17^2, 0.6126479011546977, 0.0, 2238016.0"
#> [12] "bio5^2, -2.5057534913913284, 9409.0, 132496.0"
#> [13] "bio6^2, -7.623902054808719, 0.0, 57600.0"
#> [14] "bio7^2, -0.7467190393583492, 3844.0, 108900.0"
#> [15] "bio8^2, -1.9341955556154555, 16.0, 84100.0"
#> [16] "biome^2, 2.2735844264675706, 1.0, 196.0"
#> [17] "bio1*bio12, 7.368911770796855, -117.0, 2028048.0"
#> [18] "bio1*bio16, -1.1888262769365994, -51.0, 641538.0"
#> [19] "bio1*bio7, -1.3886543019097317, -198.0, 65205.0"
#> [20] "bio1*biome, 6.2325764893648525, -10.0, 3794.0"
#> [21] "bio12*bio5, 1.6056636848833303, 0.0, 2396784.0"
#> [22] "bio12*bio7, 18.669329322871263, 0.0, 737464.0"
#> [23] "bio16*bio17, -8.760783487306433, 0.0, 3629296.0"
#> [24] "bio16*bio8, -4.778708055853241, -1150.0, 638038.0"
#> [25] "bio16*biome, -6.0443865992213635, 0.0, 25900.0"
#> [26] "bio17*bio7, -7.320134172296795, 0.0, 145705.0"
#> [27] "bio17*biome, -22.075731094885455, 0.0, 16436.0"
#> [28] "bio5*bio7, -1.0131674372072306, 14580.0, 113505.0"
#> [29] "bio5*bio8, -0.8296516600822442, -2675.0, 104110.0"
#> [30] "bio6*biome, 0.926757275688733, -1280.0, 3234.0"
#> [31] "bio7*biome, -0.9895691775865285, 62.0, 3250.0"
#> [32] "linearPredictorNormalizer, 6.493603525339264"
#> [33] "densityNormalizer, 63.09031541146724"
#> [34] "numBackgroundPoints, 5600"
#> [35] "entropy, 7.303428234120329"
Finally, the ENMevaluation object also remembers which occurrence partitioning method you used:
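Accessing the slot directly:

eval2@partition.method
# a named character vector: the method ("checkerboard2") plus its two aggregation factors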
Plotting options in R are extremely flexible, and here we demonstrate some key tools to explore the results of an ENMevaluation object graphically.
ENMeval has a built-in plotting function (eval.plot) to visualize the results of different models. It requires the results table of the ENMevaluation object. By default, it plots delta.AICc values.
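A sketch using the 0.3-era interface (the results table goes in as the first argument):

eval.plot(eval2@results)  # plots delta.AICc by default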
You can choose which evaluation metric to plot, and you can also include error bars, if relevant.
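For example, plotting mean test AUC with its variance as error bars (the value and variance argument names are assumed from the 0.3-era interface):

eval.plot(eval2@results, value='Mean.AUC', variance='Var.AUC')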
You can also plot the permutation importance or percent contribution.
df <- var.importance(aic.opt)
barplot(df$permutation.importance, names.arg=df$variable, las=2, ylab="Permutation Importance")
If you generated raster predictions of the models (i.e., rasterPreds=TRUE), you can easily plot them. For example, let’s look at the first model included in our analysis - remember that the output values are in Maxent’s ‘raw’ units.
plot(eval2@predictions[[1]], legend=F)
# Now add the occurrence and background points, colored by evaluation bins:
points(eval2@bg.pts, pch=3, col=eval2@bg.grp, cex=0.5)
points(eval2@occ.pts, pch=21, bg=eval2@occ.grp)
Let’s see how model complexity changes the predictions in our example. We’ll compare the prediction of the model with only linear feature classes and the highest regularization multiplier value we used (i.e., fc=‘L’, RM=2) versus that of the model with the most feature classes allowed and the lowest regularization multiplier value we used (i.e., fc=‘LQP’, RM=1).
# bisect the plotting area to make two columns
par(mfrow=c(1,2), mar=c(2,2,1,0))
plot(eval2@predictions[['L_2']], ylim=c(-30,20), xlim=c(-90,-40), legend=F, main='L_2 prediction')
plot(eval2@predictions[['LQP_1']], ylim=c(-30,20), xlim=c(-90,-40), legend=F, main='LQP_1 prediction')
We can also plot the response curves of our model to see how different input variables influence our model predictions. (Note that, as with the dismo::maxent function, using this function requires that the maxent.jar file be installed in the dismo package’s java folder.)
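For example, using dismo's response function on our AICc-optimal model object:

# Plot the marginal response curve for each predictor variable
response(aic.opt)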
Below is a running list of other things we plan to add to this vignette. Feel free to let us know if there are particular things you would like to see added.
- Multivariate environmental similarity surfaces (the mess() function in the dismo package)