sensobol: Computation of High-Order Sobol’ Sensitivity Indices

The sensobol package relies on data.table and parallel functions in the boot and parallel package to rapidly compute and bootstrap first, total, and if desired, second and third-order sensitivity indices. It currently uses two estimators: the Jansen (1999) (default) and the Saltelli et al. (2010) estimator.

Creation of the sample matrix

After loading the package, the first step is to define the settings of the uncertainty and sensitivity analysis, including the sample size of the sample matrix, the number of model inputs and the sensitivity indices we are interested in. As regards to the latter, sensobol offers the possibility to calculate up to third-order effects. Once these features are defined, we can build the sample matrix, which relies on Sobol’ quasi-random number sequences, and run the desired model. In this vignette we will use the Sobol’ G function as a test function.

library(sensobol)
library(ggplot2) # To plot the Sobol' bootstrap replicas

# Define settings -----------------------------------------

N <- 5000 # Sample size
k <- 8 # Number of parameters
params <- paste("X", 1:8, sep = "") # Vector with the name of the model inputs
R <- 100 # Number of bootstrap replicas

# Create the Sobol' matrices
A <- sobol_matrices(n = N, 
                    k = k,
                    second = TRUE, # We set a matrix for second-order effects
                    third = TRUE) # We set a matrix for third-order effects
#> Warning in sobol_matrices(n = N, k = k, second = TRUE, third = TRUE): The
#> arguments n and k will turn into N and params in the next release of the package
#> Warning in sobol_matrices(n = N, k = k, second = TRUE, third = TRUE): second
#> and third will be substituted by the argument order in the next release of the
#> package

# Compute model output ------------------------------------
Y <- sobol_Fun(A)

Uncertainty plot

Once the model output is computed, it is informative to visualize its uncertainty. sensobol provides a function to assess the model output distribution:


# Plot distribution of model output -----------------------

plot_uncertainty(Y, n = N)
#> Warning in plot_uncertainty(Y, n = N): The argument n will be substituted by N
#> in the next release of the package
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Sensitivity analysis

We can also plot scatterplots of the model output against the model inputs. This is a simple sensitivity analysis that allows to initially observe which model inputs might have an effect on the model output Y, as well as establish its relative importance. In general, scatterplots with “shape” indicate an influential model input.


# Scatterplots of model inputs vs. model output -----------

plot_scatter(x = A, 
             Y = Y, 
             n = N, 
             params = params)
#> Warning in plot_scatter(x = A, Y = Y, n = N, params = params): The argument n
#> will be substituted by N in the next release of the package

We then compute and bootstrap Sobol’ indices. Since we have designed a sample matrix that allows to compute up to third-order effects, we will go for first, second, third and total sensitivity indices here. As stated in the introduction of the vignette, the package currently offers two estimators: the one by Jansen (1999) (default) and the one by Saltelli et al. (2010). The sobol_indices function also allows for parallel computing to speed up the bootstrapping. For more information on how to use the parallel and ncpus parameters, check the boot function of the boot package.


# Compute Sobol' indices ----------------------------------

dt <- sobol_indices(Y = Y, 
                    params = params, 
                    type = "saltelli", 
                    R = R,
                    n = N, 
                    second = TRUE, 
                    third = TRUE)
#> Warning in sobol_indices(Y = Y, params = params, type = "saltelli", R = R, : The
#> argument n will turn into N in the next release of the package
#> Warning in sobol_indices(Y = Y, params = params, type = "saltelli", R = R, :
#> type is deprecated and will be removed in the next version
#> Warning in sobol_indices(Y = Y, params = params, type = "saltelli", R = R, :
#> second is deprecated and will be removed in the next version. The computation of
#> second/third order effects will be done with the argument order
#> Warning in sobol_indices(Y = Y, params = params, type = "saltelli", R = R, :
#> third is deprecated and will be removed in the next version

The output of sobol_indices is a data.table with two columns: firstly, the column parameters, which indicates whether the Sobol’ indices have been calculated for single model inputs, for pairs of model inputs (second-order) or for three model inputs (third-order). Secondly, the column V1, which stores an object of class boot with all the results of the bootstrap. Column V1, for instance, stores the bootstrap replicas. This is useful if we want to assess their distribution in order to select the most appropriate method for the computation of confidence intervals:


# Show output of the sobol_indices function ---------------

print(dt)
#>     parameters     V1
#>  1:         X1 <boot>
#>  2:         X2 <boot>
#>  3:         X3 <boot>
#>  4:         X4 <boot>
#>  5:         X5 <boot>
#>  6:         X6 <boot>
#>  7:         X7 <boot>
#>  8:         X8 <boot>
#>  9:      X1.X2 <boot>
#> 10:      X1.X3 <boot>
#> 11:      X1.X4 <boot>
#> 12:      X1.X5 <boot>
#> 13:      X1.X6 <boot>
#> 14:      X1.X7 <boot>
#> 15:      X1.X8 <boot>
#> 16:      X2.X3 <boot>
#> 17:      X2.X4 <boot>
#> 18:      X2.X5 <boot>
#> 19:      X2.X6 <boot>
#> 20:      X2.X7 <boot>
#> 21:      X2.X8 <boot>
#> 22:      X3.X4 <boot>
#> 23:      X3.X5 <boot>
#> 24:      X3.X6 <boot>
#> 25:      X3.X7 <boot>
#> 26:      X3.X8 <boot>
#> 27:      X4.X5 <boot>
#> 28:      X4.X6 <boot>
#> 29:      X4.X7 <boot>
#> 30:      X4.X8 <boot>
#> 31:      X5.X6 <boot>
#> 32:      X5.X7 <boot>
#> 33:      X5.X8 <boot>
#> 34:      X6.X7 <boot>
#> 35:      X6.X8 <boot>
#> 36:      X7.X8 <boot>
#> 37:   X1.X2.X3 <boot>
#> 38:   X1.X2.X4 <boot>
#> 39:   X1.X2.X5 <boot>
#> 40:   X1.X2.X6 <boot>
#> 41:   X1.X2.X7 <boot>
#> 42:   X1.X2.X8 <boot>
#> 43:   X1.X3.X4 <boot>
#> 44:   X1.X3.X5 <boot>
#> 45:   X1.X3.X6 <boot>
#> 46:   X1.X3.X7 <boot>
#> 47:   X1.X3.X8 <boot>
#> 48:   X1.X4.X5 <boot>
#> 49:   X1.X4.X6 <boot>
#> 50:   X1.X4.X7 <boot>
#> 51:   X1.X4.X8 <boot>
#> 52:   X1.X5.X6 <boot>
#> 53:   X1.X5.X7 <boot>
#> 54:   X1.X5.X8 <boot>
#> 55:   X1.X6.X7 <boot>
#> 56:   X1.X6.X8 <boot>
#> 57:   X1.X7.X8 <boot>
#> 58:   X2.X3.X4 <boot>
#> 59:   X2.X3.X5 <boot>
#> 60:   X2.X3.X6 <boot>
#> 61:   X2.X3.X7 <boot>
#> 62:   X2.X3.X8 <boot>
#> 63:   X2.X4.X5 <boot>
#> 64:   X2.X4.X6 <boot>
#> 65:   X2.X4.X7 <boot>
#> 66:   X2.X4.X8 <boot>
#> 67:   X2.X5.X6 <boot>
#> 68:   X2.X5.X7 <boot>
#> 69:   X2.X5.X8 <boot>
#> 70:   X2.X6.X7 <boot>
#> 71:   X2.X6.X8 <boot>
#> 72:   X2.X7.X8 <boot>
#> 73:   X3.X4.X5 <boot>
#> 74:   X3.X4.X6 <boot>
#> 75:   X3.X4.X7 <boot>
#> 76:   X3.X4.X8 <boot>
#> 77:   X3.X5.X6 <boot>
#> 78:   X3.X5.X7 <boot>
#> 79:   X3.X5.X8 <boot>
#> 80:   X3.X6.X7 <boot>
#> 81:   X3.X6.X8 <boot>
#> 82:   X3.X7.X8 <boot>
#> 83:   X4.X5.X6 <boot>
#> 84:   X4.X5.X7 <boot>
#> 85:   X4.X5.X8 <boot>
#> 86:   X4.X6.X7 <boot>
#> 87:   X4.X6.X8 <boot>
#> 88:   X4.X7.X8 <boot>
#> 89:   X5.X6.X7 <boot>
#> 90:   X5.X6.X8 <boot>
#> 91:   X5.X7.X8 <boot>
#> 92:   X6.X7.X8 <boot>
#>     parameters     V1

We can access the bootstrap replicas with the sobol_replicas function. Here we will only access the replicas of the first and total-order indices:


# Access boot replicas ------------------------------------

# Extract boot samples
b.rep <- sobol_replicas(dt = dt, k = k)
#> Warning: sobol_replicas will be removed in the next version

# Plot
ggplot2::ggplot(b.rep, aes(value)) +
  geom_histogram() +
  labs(x = "Y",
       y = "Count") +
  facet_wrap(parameters~variable, 
             scales = "free") +
  labs(x = "Variance", 
       y = "Count") +
  theme_bw() +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        legend.background = element_rect(fill = "transparent",
                                         color = NA),
        legend.key = element_rect(fill = "transparent",
                                  color = NA)) 
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Distribution of the bootstrap replicas. For better-shaped histograms, increase the sample size R.

Once we have assessed the distribution of the replicas, we can decide which confidence interval suits our data best. sensobol currently offers suport for normal, percentile, bca and basic confidence intervals, all retrieved internally from the boot::boot.ci function. In this case, we will compute normal confidence intervals:


# Compute confidence intervals ----------------------------

dt.ci <- sobol_ci(dt, 
                  params = params, 
                  type = "norm", 
                  conf = 0.95, 
                  second = TRUE, 
                  third = TRUE) 
#> Warning: sobol_ci will be removed in the next version as the computation of
#> confidence intervals will be done directly by sobol_indices

Calculation of the approximation error

Due to measurement error, it is likely that model inputs that are not influential display sensitivity indices that are not zero. In order to assess the extent of this measurement error, we can calculate the Sobol’ indices of a dummy parameter. This allows to differentiate truly influential model inputs from those whose non-zero indices might simply result from the noise of the calculation. The sensobol package implements the approach by Khorashadi Zadeh et al. (2017) to calculate and bootstrap the Sobol’ indices of a dummy parameter:


# Compute Sobol' indices of a dummy parameter -------------

dt.dummy <- sobol_dummy(Y = Y, 
                        params = params, 
                        R = R, 
                        n = N)
#> Warning in sobol_dummy(Y = Y, params = params, R = R, n = N): The argument n
#> will turn into N in the next release of the package
#> Warning in sobol_dummy(Y = Y, params = params, R = R, n = N): The computation
#> of confidence intervals for the dummy parameter will be directly done by
#> sobol_dummy

The function sobol_dummy_ci computes the confidence intervals for the Sobol’ indices of the dummy parameter:


# Compute confidence intervals for the dummy parameter ----

dt.dummy.ci <- sobol_ci_dummy(dt.dummy, 
                              type = "norm", 
                              conf = 0.95)
#> Warning: sobol_ci_dummy will be removed in the next version as the computation
#> of confidence intervals will be done directly by the function sobol_dummy

To visualize Sobol’ first ( $S_i$ ) and total ( $S_{Ti}$ ) -order indices, use the plot_sobol function with type = 1. This setting also allows to plot the confidence intervals of the dummy parameter, if needed. For second ( $S_{ij}$ ) and third ( $S_{ijk}$ ) - order indices, use type = 2 and type = 3 respectively. Narrower confidence intervals will be obtained by designing a sample matrix with a larger sample size N.


# Plot Sobol' first and total-order indices -------------------------

plot_sobol(dt.ci, dummy = dt.dummy.ci, type = 1)

$First and total-order Sobol’ indices. The transparent, blue horizontal frame shows the approximation error for S_{Ti}, computed via a dummy parameter. Only the model inputs whose lower confidence interval for S_{Ti} do not overlap the frame can be considered to truly have an interaction effect. The approximation error for S_i is very small and can not be seen in the plot.$

First and total-order Sobol’ indices. The transparent, blue horizontal frame shows the approximation error for $S_{Ti}$ , computed via a dummy parameter. Only the model inputs whose lower confidence interval for $S_{Ti}$ do not overlap the frame can be considered to truly have an interaction effect. The approximation error for $S_i$ is very small and can not be seen in the plot.


# Plot Sobol' second and third-order indices ------------------------

lapply(2:3, function(x) plot_sobol(dt.ci, type = x))
#> [[1]]

#> 
#> [[2]]

Have a nice uncertainty / sensitivity analysis!

References

Jansen, M. 1999. “Analysis of variance designs for model output.” Computer Physics Communications 117 (1): 35–43. https://doi.org/10.1016/S0010-4655(98)00154-4.

Khorashadi Zadeh, F., J. Nossent, F. Sarrazin, F. Pianosi, A. van Griensven, T. Wagener, and W. Bauwens. 2017. “Comparison of variance-based and moment-independent global sensitivity analysis approaches by application to the SWAT model.” Environmental Modelling and Software 91. Elsevier Ltd: 210–22. https://doi.org/10.1016/j.envsoft.2017.02.001.

Saltelli, A., P. Annoni, I. Azzini, F. Campolongo, M. Ratto, and S. Tarantola. 2010. “Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index.” Computer Physics Communications 181 (2): 259–70. https://doi.org/10.1016/j.cpc.2009.09.018.