You will often want to save your results to your hard disk: to restart at the interrupted location after a power outage or random system crash, to keep more complete versions of the analysis results in case you want to inspect them at a later time, to store and restore the R seeds for debugging and replication purposes, and so on. This document demonstrates the various ways in which SimDesign saves output to the hard disk.

As usual, define the functions of interest.

library(SimDesign)
# SimFunctions()

Design <- createDesign(N = c(10,20,30))
Generate <- function(condition, fixed_objects = NULL) {
    dat <- rnorm(condition$N)    
    dat
}

Analyse <- function(condition, dat, fixed_objects = NULL) {
    ret <- c(p = t.test(dat)$p.value)
    ret
}

Summarise <- function(condition, results, fixed_objects = NULL) {
    ret <- EDR(results, alpha = .05)
    ret
}

This is a very simple simulation that takes very little time to complete; however, it will be used to show the basic saving concepts supported in SimDesign. Note that more detailed information is located in the runSimulation documentation.

1 Option: save = TRUE

The save flag controls whether temporary results are saved to the hard disk in case of power outages and crashes. When this flag is used, results can be restored automatically and the simulation can continue where it left off after the hardware problems have been dealt with. In fact, no modifications to the code are required because runSimulation() will automatically detect temporary files to resume from (so long as they are resumed from the same computer node; otherwise, see the save_details list).

As a simple example, say that in the \(N=30\) condition something went terribly wrong and the simulation crashed. However, the first two design conditions are perfectly fine. The save flag is very helpful here because the state is not lost and the results are still useful. Finally, supplying a filename argument will safely save the aggregate simulation results to the hard-drive for future reference; however, this won’t be called until the simulation is complete.

Analyse <- function(condition, dat, fixed_objects = NULL) {
    if(condition$N == 30) stop('Danger Will Robinson!')
    ret <- c(p = t.test(dat)$p.value)
    ret
}

res <- runSimulation(Design, replications = 1000, save=TRUE, filename='my-simple-sim',
                     generate=Generate, analyse=Analyse, summarise=Summarise)
## Design row: 1/3;   Started: Mon Jan 20 18:42:57 2020;   Total elapsed time: 0.00s
## Design row: 2/3;   Started: Mon Jan 20 18:42:57 2020;   Total elapsed time: 0.37s
## Design row: 3/3;   Started: Mon Jan 20 18:42:58 2020;   Total elapsed time: 0.73s

Check whether the temporary file exists.

files <- dir()
files[grepl('SIMDESIGN', files)]
## character(0)

Notice here that the simulation stopped at 67% because the third design condition threw too many consecutive errors (this is a built-in fail-safe in SimDesign). However, after we fix this portion of the code the simulation can be restarted at the previous state and continue on as normal. Therefore, no time is lost.

Analyse <- function(condition, dat, fixed_objects = NULL) {
    ret <- c(p = t.test(dat)$p.value)
    ret
}

res <- runSimulation(Design, replications = 1000, save=TRUE, filename='my-simple-sim',
                     generate=Generate, analyse=Analyse, summarise=Summarise)
## Design row: 1/3;   Started: Mon Jan 20 18:42:58 2020;   Total elapsed time: 0.00s
## Design row: 2/3;   Started: Mon Jan 20 18:42:58 2020;   Total elapsed time: 0.35s
## Design row: 3/3;   Started: Mon Jan 20 18:42:58 2020;   Total elapsed time: 0.73s

Check which files exist.

files <- dir()
files[grepl('SIMDESIGN', files)]
## character(0)
files[grepl('my-simp', files)]
## [1] "my-simple-sim-1.rds" "my-simple-sim.rds"

Notice that when complete, the temporary file is removed from the hard-drive.
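
The resume behaviour described in this section can be illustrated with a small base R sketch. This is not how SimDesign implements it internally; run_with_checkpoint(), bad_task(), and good_task() are hypothetical names invented for this illustration, and the temporary file is an ordinary tempfile(). The idea is simply that progress is written to disk after every completed design row, read back on the next attempt, and deleted once everything has finished.

```r
# Hypothetical helper (not part of SimDesign): run `n_rows` tasks and
# checkpoint the accumulated results to `tmp_file` after each one.
run_with_checkpoint <- function(n_rows, task, tmp_file) {
    # Resume from the checkpoint if one exists; otherwise start fresh
    done <- if (file.exists(tmp_file)) readRDS(tmp_file) else list()
    if (length(done) < n_rows) {
        for (i in seq(length(done) + 1, n_rows)) {
            done[[i]] <- task(i)       # may throw an error
            saveRDS(done, tmp_file)    # persist progress after each row
        }
    }
    # Finished: the temporary file is no longer needed
    if (file.exists(tmp_file)) file.remove(tmp_file)
    done
}

tmp <- tempfile(fileext = ".rds")

# First attempt: the third row fails, mirroring the N = 30 example
bad_task <- function(i) if (i == 3) stop("row 3 failed") else i^2
first <- try(run_with_checkpoint(3, bad_task, tmp), silent = TRUE)
saved_after_crash <- file.exists(tmp)   # rows 1 and 2 are on disk

# Second attempt with the fixed task: only row 3 still needs to run
calls <- 0
good_task <- function(i) { calls <<- calls + 1; i^2 }
second <- run_with_checkpoint(3, good_task, tmp)
```

After the second call, calls records how many rows were actually recomputed, and the checkpoint file has been removed, just as the temporary SimDesign file disappears once the simulation completes.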

2 Option: save_results = TRUE

Continuing on, the save_results argument writes the elements that are passed to Summarise() to separate .rds files, one per design condition, containing all of the analysis results. Note that when using save_results the save flag is automatically set to TRUE to ensure that the simulation state is correctly tracked.

res <- runSimulation(Design, replications = 1000, save_results=TRUE,
                     generate=Generate, analyse=Analyse, summarise=Summarise)
## Design row: 1/3;   Started: Mon Jan 20 18:42:59 2020;   Total elapsed time: 0.00s
## Design row: 2/3;   Started: Mon Jan 20 18:42:59 2020;   Total elapsed time: 0.35s
## Design row: 3/3;   Started: Mon Jan 20 18:42:59 2020;   Total elapsed time: 0.71s
# List the working directory without masking the dir() function name
files <- dir()
directory <- files[grepl('SimDesign-results', files)]
dir(directory)
## [1] "results-row-1.rds" "results-row-2.rds" "results-row-3.rds"

Here we can see that three .rds files have been saved to a folder whose name combines the 'SimDesign-results' prefix with the computer node name. Each .rds file contains the respective simulation results (including errors and warnings), which can be read in directly with readRDS():

row1 <- readRDS(paste0(directory, '/results-row-1.rds'))
str(row1)
## List of 5
##  $ condition  :Classes 'tbl_df', 'tbl' and 'data.frame': 1 obs. of  3 variables:
##   ..$ ID         : int 1
##   ..$ REPLICATION: int 0
##   ..$ N          : num 10
##  $ results    :'data.frame': 1000 obs. of  1 variable:
##   ..$ p: num [1:1000] 0.363 0.816 0.555 0.696 0.931 ...
##  $ errors     : 'table' int[0 (1d)] 
##   ..- attr(*, "dimnames")=List of 1
##   .. ..$ : NULL
##  $ error_seeds: int[0 , 1:626] 
##  $ warnings   : 'table' int[0 (1d)] 
##   ..- attr(*, "dimnames")=List of 1
##   .. ..$ warnings: NULL
row1$condition
## # A tibble: 1 x 3
##      ID REPLICATION     N
##   <int>       <int> <dbl>
## 1     1           0    10
head(row1$results)
##        p
## 1 0.3628
## 2 0.8162
## 3 0.5553
## 4 0.6956
## 5 0.9306
## 6 0.1617

or, equivalently, with the SimResults() function:

row1 <- SimResults(res, 1)
str(row1)
## List of 5
##  $ condition  :Classes 'tbl_df', 'tbl' and 'data.frame': 1 obs. of  3 variables:
##   ..$ ID         : int 1
##   ..$ REPLICATION: int 0
##   ..$ N          : num 10
##  $ results    :'data.frame': 1000 obs. of  1 variable:
##   ..$ p: num [1:1000] 0.363 0.816 0.555 0.696 0.931 ...
##  $ errors     : 'table' int[0 (1d)] 
##   ..- attr(*, "dimnames")=List of 1
##   .. ..$ : NULL
##  $ error_seeds: int[0 , 1:626] 
##  $ warnings   : 'table' int[0 (1d)] 
##   ..- attr(*, "dimnames")=List of 1
##   .. ..$ warnings: NULL

The SimResults() function has the added benefit that it can read in all simulation results at once, or hand-pick which ones should be inspected. For example, here is how all the saved results can be inspected:

input <- SimResults(res)
str(input)
## List of 3
##  $ :List of 5
##   ..$ condition  :Classes 'tbl_df', 'tbl' and 'data.frame':  1 obs. of  3 variables:
##   .. ..$ ID         : int 1
##   .. ..$ REPLICATION: int 0
##   .. ..$ N          : num 10
##   ..$ results    :'data.frame':  1000 obs. of  1 variable:
##   .. ..$ p: num [1:1000] 0.363 0.816 0.555 0.696 0.931 ...
##   ..$ errors     : 'table' int[0 (1d)] 
##   .. ..- attr(*, "dimnames")=List of 1
##   .. .. ..$ : NULL
##   ..$ error_seeds: int[0 , 1:626] 
##   ..$ warnings   : 'table' int[0 (1d)] 
##   .. ..- attr(*, "dimnames")=List of 1
##   .. .. ..$ warnings: NULL
##  $ :List of 5
##   ..$ condition  :Classes 'tbl_df', 'tbl' and 'data.frame':  1 obs. of  3 variables:
##   .. ..$ ID         : int 2
##   .. ..$ REPLICATION: int 0
##   .. ..$ N          : num 20
##   ..$ results    :'data.frame':  1000 obs. of  1 variable:
##   .. ..$ p: num [1:1000] 0.674 0.481 0.12 0.27 0.325 ...
##   ..$ errors     : 'table' int[0 (1d)] 
##   .. ..- attr(*, "dimnames")=List of 1
##   .. .. ..$ : NULL
##   ..$ error_seeds: int[0 , 1:626] 
##   ..$ warnings   : 'table' int[0 (1d)] 
##   .. ..- attr(*, "dimnames")=List of 1
##   .. .. ..$ warnings: NULL
##  $ :List of 5
##   ..$ condition  :Classes 'tbl_df', 'tbl' and 'data.frame':  1 obs. of  3 variables:
##   .. ..$ ID         : int 3
##   .. ..$ REPLICATION: int 0
##   .. ..$ N          : num 30
##   ..$ results    :'data.frame':  1000 obs. of  1 variable:
##   .. ..$ p: num [1:1000] 0.664 0.527 0.848 0.528 0.956 ...
##   ..$ errors     : 'table' int[0 (1d)] 
##   .. ..- attr(*, "dimnames")=List of 1
##   .. .. ..$ : NULL
##   ..$ error_seeds: int[0 , 1:626] 
##   ..$ warnings   : 'table' int[0 (1d)] 
##   .. ..- attr(*, "dimnames")=List of 1
##   .. .. ..$ warnings: NULL
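
Because each saved file is an ordinary .rds file, the same kind of list can also be assembled with base R alone. The sketch below is only an illustration: it writes three mock files to a temporary folder (mock_dir and its contents are invented stand-ins that mimic the condition and results elements shown above, with five replications each) and then stacks the replications from every file into one long data frame.

```r
# Build a mock results folder; this stands in for the real
# 'SimDesign-results' directory and holds made-up p-values.
mock_dir <- file.path(tempdir(), "mock-results")
dir.create(mock_dir, showWarnings = FALSE)
set.seed(1)
for (i in 1:3) {
    row <- list(condition = data.frame(ID = i, N = 10 * i),
                results   = data.frame(p = runif(5)))
    saveRDS(row, file.path(mock_dir, paste0("results-row-", i, ".rds")))
}

# Read every file back in
files <- list.files(mock_dir, pattern = "^results-row-.*\\.rds$",
                    full.names = TRUE)
rows <- lapply(files, readRDS)

# Attach the design values to each replication and stack the rows
stacked <- do.call(rbind, lapply(rows, function(r) {
    data.frame(ID = r$condition$ID, N = r$condition$N, p = r$results$p)
}))

# Remove the mock folder again
unlink(mock_dir, recursive = TRUE)
```

The stacked data frame has one row per replication, which is convenient for plotting or for summaries that cut across design conditions.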

Should the need arise to remove the results directory then the SimClean() function is the easiest way to remove unwanted files and directories.

SimClean(results = TRUE)

3 Option: store_results = TRUE

Similar to save_results = TRUE, users can pass store_results = TRUE to store the complete simulation results in the returned object. This is less recommended for RAM-related reasons: if the number of replications or design conditions is too large, then the R session may run out of memory as the simulation progresses.

After the simulation is complete, these results can be extracted using SimExtract(res, what = 'results'). For example,

res <- runSimulation(Design, replications = 3, store_results=TRUE,
              generate=Generate, analyse=Analyse, summarise=Summarise)
## Design row: 1/3;   Started: Mon Jan 20 18:43:00 2020;   Total elapsed time: 0.00s
## Design row: 2/3;   Started: Mon Jan 20 18:43:00 2020;   Total elapsed time: 0.00s
## Design row: 3/3;   Started: Mon Jan 20 18:43:00 2020;   Total elapsed time: 0.00s
list_results <- SimExtract(res, what = 'results')
list_results
## $`N=10`
##             p
## [1,] 0.427849
## [2,] 0.794487
## [3,] 0.003854
## 
## $`N=20`
##            p
## [1,] 0.08479
## [2,] 0.14410
## [3,] 0.88922
## 
## $`N=30`
##            p
## [1,] 0.03818
## [2,] 0.94444
## [3,] 0.39311
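
To judge whether the memory concern applies to a given simulation, a rough back-of-the-envelope estimate is usually enough. The sketch below assumes every analysis result is a double (8 bytes) and ignores R's per-object overhead; the "larger" design at the end is a made-up scenario, not something taken from this document.

```r
replications <- 1000   # as in the earlier examples
n_conditions <- 3      # rows in Design
n_stats      <- 1      # values returned by Analyse() (just p)

# Doubles take 8 bytes each; per-object overhead is ignored here
approx_bytes <- replications * n_conditions * n_stats * 8

# Compare against the measured size of mock results for all conditions
one_condition <- matrix(runif(replications * n_stats), ncol = n_stats)
measured_bytes <- as.numeric(object.size(one_condition)) * n_conditions

# Hypothetical larger design: 10000 replications, 500 conditions, 20 statistics
large_gib <- 10000 * 500 * 20 * 8 / 1024^3
```

For the small example in this document the stored results are negligible, whereas the hypothetical larger design already needs a sizeable fraction of a gibibyte before any other objects are counted.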

4 My recommendations

My general recommendation when running simulations is to use the save = TRUE flag when your simulation is finally ready for run time (particularly for simulations that take a long time to finish), and to supply a filename = 'some_simulation_name'. Because the aggregated simulation results are often what you are interested in, this approach ensures that the results are stored in a succinct manner for later analyses.

As well, passing save_results = TRUE will save all the results from the Analyse() function that were passed to Summarise(), as well as save a final file to your hard drive (with the built-in safety feature that it will never overwrite previously saved files). Hence, you’ll be able to inspect all the elements manually if the need arises (e.g., to inspect EDR(res1, alpha = .01) instead of the EDR(results, alpha = .05) that was used in the Summarise() function). However, do this only if your hard drive can store all of the analysis results; if you are not careful, you could easily fill up your entire drive with the analysis results alone.
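
As a minimal sketch of re-summarising saved results at a different cutoff, the code below uses a mock object shaped like the saved results shown earlier (a list with condition and results elements). The rejection_rate() helper is a hypothetical base R stand-in written for this illustration; it computes the proportion of p-values at or below alpha, which is the quantity an empirical detection rate reports, and is not the package's own function.

```r
# Mock stand-in for one saved results object; the p-values are made up
set.seed(42)
saved <- list(condition = data.frame(ID = 1L, N = 10),
              results   = data.frame(p = runif(1000)))

# Hypothetical helper: proportion of p-values at or below the cutoff,
# computed separately for each column of `results`
rejection_rate <- function(results, alpha) colMeans(results <= alpha)

rate_05 <- rejection_rate(saved$results, alpha = .05)
rate_01 <- rejection_rate(saved$results, alpha = .01)
```

Because the full set of p-values is on disk, any number of alternative cutoffs can be examined later without rerunning the simulation.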

Finally, if you are worried about reproducibility, particularly during the debugging stage, then seed and save_seeds are the arguments you should use. seed sets the global seed for each design row, while save_seeds writes the .Random.seed state to the hard disk for complete reproducibility within each design condition (note that seeds can be saved whether the simulation is run in parallel or on a single core). If save_seeds was used, then the exact simulation state can be restored to regenerate the data by passing the specific saved seed file to load_seed. That said, problematic error states are automatically stored within the returned simulation object (and can be extracted via SimExtract()); therefore, if the purpose of saving the seeds is to help with debugging, the package already collects the information needed to replicate potentially hard-to-find errors/bugs within each design condition.
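
The underlying mechanism of saving and restoring a .Random.seed state can be demonstrated in base R without SimDesign. The sketch below is only an illustration of that mechanism (seed_file is an arbitrary temporary file, not a file produced by the package): the generator state is written to disk, more random numbers are drawn, and the saved state is then restored to regenerate exactly the same data.

```r
seed_file <- tempfile(fileext = ".rds")

set.seed(1234)
invisible(runif(3))                 # advance the generator a little
saveRDS(.Random.seed, seed_file)    # snapshot the current RNG state
first_draw <- rnorm(5)              # data generated from that state
invisible(rnorm(100))               # further work moves the generator on

# Restore the snapshot into the global environment and regenerate
assign(".Random.seed", readRDS(seed_file), envir = globalenv())
second_draw <- rnorm(5)

identical(first_draw, second_draw)  # TRUE
```

With R's default Mersenne Twister generator the saved state is an integer vector of length 626, which matches the 626 columns of the error_seeds element in the output shown earlier.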