Reads the provenance collected by rdtLite or rdt from execution of a script or commands in the console. Creates a human-readable summary of the provenance, including details on the computing environment, loaded libraries, scripts used (if any), input and output files, console output, and error and warning messages. The summary is optionally saved to a text file and all related provenance files are optionally packaged in a zip file.
Install from GitHub:
Once installed, load the package:
library("provSummarizeR")
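The summarize functions can be used in one of three ways.
To summarize the provenance most recently collected in the current R session:
prov.summarize()
To run a script, collect its provenance, and summarize it:
prov.summarize.run("script.R")
To summarize provenance previously saved to a file:
prov.summarize.file("prov_script/prov.json")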
All three functions have two optional parameters, save and create.zip.
If save is TRUE, the summary is saved to a text file in addition to being displayed in the console. The file is named prov-summary.txt and is stored in the current provenance directory. The default value of save is FALSE.
If create.zip is TRUE, the provenance directory is packaged into a timestamped zip file and placed in the current working directory. The zip file contains a copy of all input and output files and scripts used, the prov.json file holding the detailed execution trace, and prov-summary.txt if save is TRUE. The default value of create.zip is FALSE.
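As a minimal sketch of how the two optional parameters combine, the call below runs a script, saves the summary, and packages the provenance directory. The path "script.R" is the placeholder used in the examples above, and the guard around the call is an addition for illustration so that the snippet does nothing if the package or the script is missing.

```r
# Placeholder script path, as in the examples above; substitute your own.
script <- "script.R"

# Only attempt the summary if the package is installed and the script exists.
if (requireNamespace("provSummarizeR", quietly = TRUE) && file.exists(script)) {
  # Prints the summary, writes prov-summary.txt to the provenance directory,
  # and places a timestamped zip file in the current working directory.
  provSummarizeR::prov.summarize.run(script, save = TRUE, create.zip = TRUE)
} else {
  message("provSummarizeR or ", script, " is not available; nothing to summarize")
}
```

The same two arguments can be passed to prov.summarize and prov.summarize.file.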
Creating the zip file requires an external zip program. It has been tested with zip on Unix/Mac OS and with 7z on Windows. It may or may not work with other zip programs. To use a program other than zip, set the R_ZIPCMD environment variable.
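One way to set R_ZIPCMD for the current R session is with Sys.setenv, as sketched below. The value "7z" is an assumption for illustration (it presumes 7z is on the PATH); a full path to the program can be given instead.

```r
# Assumed choice for illustration: use 7z, expecting it to be on the PATH.
Sys.setenv(R_ZIPCMD = "7z")

# Sys.which returns an empty string when the program cannot be found.
zip_path <- Sys.which(Sys.getenv("R_ZIPCMD"))
if (!nzchar(zip_path)) {
  message("The program named in R_ZIPCMD was not found on the PATH")
}
```

A setting made this way lasts only for the current session. To make it persistent, the variable can be defined in an .Renviron file instead.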
Here is an example of what the summary looks like. The first line contains the name of the main R script. The ENVIRONMENT section describes how and when the script was executed and how the provenance was collected. The LIBRARIES section lists all libraries that were loaded, along with their version numbers. The SOURCED SCRIPTS section lists any scripts that were sourced. The PRE-EXISTING section lists any variables in the global environment that were used but not set by the script or console session. The INPUTS section lists any input files or URLs. The OUTPUTS section lists any output files. The CONSOLE section lists any output to the screen. Finally, the ERRORS section lists any error or warning messages that were generated when the script was executed.
PROVENANCE SUMMARY for basicTest.R
ENVIRONMENT:
Executed at 2019-08-26T10.08.19EDT
Total execution time is 6.3 seconds
Script last modified at 2019-01-02T12.43.34EST
Executed with R version 3.6.1 (2019-07-05)
Executed on x86_64 running mingw32
Provenance was collected with rdtLite 1.1.1
Provenance is stored in C:/Prov/prov_basicTest
Hash algorithm is md5
LIBRARIES:
base 3.6.1
datasets 3.6.1
ggplot2 3.2.1
graphics 3.6.1
grDevices 3.6.1
methods 3.6.1
rdtLite 1.1.1
stats 3.6.1
utils 3.6.1
SOURCED SCRIPTS:
None
PRE-EXISTING:
None
INPUTS:
URL : http://harvardforest.fas.harvard.edu/data/p00/hf000/hf000-01-daily-m.csv
2019-08-26 10:08:26
76551e9b09d96eb70bba9ae7a16aab9a
OUTPUTS:
File : shortdata.csv
2019-08-26 10:08:27
58725476ca78c8feb08ad15602d8a006
File : airt-vs-prec.pdf
2019-08-26 10:08:27
ea5167eff2c4e26d0525a8cb50ad8bb9
CONSOLE:
None
ERRORS:
In basicTest.R on line 66 :
Error in file(file, "rt"): cannot open the connection