A new R Markdown validation feature allows for validation testing within specialized validation code chunks that have the validate = TRUE option set. Using pointblank validation functions on data in these marked code chunks will flag an overall failure if the stop threshold is exceeded anywhere. All errors are reported in the validation code chunk after the document is rendered to HTML, where green or red status buttons indicate whether all validations succeeded or some failed. Clicking either button reveals the otherwise hidden validation statements and their error messages (if any). This R Markdown workflow is enabled by default once the pointblank library is loaded. While the framework for such testing is set up by default, the new validate_rmd() function offers an opportunity to set UI and logging options.
Added an R Markdown template for the new R Markdown validation feature (Pointblank Validation).
The new stop_if_not() function works well as a standalone replacement for stopifnot() but is also customized for use in validation checks in R Markdown documents where pointblank is loaded. Using stop_if_not() in a code chunk where the validate = TRUE option is set will yield correct reporting of successes and failures, whereas stopifnot() does not.
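A minimal sketch of stop_if_not() in an ordinary R session, assuming pointblank is installed and using its bundled small_table dataset. Like stopifnot(), it takes one or more logical expressions and signals an error if any of them is not TRUE; per the notes above, the same calls placed in a chunk with validate = TRUE are meant to be collected and reported by the R Markdown validation feature instead. The tryCatch() wrapper exists only to show the failing case without halting the script.

```r
library(pointblank)

# Passing checks return silently, as with stopifnot()
stop_if_not(nrow(small_table) > 0)
stop_if_not(is.data.frame(small_table), "date" %in% colnames(small_table))

# A failing check signals an error; it is caught here only for demonstration
failed <- tryCatch(
  {
    stop_if_not(nrow(small_table) > 1e6)
    FALSE
  },
  error = function(e) TRUE
)
failed
```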
A knit_print() method was added to facilitate the printing of the agent report table within an R Markdown code chunk.
The behavior of validation step functions (e.g., col_vals_lt()) used directly on data tables has been changed. Before, a single test unit failure would trigger a warning. Now, a single failing test unit results in an error. Going back to the earlier behavior now requires the use of actions = warn_on_fail() (a new helper function, which has a default warn_at threshold value of 1) with each invocation of a validation step function. The stop_on_fail() helper function is also new in this release and has a stop_at threshold parameter, also with a default of 1.
Added 24 expectation functions (e.g., expect_col_exists(), expect_rows_distinct(), expect_col_schema_match(), etc.) as complements of the 24 validation functions. All of these can be used for testthat tests of tabular data with a simplified interface that exposes an easy-to-use failure threshold (defaulting to 1).
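The two changes above can be sketched together, assuming a recent pointblank release plus testthat; the scores data frame is made up for illustration, and expect_col_vals_gt() is the expectation analogue of col_vals_gt(). Called directly on a table, col_vals_lt() errors on the first failing test unit unless actions = warn_on_fail() is supplied, and the expect_*() functions slot into ordinary test_that() blocks.

```r
library(pointblank)
library(testthat)

# Illustrative data: one 'score' value is not below 100
scores <- data.frame(id = 1:4, score = c(10, 55, 80, 120))

# Used directly on data, a single failing test unit now raises an error
errored <- tryCatch(
  {
    col_vals_lt(scores, columns = vars(score), value = 100)
    FALSE
  },
  error = function(e) TRUE
)

# warn_on_fail() (warn_at = 1 by default) downgrades that to a warning,
# and the table is passed through unchanged
passed_through <- suppressWarnings(
  col_vals_lt(scores, columns = vars(score), value = 100,
              actions = warn_on_fail())
)

# expect_*() functions inside testthat; 'threshold' defaults to 1
test_that("scores table is well formed", {
  expect_col_exists(scores, columns = vars(id, score))
  expect_rows_distinct(scores)
  expect_col_vals_gt(scores, columns = vars(score), value = 0)
})
```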
Added 24 test functions (e.g., test_col_exists(), test_rows_distinct(), test_col_schema_match(), etc.) to further complement the 24 validation functions. These functions return a logical value: TRUE if the threshold (having a default of 1) is not exceeded, FALSE otherwise. These test_*() functions use the same simplified interface as the expect_*() functions.
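A minimal sketch of the test_*() functions, assuming a recent pointblank release; the data frame is illustrative, and test_col_vals_gt() is the test analogue of the col_vals_gt() validation function. A call returns TRUE while the number of failing test units stays below threshold, which makes these functions convenient inside if() statements.

```r
library(pointblank)

# Illustrative data
tbl <- data.frame(a = c(2, 4, 6), b = c("x", "y", "z"), stringsAsFactors = FALSE)

# TRUE: every value in 'a' is greater than 1
all_gt_1 <- test_col_vals_gt(tbl, columns = vars(a), value = 1)

# FALSE: one value (2) is not greater than 3, reaching the default threshold of 1
all_gt_3 <- test_col_vals_gt(tbl, columns = vars(a), value = 3)

# TRUE: with threshold = 2, a single failing unit is tolerated
mostly_gt_3 <- test_col_vals_gt(tbl, columns = vars(a), value = 3, threshold = 2)

# The logical result can drive ordinary control flow
if (test_col_exists(tbl, columns = vars(b))) message("column 'b' is present")
```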
Added the col_vals_expr(), expect_col_vals_expr(), and test_col_vals_expr() validation, expectation, and test functions, making it easier to write DIY validations. The dplyr expr(), case_when(), and between() functions were re-exported for easier accessibility since they work exceedingly well with the new functions.
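A sketch of a DIY rule with test_col_vals_expr(), assuming a recent pointblank release that still re-exports expr(), case_when(), and between() as described above; the table and the rule are made up for illustration. The expression yields one logical value per row, and every row must evaluate to TRUE for the test to pass.

```r
library(pointblank)

# Illustrative data
tbl <- data.frame(
  a = c(1, 4, 7, 9),
  b = c(0, 0, 1, 1)
)

# Rule: when b is 0, a must lie in [0, 5]; when b is 1, a must exceed 5
rule <- expr(
  case_when(
    b == 0 ~ between(a, 0, 5),
    b == 1 ~ a > 5
  )
)

rule_holds <- test_col_vals_expr(tbl, expr = rule)

# Changing one value breaks the rule for that row (b = 0 but a = 6)
tbl_bad <- tbl
tbl_bad$a[2] <- 6
rule_broken <- !test_col_vals_expr(tbl_bad, expr = rule)
```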
col_schema_match() (and its expectation and test analogues) gained two new arguments: complete and in_order. These allow for some relaxation of the constraints on the completeness and ordering of columns defined in a col_schema object (created by col_schema()).
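A sketch of how complete and in_order relax the schema check, shown through test_col_schema_match() and assuming a recent pointblank release. The table and schemas are illustrative; column types are given as R classes, the default for col_schema().

```r
library(pointblank)

# Illustrative data: 'id' is integer, 'name' is character
tbl <- data.frame(
  id = 1:3,
  name = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Same columns as the table, listed in a different order
schema_reordered <- col_schema(name = "character", id = "integer")

# Only a subset of the table's columns
schema_partial <- col_schema(id = "integer")

# Default in_order = TRUE: the ordering mismatch fails
strict_order <- test_col_schema_match(tbl, schema_reordered)

# in_order = FALSE: ordering is ignored
any_order <- test_col_schema_match(tbl, schema_reordered, in_order = FALSE)

# complete = FALSE: the schema need not list every column
subset_ok <- test_col_schema_match(tbl, schema_partial, complete = FALSE)
```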
The preconditions argument available in all validation, expectation, and test functions now accepts both formula and function values (previously, only formula values were accepted).
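Both forms of preconditions can be compared side by side, assuming a recent pointblank release with dplyr installed; the table is illustrative. In the formula, . stands for the incoming table, while the function receives the table as its argument and returns the transformed table.

```r
library(pointblank)

# Illustrative data: only the 'y' row has a large value
tbl <- data.frame(
  group = c("x", "x", "y"),
  value = c(3, 8, 50),
  stringsAsFactors = FALSE
)

# Without a precondition, the 'y' row fails 'value < 10'
unfiltered <- test_col_vals_lt(tbl, columns = vars(value), value = 10)

# Formula form: keep only group 'x' before validating
with_formula <- test_col_vals_lt(
  tbl, columns = vars(value), value = 10,
  preconditions = ~ . %>% dplyr::filter(group == "x")
)

# Function form: the same filtering with base R subsetting
with_function <- test_col_vals_lt(
  tbl, columns = vars(value), value = 10,
  preconditions = function(x) x[x$group == "x", , drop = FALSE]
)
```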
The get_agent_report() function now has a size argument for getting the agent report table in the "standard" size (width: 875px) or the "small" size (width: 575px); previously this option was only accessible through the ... argument.
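A minimal sketch of requesting both report sizes, assuming a recent pointblank release (which builds the report with gt) and its bundled small_table dataset; the validation steps are arbitrary. Both calls return a gt table object that can be printed or saved.

```r
library(pointblank)

# An agent with a few arbitrary steps on the bundled small_table
agent <- create_agent(tbl = small_table) %>%
  col_exists(columns = vars(date, a, d)) %>%
  col_vals_gt(columns = vars(d), value = 100) %>%
  interrogate()

# Narrower report table (575px wide), e.g. for embedding in an email
report_small <- get_agent_report(agent, size = "small")

# Default "standard" size (875px wide)
report_standard <- get_agent_report(agent, size = "standard")
```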
The appearance of the agent report has improved and it has gained some new features: (1) data extracts for failing rows (on row-based validation steps) can be downloaded as CSVs via the new buttons that appear in the EXT column, (2) there are useful tooltips on most fields of the table (e.g., hovering over items in STEP will show the brief, TBL icons will describe whether any preconditions were applied to the table prior to interrogation, etc.), and (3) there are printing improvements in the COLUMNS and VALUES columns (e.g., table columns are distinguished from literal values).
Improved the appearance of the email message generated by email_blast() and email_preview(). When stock_msg_body() and stock_msg_footer() are used as the defaults for msg_body and msg_footer, the email message embeds a "small" version of the agent report and provides some introductory text with nicer formatting than before.
All functions now have revised documentation that is more complete, has more examples, and is consistent across the many validation, expectation, and test functions.
The package README now contains better graphics, some reworked examples, and a new section on the package’s design goals (with a listing of other R packages that also focus on table validation).
Rewrote the internal stock_stoppage() and stock_warning() functions so that the generated error and warning messages are the same whether validation functions are used directly on data or expectation functions are used.
Console status messages when performing an interrogation now only appear in an interactive session. They will no longer appear during R Markdown rendering or during the execution of unattended scripts.
The col_vals_regex() validation function (plus the associated expectation and test functions) can now be used with database tables (on some of the DB types that support regular expressions). This has been tested on MySQL and PostgreSQL, which have differing underlying SQL implementations.
The col_schema() function now allows for either uppercase or lowercase SQL column types (using .db_col_types = "sql"). Previously, supplying SQL column types as uppercase (e.g., "INT", "TINYINT", etc.) would always fail validation because the SQL column types of the target table are captured as lowercase values during the create_agent() call.
Many new tests were added to cover both the new functions and the existing functions. It’s important for a validation package that testing be comprehensive and rigorous, so this will continue to be a focus in forthcoming releases.
Fixed a duration label bug in the console status messages that appear during interrogation (values are now consistently reported in seconds)
Added column validity checks inside of internal interrogate_*() functions
Fixed implementation of the col_vals_between() and col_vals_not_between() step functions to work with tbl_dbi objects.
Added the scan_data() function, which thoroughly scans any table data so you can understand it better (giving you an HTML report).
Added the get_agent_x_list() function to provide easy access to agent intel
Added the get_agent_report() function to give fine control over the agent’s gt-based reportage; also, the agent’s default print method is now that report (with default appearance options)
Added the get_sundered_data() function to split the table data into ‘pass’ and ‘fail’ pieces after interrogation
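The two accessors above can be sketched on a single-step agent, assuming a recent pointblank release; the table is illustrative, and n_failed is one of the fields in the x-list. One value of a (40) fails the a < 10 check, so sundering should give three passing rows and one failing row.

```r
library(pointblank)

# Illustrative data: only the last row fails 'a < 10'
tbl <- data.frame(id = 1:4, a = c(1, 2, 3, 40))

agent <- create_agent(tbl = tbl) %>%
  col_vals_lt(columns = vars(a), value = 10) %>%
  interrogate()

# Agent intel as a list; n_failed holds failing test units per step
x_list <- get_agent_x_list(agent)
n_failed <- x_list$n_failed

# Split the target table into passing and failing rows
passing_rows <- get_sundered_data(agent, type = "pass")
failing_rows <- get_sundered_data(agent, type = "fail")
```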
Added the col_schema_match() validation step function; it works in conjunction with a col_schema object (generated through the col_schema() function) to help determine whether the expected schema matches the target table.
Added multilingual support to reports generated by agent validations and by those produced through the new scan_data() function
More fully integrates the gt (for tables in reports) and blastula (for email production and delivery) packages
Numerous fixes to ensure compatibility with tibble 3.0.0
The pointblank package has been changed significantly from the previous version in favor of consistency and simplicity, better reporting, and increased power. The internals have been extensively refactored and the API has accordingly gone through revisions.
The focus_on() function has been removed in favor of directly using a data object. This means that a single use of create_agent() can now only work on a single table at a time (create_agent() now has a tbl argument). Also, the input tbl can be a data.frame, a tbl_df, or a tbl_dbi object.
The preconditions argument has changed and it can now be used to temporarily transform the table (i.e., transforming it for a particular validation step). Previously, this option could only filter the input table, but now it’s possible to do useful things like joining in a table, adding columns, filtering rows, etc. The preconditions argument now accepts a list of expressions that manipulate the table data.
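A short sketch of a table-transforming precondition, assuming a recent pointblank release and the one-sided formula syntax that later releases accept for preconditions; the orders table and its total column are hypothetical. The computed total column exists only for this validation step, and the original table is left unmodified.

```r
library(pointblank)

# Hypothetical data: 'total' is not stored in the table
orders <- data.frame(price = c(5, 12, 30), qty = c(2, 1, 3))

# Add 'total' on the fly, then check it (totals are 10, 12, and 90)
total_ok <- test_col_vals_lt(
  orders, columns = vars(total), value = 100,
  preconditions = ~ . %>% dplyr::mutate(total = price * qty)
)
```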
The action_levels() helper function has been introduced to work with the actions argument (in every validation step function). This replaces the warn_count, stop_count, notify_count, warn_fraction, stop_fraction, and notify_fraction arguments. The function allows for evaluation of functions (given in the fns argument) as a reaction to exceeding thresholds specified in warn_at, stop_at, and notify_at.
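A sketch of action_levels() with fractional thresholds, assuming a recent pointblank release; the table is illustrative and the fns reactions are left out. Threshold values below 1 are read as fractions of failing test units, so one failure out of four units (25%) reaches the 20% warn_at level but stays under the 50% stop_at level.

```r
library(pointblank)

# Illustrative data: one of four values fails 'a < 10'
tbl <- data.frame(a = c(1, 2, 3, 40))

# Warn at >= 20% failing units, stop at >= 50%
al <- action_levels(warn_at = 0.2, stop_at = 0.5)

# Record which condition, if any, the validation step signals
outcome <- tryCatch(
  {
    col_vals_lt(tbl, columns = vars(a), value = 10, actions = al)
    "passed"
  },
  warning = function(w) "warned",
  error = function(e) "stopped"
)
outcome
```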
When validation step functions are used directly on data (i.e., without create_agent()), the data is now passed straight through after each validation step. In that mode, the purpose is to issue warnings or throw errors if the warn or stop thresholds are exceeded.
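A minimal sketch of this pass-through behavior, assuming a recent pointblank release; the table is illustrative, and col_vals_gt() and col_vals_in_set() are two of the validation step functions. Because every step returns its input table, validations can sit inside an ordinary pipeline; in later releases, a failing step used this way stops with an error by default.

```r
library(pointblank)

# Illustrative data that passes both checks
tbl <- data.frame(a = c(1, 5, 9), b = c("p", "q", "r"), stringsAsFactors = FALSE)

# Each step validates and returns the same table, so the pipeline continues
result <- tbl %>%
  col_vals_gt(columns = vars(a), value = 0) %>%
  col_vals_in_set(columns = vars(b), set = c("p", "q", "r")) %>%
  nrow()
```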
Across all pointblank validation step functions, the argument that stands for table columns has been normalized to columns.
The incl_na argument, which was implemented in a few validation step functions, has been renamed to na_pass to better indicate its purpose (to consider any encountered NA values as passing test units), and its use has been expanded to other relevant functions.
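A sketch of na_pass, assuming a recent pointblank release; the table with a single NA is illustrative, and test_col_vals_gt() is used so that the outcome comes back as a logical value.

```r
library(pointblank)

# Illustrative data with one missing value
tbl <- data.frame(a = c(2, NA, 7))

# By default, the NA counts as a failing test unit
strict <- test_col_vals_gt(tbl, columns = vars(a), value = 1)

# With na_pass = TRUE, the NA is treated as passing
lenient <- test_col_vals_gt(tbl, columns = vars(a), value = 1, na_pass = TRUE)
```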
It’s now possible to use vars() and certain tidyselect helpers (e.g., starts_with()) when defining columns in the pointblank validation step functions.
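A sketch of both column-selection styles, assuming a recent pointblank release; the table is illustrative. starts_with() is called through dplyr, which re-exports the tidyselect helpers, so the example does not rely on pointblank exporting it.

```r
library(pointblank)

# Illustrative data: the x_ columns are positive, 'y' is not
tbl <- data.frame(x_1 = 1:3, x_2 = 4:6, y = c(-1L, 0L, 1L))

# vars() names the columns explicitly
explicit <- test_col_vals_gt(tbl, columns = vars(x_1, x_2), value = 0)

# starts_with() selects x_1 and x_2 but not y
selected <- test_col_vals_gt(tbl, columns = dplyr::starts_with("x_"), value = 0)

# For contrast, 'y' alone fails the same check
y_only <- test_col_vals_gt(tbl, columns = vars(y), value = 0)
```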
The conjointly() function is a new validation step function that allows for multiple rowwise validation steps to be performed for joint validity testing.
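A sketch of joint validation with test_conjointly(), the test analogue of conjointly(), assuming a recent pointblank release; the table is illustrative. Each one-sided formula is a validation step applied to the table (.), and a row passes only if it passes all of them.

```r
library(pointblank)

# Illustrative data: every row satisfies all three conditions
tbl <- data.frame(a = c(5, 2, 6), b = c(3, 4, 6), c = c(9, 8, 7))

# Wraps the joint check so it can be reused on a modified table
check_joint <- function(data) {
  test_conjointly(
    data,
    ~ col_vals_lt(., columns = vars(a), value = 8),
    ~ col_vals_gt(., columns = vars(c), value = vars(a)),
    ~ col_vals_not_null(., columns = vars(b))
  )
}

joint_ok <- check_joint(tbl)

# Making 'c' smaller than 'a' in one row breaks the joint condition
tbl_bad <- tbl
tbl_bad$c[3] <- 1
joint_bad <- check_joint(tbl_bad)
```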