For editing, reuse, maintaining and sharing of validation rules it is convenient to define them in text files. In this vignette we demonstrate how rules can be imported from or exported to text files.
The easiest way to define rules is by storing them in a free-form text file. For example, create a file called ex-1.txt
# content of ex-1.txt
# Some checks on the 'women' dataset.
height > 0
weight > 0
mean(height/weight) < 0.5
We can read the rules using
v <- validator(.file="ex-1.txt")
Note that the #
is used to start comment-lines, just like in regular R syntax. Validation rules may span several lines. The only restriction is that rules are stated in valid syntax and can be recognized by validate as validating (basically, it should result in a logical
).
To set options and metadata for the rules, the well-known yaml (yaml ainโt markup language) format is used. Yaml is a human-readable way to define (nested) structures. Here is an example of a yaml-based rule definition file.
# content of ex-2.yaml
rules:
-
expr: height > 0
name: height
label: height positivity
description: |
According to the latest research, the average height of American women
must be positive.
-
expr: weight > 0
name: weight
label: weight positivity
description: |
By definition, weight must be positive.
This file can be read in the same way.
v <- validator(.file='rules.yaml')
There are a few things to note here:
rules:
directive.yaml
indentation must be spaces and not tabs. In the example, there are two spaces of indentation before a key name (expr
, name
, and so on). For long rules or pieces of text, you can add a |
or >
after the colon and add a multiline string. You need to start on a new line and add an extra indentation.!
) must be enquoted since the exclamation mark has special meaning in YAML.The latter remark means that this is wrong:
# READING THIS FILE FAILS
rules:
-
expr: !is.na(height)
and this is ok:
# READING THIS FILE SUCCEEDS
rules:
-
expr: '!is.na(height)'
Rules exported with export_yaml
are enquoted by default.
This is possible. Just separate free-form sections from structured sections with three dashes on a single line.
# content of ex-3.yaml
rules:
-
expr: height > 0
name: height
label: height positivity
description: |
According to the latest research, the average height of American women
must be positive.
-
expr: weight > 0
name: weight
label: weight positivity
description: |
By definition, weight must be positive.
---
# free form starts here
# we expect the following mean ratio
mean(height/weight) < 0.5
# we expect a high correlation
cor(height,weight) > 0.99
This can be done at the beginning of your file. Start and end the options section with three dashes on a line to start the (free form or structured) rule section.
---
options:
raise: errors
---
height > 0
The options you set here will be part of the validator
object, that is created once you read in the file. The options are valid for every confrontation you use this validator for, unless they are overwritten during the call to confront
.
This is useful, for example when you have a general rule set, that applies to all your data files, and some rules that apply only in specific cases. The file with specific cases can include one or more rule files in that case. Files should be included in the same section as the options.
---
include:
- petes_rules.yaml
- nancys_rules.yaml
options:
raise: errors
---
# start rule definitions here
There are two ways to do that. You can either write to a yaml
file immediately as follows
v <- validator(height>0, weight> 0)
export_yaml(v,file="my_rules.yaml")
or you can get the yaml
text string using as_yaml
cat(as_yaml(v))