We assume that the reader has gone through the first few sections of the introductory vignette.
In the validate package, an ‘indicator’ is a rule or function that takes a data set as input and outputs a number. Indicators are usually designed to be easily interpretable by domain experts and therefore depend strongly on the application. In ‘validate’, users are free to specify indicators. Because they are specified separately from the programming workflow, they can be treated as first-class objects: indicator specifications can be maintained, version-controlled, and documented in separate files (just like validation rules).
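The idea of keeping indicator definitions in their own file can be illustrated without the package. The sketch below is plain base R and is *not* validate's own file format. It assumes a hypothetical layout with one `name <- expression` assignment per line. It writes such a file, parses it with `parse()`, and evaluates each right-hand side against the built-in `women` data set. The names `spec_file`, `spec_list` and `spec_values` are illustrative only.

```r
# Base-R sketch (NOT validate's file format): indicator definitions kept in a
# separate text file, one hypothetical `name <- expression` line each.
spec_file <- tempfile(fileext = ".R")
writeLines(c(
  "mh <- mean(height)",
  "mw <- mean(weight)"
), spec_file)

# parse() returns an expression vector; as.list() gives one call per line.
# Each call has the form `<-`(name, expression).
spec_list <- as.list(parse(file = spec_file))

# Left-hand side: indicator name; right-hand side: expression to evaluate
spec_names  <- vapply(spec_list, function(e) as.character(e[[2]]), character(1))
spec_values <- vapply(spec_list, function(e) eval(e[[3]], envir = women), numeric(1))
names(spec_values) <- spec_names
spec_values
```

Because the file is plain text, it can be put under version control and reviewed independently of the analysis code. That independence is the property the paragraph above attributes to indicator specifications.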
Here is a simple example of the workflow.
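```r
library(validate)

# two aggregate indicators and one per-record indicator
i <- indicator(
    mh  = mean(height)
  , mw  = mean(weight)
  , BMI = (weight/2.2046)/(height*0.0254)^2 )  # pounds -> kg, inches -> m
ind <- confront(women, i)
```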
In the first statement we define an `indicator` object storing indicator expressions. Next, we confront a dataset with these indicators. The result is an object of class `indication`. It prints as follows.
```r
ind
## Object of class 'indication'
## Call:
## confront(dat = women, x = i)
##
## Confrontations: 3
## Warnings : 0
## Errors : 0
```
To study the results, the object can be summarized.
```r
summary(ind)
## name items min mean max nNA error warning
## 1 mh 1 65.0000 65.00000 65.00000 0 FALSE FALSE
## 2 mw 1 136.7333 136.73333 136.73333 0 FALSE FALSE
## 3 BMI 15 22.0967 22.72691 24.03503 0 FALSE FALSE
## expression
## 1 mean(height)
## 2 mean(weight)
## 3 (weight/2.2046)/(height * 0.0254)^2
```
Observe that the first two indicators (`mh`, `mw`) each result in a single value, and the third one (`BMI`) results in 15 values, one per record. The columns `error` and `warning` indicate whether calculating the indicators was problematic.
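As a cross-check that does not depend on the package, the same three expressions can be evaluated directly with the columns of `women` in scope. This base-R sketch uses `eval()` with the data frame as environment. `lengths()` then plays the role of the `items` column, and `range()` and `mean()` of the BMI values play the role of the `min`, `mean` and `max` columns. The list name `exprs` is illustrative.

```r
# Base-R cross-check, independent of validate: evaluate each indicator
# expression with the columns of `women` visible as variables.
exprs <- list(
  mh  = quote(mean(height)),
  mw  = quote(mean(weight)),
  BMI = quote((weight / 2.2046) / (height * 0.0254)^2)
)
results <- lapply(exprs, eval, envir = women)

# Number of values each indicator produces (compare: 'items' column)
lengths(results)

# Spread of the per-record BMI values (compare: 'min', 'mean', 'max')
range(results$BMI)
mean(results$BMI)
```

`lengths(results)` should give 1, 1 and 15. This shows that an aggregate such as `mean(height)` collapses the data to one number, while the BMI formula is vectorised over records.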
A specific problem that may occur is when the result of an indicator is non-numeric.
```r
jj <- indicator(mh = mean(height), a = {"A"})
```
Here, the second ‘indicator’ is an expression that always yields a constant, the character string `"A"`, which is not a number.
```r
cf <- confront(women, jj)
cf
## Object of class 'indication'
## Call:
## confront(dat = women, x = jj)
##
## Confrontations: 2
## Warnings : 1
## Errors : 0
warnings(cf)
## $a
## [1] "Expression did not evaluate to numeric or logical, returning NULL"
```
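The behaviour reported in the warning can be mimicked in a few lines of base R. This is a minimal sketch and not validate's actual implementation. The helper name `evaluate_indicator` is hypothetical. It evaluates one expression against a data set and keeps the result only when it is numeric or logical; otherwise it emits a warning with the same message and returns `NULL`.

```r
# Hypothetical helper (NOT validate's implementation): evaluate one
# indicator expression and keep the result only if it is numeric or logical.
evaluate_indicator <- function(expr, data) {
  value <- eval(expr, envir = data)
  if (!is.numeric(value) && !is.logical(value)) {
    warning("Expression did not evaluate to numeric or logical, returning NULL")
    return(NULL)
  }
  value
}

evaluate_indicator(quote(mean(height)), women)  # a single number
evaluate_indicator(quote({"A"}), women)         # warns and returns NULL
```

Signalling a warning rather than an error lets the remaining indicators still be computed, which matches the `Warnings : 1`, `Errors : 0` tally printed for `cf`.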
Values can be obtained with the `values` function, or by converting to a `data.frame`.
We add a unique identifier (this is optional) to make it easier to connect results with the data.
```r
women$id <- letters[1:15]
```
Compute the indicators and convert the result to a `data.frame`.
```r
ind <- confront(women, i, key = "id")
(out <- as.data.frame(ind))
## id name value expression
## 1 <NA> mh 65.00000 mean(height)
## 2 <NA> mw 136.73333 mean(weight)
## 3 a BMI 24.03503 (weight/2.2046)/(height * 0.0254)^2
## 4 b BMI 23.63114 (weight/2.2046)/(height * 0.0254)^2
## 5 c BMI 23.43589 (weight/2.2046)/(height * 0.0254)^2
## 6 d BMI 23.24065 (weight/2.2046)/(height * 0.0254)^2
## 7 e BMI 23.04570 (weight/2.2046)/(height * 0.0254)^2
## 8 f BMI 22.85132 (weight/2.2046)/(height * 0.0254)^2
## 9 g BMI 22.65775 (weight/2.2046)/(height * 0.0254)^2
## 10 h BMI 22.46518 (weight/2.2046)/(height * 0.0254)^2
## 11 i BMI 22.43519 (weight/2.2046)/(height * 0.0254)^2
## 12 j BMI 22.24034 (weight/2.2046)/(height * 0.0254)^2
## 13 k BMI 22.19922 (weight/2.2046)/(height * 0.0254)^2
## 14 l BMI 22.15113 (weight/2.2046)/(height * 0.0254)^2
## 15 m BMI 22.09670 (weight/2.2046)/(height * 0.0254)^2
## 16 n BMI 22.17600 (weight/2.2046)/(height * 0.0254)^2
## 17 o BMI 22.24240 (weight/2.2046)/(height * 0.0254)^2
```
Observe that there is no key for the indicators `mh` and `mw`, since each of them is computed from multiple records.
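The long layout of the result (one row per computed value, keyed only where a value belongs to a single record) can be sketched in base R. This sketch assumes one specific rule: a result whose length equals the number of records is matched to the `id` column, and anything else gets a missing key. That rule is our assumption, not a description of validate's internals. A copy `dat` of `women` is used so the block runs on its own.

```r
# Base-R sketch of a long-format indicator table (one row per value).
# Assumption: results with one value per record are keyed by 'id';
# aggregates (here: length 1) get NA as key.
dat <- women
dat$id <- letters[1:15]

exprs <- list(
  mh  = quote(mean(height)),
  mw  = quote(mean(weight)),
  BMI = quote((weight / 2.2046) / (height * 0.0254)^2)
)

long <- do.call(rbind, lapply(names(exprs), function(nm) {
  value <- eval(exprs[[nm]], envir = dat)
  # Only per-record results can be linked back to a record key
  key <- if (length(value) == nrow(dat)) dat$id else NA_character_
  data.frame(id = key, name = nm, value = value, stringsAsFactors = FALSE)
}))
long
```

The two aggregates contribute one unkeyed row each and BMI contributes 15 keyed rows, giving 17 rows in total, the same shape as `out` above.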
Indicators can be constructed from, and coerced to, data frames. To define indicators this way, create a `data.frame` that has at least a character column called `rule`. All other columns are optional.
```r
idf <- data.frame(
    rule        = c("mean(height)","sd(height)")
  , label       = c("average height", "std.dev height")
  , description = c("basic statistic","fancy statistic")
  , stringsAsFactors = FALSE  # keep 'rule' a character column on R < 4.0
)
i <- indicator(.data=idf)
i
## Object of class 'indicator' with 2 elements:
## I1 [average height]: mean(height)
## I2 [std.dev height]: sd(height)
```
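Each entry of the `rule` column is simply R code stored as text. The base-R sketch below parses each string with `parse(text = ...)` and evaluates it against `women`, which shows what the two rules compute. It is an illustration only; the data frame name `rules` is hypothetical, and this is not how validate itself stores or evaluates rules.

```r
# Base-R sketch: 'rule' holds R expressions as text, so each entry can be
# parsed and evaluated against the data.
rules <- data.frame(
  rule  = c("mean(height)", "sd(height)"),
  label = c("average height", "std.dev height"),
  stringsAsFactors = FALSE
)

# parse(text = ...) turns a string into an expression; [[1]] takes the call
rules$value <- vapply(
  rules$rule,
  function(r) eval(parse(text = r)[[1]], envir = women),
  numeric(1),
  USE.NAMES = FALSE
)
rules
```

Since `height` in `women` runs from 58 to 72 inches, the mean is 65 and the standard deviation is `sqrt(20)`, about 4.472136.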
Now confront the data with these indicators and merge the results with the rule metadata.
```r
quality <- as.data.frame(confront(women, i))
measures <- as.data.frame(i)
merge(quality, measures)
## name value expression label description origin
## 1 I1 65.000000 mean(height) average height basic statistic
## 2 I2 4.472136 sd(height) std.dev height fancy statistic
## created rule
## 1 2019-12-16 15:51:22 mean(height)
## 2 2019-12-16 15:51:22 sd(height)
```
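When `by` is omitted, `merge` joins on all columns the two tables have in common; here that is just `name`. The base-R sketch below uses two small hypothetical stand-in tables (`quality_demo`, `measures_demo`) whose columns mirror part of the output above. It names the join column explicitly, which guards against accidental matches if both tables later acquire another column with the same name.

```r
# Hypothetical stand-ins for the value table and the metadata table;
# column names mirror the printed output, the construction is illustrative.
quality_demo <- data.frame(
  name  = c("I1", "I2"),
  value = c(mean(women$height), sd(women$height)),
  stringsAsFactors = FALSE
)
measures_demo <- data.frame(
  name  = c("I1", "I2"),
  label = c("average height", "std.dev height"),
  stringsAsFactors = FALSE
)

# Join explicitly on the indicator name rather than on all shared columns
merged_demo <- merge(quality_demo, measures_demo, by = "name")
merged_demo
```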