Most R object comparison functions are good at telling you that objects are different, but less so at conveying how they are different. I wrote diffobj to provide an “aha, that’s how they are different” comparison. In this vignette I will compare diffPrint to all.equal and to testthat::compare.
Disclaimer: I picked the examples here to showcase diffobj capabilities, not to carry out a fair and balanced comparison of these comparison functions. Nonetheless, I hope you will find the examples representative of common situations where comparison of R objects is useful.
I defined four pairs of numeric vectors for us to compare. I purposefully hid the variable definitions to simulate a comparison of unknown objects.
## [1] "Mean relative difference: 0.1"
The objects are different… At this point I would normally print both A1 and B1 to try to figure out how that difference came about, since the “mean relative difference” is unhelpful.
## 1/10 mismatches
## [10] 10 - 11 == -1
testthat::compare does a better job, but I still feel the need to look at A1 and B1.
@@ 1 @@                                 @@ 1 @@
<  [1]  1  2  3  4  5  6  7  8  9 10    >  [1]  1  2  3  4  5  6  7  8  9 11
Aha, that’s how they are different!
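For readers who want to reproduce this first comparison, here is a minimal sketch. The definitions of A1 and B1 are an assumption, reconstructed from the diff output above (the vignette deliberately hides the real ones), and the diffobj call is guarded in case the package is not installed.

```r
# Assumed definitions, inferred from the printed diff; the originals are hidden.
A1 <- as.numeric(1:10)
B1 <- c(as.numeric(1:9), 11)

# Base R: reports only a summary statistic of the difference.
ae1 <- all.equal(A1, B1)
print(ae1)

# Locating the differing position by hand.
diff_pos <- which(A1 != B1)
print(diff_pos)

# diffobj: diffs the printed representations of the two objects.
if (requireNamespace("diffobj", quietly = TRUE)) {
  print(diffobj::diffPrint(A1, B1))
}
```

The manual which() call works here only because the vectors have equal length and the difference is a simple substitution; the later examples show where that approach stops being useful.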
Let’s up the difficulty a little bit:
## 20/20 mismatches (average diff: 1.9)
## [1] 1 - 20 == -19
## [2] 2 - 1 == 1
## [3] 3 - 2 == 1
## [4] 4 - 3 == 1
## [5] 5 - 4 == 1
## [6] 6 - 5 == 1
## [7] 7 - 6 == 1
## [8] 8 - 7 == 1
## [9] 9 - 8 == 1
## ...
If you look closely you will see that despite a reported 20/20 differences, the two vectors are actually similar, at least in the visible part of the output. With diffPrint it is obvious that B2 is the same as A2, except that the last value has been moved to the first position:
@@ 1,2 @@                                  @@ 1,2 @@
<  [1]  1  2  3  4  5  6  7  8  9 10 11    >  [1] 20  1  2  3  4  5  6  7  8  9 10
< [12] 12 13 14 15 16 17 18 19 20          > [12] 11 12 13 14 15 16 17 18 19
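The gap between "every element mismatches" and "the vectors are nearly identical" can be checked directly in base R. This is a sketch only: A2 and B2 below are assumed definitions inferred from the diff output, not the hidden originals.

```r
# Assumed definitions, inferred from the printed diff above.
A2 <- as.numeric(1:20)
B2 <- c(20, as.numeric(1:19))

# Element-wise, every position differs...
n_mismatch <- sum(A2 != B2)
mean_diff  <- mean(abs(A2 - B2))

# ...yet dropping one element from each side leaves identical vectors,
# which is what a sequence-based diff picks up on.
shifted_equal <- identical(B2[-1], A2[-20])

print(c(n_mismatch = n_mismatch, mean_diff = mean_diff))
print(shifted_equal)
```

A position-by-position comparison penalizes the shift at every index, whereas a diff treats it as one deletion plus one insertion.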
testthat::compare throws in the towel as soon as lengths are unequal:
## Lengths differ: 20 is not 21
all.equal does the same. diffPrint is unfazed:
@@ 1,2 @@                                  @@ 1,2 @@
<  [1]  1  2  3  4  5  6  7  8  9 10 11    >  [1] 20 21  1  2  3  4  5  6  7  8  9
< [12] 12 13 14 15 16 17 18 19 20          > [12] 10 11 12 13 14 15 16 17 18 19
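A small sketch of the unequal-length case follows. As before, A3 and B3 are assumed definitions read off the diff output, and the diffobj call is guarded so the block runs without the package.

```r
# Assumed definitions, inferred from the printed diff above.
A3 <- as.numeric(1:20)
B3 <- c(20, 21, as.numeric(1:19))

# all.equal stops at the length check and says nothing about the contents.
ae3 <- all.equal(A3, B3)
print(ae3)

# A set-based check recovers the new value but loses all positional information.
extra_values <- setdiff(B3, A3)
print(extra_values)

# diffobj handles the length mismatch as ordinary insertions and deletions.
if (requireNamespace("diffobj", quietly = TRUE)) {
  print(diffobj::diffPrint(A3, B3))
}
```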
diffPrint also produces useful output for largish vectors:
@@ 1,4 @@                                 @@ 1,4 @@
<    [1]     1     2     3     4     5    >    [1] 10001     1     2     3     8
<    [6]     6     7     8     9    10    >    [6]     9    10    11    12    13
    [11]    11    12    13    14    15        [11]    14    15    16    17    18
    [16]    16    17    18    19    20        [16]    19    20    21    22    23
@@ 1798,5 @@                              @@ 1798,5 @@
  [8986]  8986  8987  8988  8989  8990      [8986]  8989  8990  8991  8992  8993
  [8991]  8991  8992  8993  8994  8995      [8991]  8994  8995  8996  8997  8998
< [8996]  8996  8997  8998  8999  9000    > [8996]  8999  9001  9002  9003  9004
  [9001]  9001  9002  9003  9004  9005      [9001]  9005  9006  9007  9008  9009
  [9006]  9006  9007  9008  9009  9010      [9006]  9010  9011  9012  9013  9014
Do note that the comparison algorithm scales with the square of the number of differences, so very large and different vectors will be slow to process.
R Core and package authors put substantial effort into print and show methods. diffPrint takes advantage of this. Compare:
## [1] "Attributes: < Component \"row.names\": Numeric: lengths (150, 149) differ >"
## [2] "Component \"Sepal.Length\": Numeric: lengths (150, 149) differ"
## [3] "Component \"Sepal.Width\": Numeric: lengths (150, 149) differ"
## [4] "Component \"Petal.Length\": Numeric: lengths (150, 149) differ"
## [5] "Component \"Petal.Width\": Numeric: lengths (150, 149) differ"
## [ reached getOption("max.print") -- omitted 3 entries ]
to:
@@ 59,5 / 59,4 @@
~    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
  58          4.9         2.4          3.3         1.0 versicolor
  59          6.6         2.9          4.6         1.3 versicolor
< 60          5.2         2.7          3.9         1.4 versicolor
  61          5.0         2.0          3.5         1.0 versicolor
  62          5.9         3.0          4.2         1.5 versicolor
And:
## [1] "Component \"coefficients\": Names: 1 string mismatch"
## [2] "Component \"coefficients\": Mean relative difference: 2.778944"
## [3] "Component \"residuals\": Mean relative difference: 0.7074011"
## [4] "Component \"effects\": Names: 1 string mismatch"
## [5] "Component \"effects\": Mean relative difference: 0.5907086"
## [ reached getOption("max.print") -- omitted 9 entries ]
to:
@@ 1,8 @@                                   @@ 1,8 @@
  Call:                                       Call:
< lm(formula = hp ~ disp, data = mtcars)    > lm(formula = hp ~ cyl, data = mtcars)
  Coefficients:                               Coefficients:
< (Intercept)         disp                  > (Intercept)          cyl
<     45.7345       0.4376                  >      -51.05        31.96
In these examples I limited all.equal output to five lines for the sake of brevity. Also, since testthat::compare reverts to all.equal output with more complex objects, I omit it from this comparison.
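The two comparisons above can be approximated with the sketch below. The inputs are assumptions inferred from the output: iris with row 60 removed, and two lm fits on mtcars that differ in the predictor. The truncation notice in the output suggests the five-line limit came from the max.print option; head() is used here instead as a simpler stand-in.

```r
# Assumed inputs, inferred from the printed output above.
iris.2 <- iris[-60, ]                      # iris without row 60
mdl1   <- lm(hp ~ disp, data = mtcars)
mdl2   <- lm(hp ~ cyl,  data = mtcars)

# all.equal returns one message per differing component;
# show only the first five, as in the vignette.
ae_iris <- all.equal(iris, iris.2)
ae_lm   <- all.equal(mdl1, mdl2)
print(head(ae_iris, 5))
print(head(ae_lm, 5))

# The missing row is easy to find once you know to look at row names.
dropped_row <- setdiff(rownames(iris), rownames(iris.2))
print(dropped_row)

# diffobj compares what print() would display for each object.
if (requireNamespace("diffobj", quietly = TRUE)) {
  print(diffobj::diffPrint(iris, iris.2))
  print(diffobj::diffPrint(mdl1, mdl2))
}
```

Because the diff is driven by each object's print or show method, the quality of the comparison depends on how informative that method is for the class in question.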
Another candidate comparison function is compare::compare. I omitted it from this vignette because it focuses more on similarities than on differences. Additionally, the testthat::compare and compare::compare print methods conflict, so they cannot be used together.
For a more thorough exploration of diffobj methods and their features please see the primary diffobj vignette.