A user had a case of estimating parameters based on a dataset that contained only categorical predictors. The data can be represented either as one row per individual or as one row per group defined by unique combinations of the categories. In this example, I show how computations in geex can be massively sped up by using the latter data representation together with the weights option of m_estimate().
The following code generates two datasets: data1 has one row per unit and data2 has one row per unique combination of the categorical variables.
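The original generating code is not shown here, so what follows is only a minimal sketch, assuming two binary predictors x1, x2 and an outcome y (these names are illustrative, not from the original); the point is simply that data2 summarises data1 with one row per cell and a count column n:

library(geex)
library(dplyr)

set.seed(42)
n_units <- 5000

# one row per unit
data1 <- data.frame(
  x1 = rbinom(n_units, 1, 0.4),
  x2 = rbinom(n_units, 1, 0.4),
  y  = rbinom(n_units, 1, 0.2)
)

# one row per unique combination of the categorical variables,
# with n giving the number of units in each cell
data2 <- data1 %>% count(x1, x2, y, name = "n")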
The user also provided an estimating equation as an example. I have no idea what the target parameters represent, but it nicely illustrates the point.
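The user's exact function is not reproduced here, so the sketch below only illustrates the general form geex expects for an estFUN: a function of one unit's (or, with weights, one group's) data that returns a function of theta giving the stacked estimating equations. The columns x1, x2, y and the particular equations are assumptions for illustration and will not reproduce the numbers shown later.

example <- function(data) {
  # with data1 a "unit" is a single row; with data2 and weights it is one cell
  x1 <- data$x1
  x2 <- data$x2
  y  <- data$y
  function(theta) {
    # three hypothetical mean-type estimating equations, one per parameter
    c(x1 - theta[1],
      x2 - theta[2],
      x1 * x2 * y - theta[3])
  }
}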
The time needed to compute the point and variance estimates is compared for the two data representations:
system.time({
  results1 <- m_estimate(
    estFUN = example,
    data = data1,
    root_control = setup_root_control(start = c(.5, .5, .5))
  )
})
## user system elapsed
## 0.812 0.009 0.835
system.time({
  results2 <- m_estimate(
    estFUN = example,
    data = data2,
    weights = data2$n,
    root_control = setup_root_control(start = c(.5, .5, .5))
  )
})
## user system elapsed
## 0.021 0.000 0.021
The latter option is clearly preferred: with the grouped data and weights, the estimating functions and their derivatives are evaluated once per unique covariate pattern rather than once per unit.
And the results are basically identical:
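Assuming the point estimates and covariance matrices are pulled out with geex's coef() and vcov() methods, the comparison below prints the roots and then the variance estimates for each fit:

coef(results1)
coef(results2)
vcov(results1)
vcov(results2)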
## [1] 0.4123711 0.4014423 0.1655432
## [1] 0.4123711 0.4014423 0.1655432
## [,1] [,2] [,3]
## [1,] 0.0006245391 0.0000000000 0.0002507164
## [2,] 0.0000000000 0.0005776115 0.0002381903
## [3,] 0.0002507164 0.0002381903 0.0001988710
## [,1] [,2] [,3]
## [1,] 6.245391e-04 6.873914e-47 0.0002507164
## [2,] 6.873914e-47 5.776115e-04 0.0002381903
## [3,] 2.507164e-04 2.381903e-04 0.0001988710