CausalQueries is a package that lets you declare binary causal models, update beliefs about causal types given data, and calculate arbitrary estimands. Model definition makes use of dagitty functionality. Updating is implemented in stan.
See the package guide for instructions on using CausalQueries, along with many examples of causal models.
To install CausalQueries:
install.packages("remotes")
remotes::install_github("macartan/CausalQueries")
Causal models are defined by:
For instance, a causal type might be the set of units for which X=1 and for which Y=1 if and only if X=1. The set of causal types grows rapidly with the number of nodes and with the number of nodes pointing into any given node. In this setting, imposing functional forms is the same as placing restrictions on causal types: such restrictions reduce complexity but require substantive assumptions. An example of a restriction might be "Y is monotonic in X."

A wrinkle:
Our goal is to form beliefs not only over parameters but also over more substantive estimands:
With a causal model in hand and data available about some or all of the nodes, it is possible to use a generic stan model that generates posteriors over the parameter vector.
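To make the idea of a generic likelihood concrete, here is a minimal base-R sketch for the simplest model, X -> Y with no confounding. It maps hypothetical parameters (a probability pi_x that X = 1, and shares lambda for Y's four nodal types) to the probability of each observable (X, Y) combination. Any likelihood over data patterns needs this quantity. The names pi_x, lambda, y_given and event_probs are illustrative, not part of the CausalQueries API. The sketch makes no claim about how the package's stan model is written internally.

```r
# Hypothetical parameters for X -> Y (no confounding)
pi_x   <- 0.5                                         # P(X = 1)
lambda <- c("00" = 0.1, "10" = 0.2, "01" = 0.4, "11" = 0.3)
# Nodal-type labels (an assumption of this sketch):
# first digit is Y when X = 0, second digit is Y when X = 1

# Value of Y for each nodal type when X is set to x (x is 0 or 1)
y_given <- function(type, x) as.integer(substr(type, x + 1, x + 1))

# Probability of each observable (X, Y) combination implied by the parameters
event_probs <- function(pi_x, lambda) {
  events <- expand.grid(X = 0:1, Y = 0:1)
  events$prob <- mapply(function(x, y) {
    p_x <- if (x == 1) pi_x else 1 - pi_x
    # add up the shares of all Y types that produce Y = y when X = x
    p_x * sum(lambda[y_given(names(lambda), x) == y])
  }, events$X, events$Y)
  events
}

ep <- event_probs(pi_x, lambda)
ep$prob   # 0.25 0.15 0.25 0.35 for (X, Y) = (0,0), (1,0), (0,1), (1,1)
```

Given counts n for each event, the multinomial log-likelihood is then sum(n * log(ep$prob)). A larger or confounded model changes only how the event probabilities are built.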
Given updated (or prior) beliefs about parameters, it is possible to calculate causal estimands of interest from a causal model. For example: "What is the probability that X was the cause of Y, given X=1, Y=1, and Z=1?"
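In the simplest case such a query has a closed form. The sketch below drops Z and assumes the model X -> Y, with X assigned independently of Y's nodal type. Y's nodal types are labeled by two digits: Y's value when X = 0, then when X = 1. Units with X = 1 and Y = 1 are of type 01 (X caused Y) or type 11 (Y = 1 whatever X is), so the answer is λ01 / (λ01 + λ11). The function coe_xy and the values in lambda are hypothetical.

```r
# Causes-of-effects query for X -> Y, assuming no confounding.
# lambda: shares of Y's nodal types; first digit = Y when X = 0, second = Y when X = 1
coe_xy <- function(lambda) {
  # Units with X = 1, Y = 1 have type "01" (X caused Y) or "11" (Y = 1 regardless of X);
  # with no confounding, observing X = 1 does not change these shares
  lambda[["01"]] / (lambda[["01"]] + lambda[["11"]])
}

lambda <- c("00" = 0.1, "10" = 0.2, "01" = 0.4, "11" = 0.3)
coe_xy(lambda)   # 0.4 / 0.7, about 0.571
```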
Here is an example of a model in which X causes M and M causes Y. There is, in addition, unobservable confounding between X and Y. This is an example of a model in which you might use information on M to figure out whether X caused Y.
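The following sketch shows how fast causal types multiply. A binary node with k binary parents has one outcome for each of its 2^k parent configurations, which gives 2^(2^k) possible nodal types. A causal type combines one nodal type per node. The code counts these for X -> M -> Y, ignoring the confounding. n_nodal_types is a hypothetical helper, not a package function.

```r
# Number of nodal types for a binary node with k binary parents:
# one outcome (0 or 1) for each of the 2^k parent configurations
n_nodal_types <- function(k) 2^(2^k)

# Number of parents of each node in X -> M -> Y
parents <- c(X = 0, M = 1, Y = 1)
n_nodal_types(parents)         # X = 2, M = 4, Y = 4
prod(n_nodal_types(parents))   # 32 causal types

# Growth as more parents point into a single node
n_nodal_types(0:4)             # 2 4 16 256 65536
```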
The DAG is defined using dagitty syntax like this:
model <- make_model("X -> M -> Y")
To add the confounding, we have to allow an additional parameter that permits a possibly different assignment probability for X depending on the causal type for Y.
model <- set_confound(model, list(X = "Y[X=1] == 1"))
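As we read it, the statement "Y[X=1] == 1" picks out the causal types in which Y would be 1 if X were set to 1. The set_confound call then lets the probability of X differ between types inside and outside that set. Because Y's parent is M, answering this for a given type means propagating X = 1 through M first. The base-R sketch below enumerates the 16 combinations of M and Y nodal types and marks those in the set. The helper respond and the digit labeling are illustrative assumptions, not package internals.

```r
types <- c("00", "10", "01", "11")   # digits: value when parent = 0, when parent = 1
grid  <- expand.grid(M = types, Y = types, stringsAsFactors = FALSE)

# Value of a node with the given nodal type(s) when its parent takes value `parent`
respond <- function(type, parent) {
  parent <- rep_len(parent, length(type))   # allow a scalar parent value
  as.integer(ifelse(parent == 0, substr(type, 1, 1), substr(type, 2, 2)))
}

# Propagate the operation X = 1 through X -> M -> Y
grid$M_X1 <- respond(grid$M, 1)
grid$Y_X1 <- respond(grid$Y, grid$M_X1)

# Type combinations in which "Y[X=1] == 1" holds
in_set <- grid$Y_X1 == 1
sum(in_set)                  # 8 of the 16 combinations
grid[in_set, c("M", "Y")]
```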
We then set priors thus:
model <- set_priors(model, distribution = "jeffreys")
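As we understand the package, priors are Dirichlet distributions over each node's nodal types. We assume here that "jeffreys" sets every Dirichlet parameter to 0.5, the Jeffreys prior for multinomial probabilities. The sketch draws from such a prior for Y's four nodal types in base R by normalizing independent gamma draws, a standard construction. rdirichlet_sketch is a hypothetical helper, not a package function.

```r
set.seed(1)

# Draw n vectors from Dirichlet(alpha) by normalizing independent gamma draws
rdirichlet_sketch <- function(n, alpha) {
  g <- matrix(rgamma(n * length(alpha), shape = alpha), nrow = n, byrow = TRUE)
  g / rowSums(g)   # each row now sums to 1
}

alpha <- rep(0.5, 4)   # assumed Jeffreys parameters for Y's 4 nodal types
prior_draws <- rdirichlet_sketch(1000, alpha)
colnames(prior_draws) <- c("00", "10", "01", "11")

colMeans(prior_draws)  # each share has prior mean 0.5 / 2 = 0.25
```

Compared with a uniform Dirichlet (all parameters equal to 1), parameters of 0.5 put more prior weight on type distributions concentrated on a few types.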
You can plot the DAG, making use of functions in the dagitty package.
plot(model)
You can draw data from the model, like this:
data <- make_data(model, n = 10)
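Conceptually, drawing data means sampling a causal type for each unit and letting the response of each node to its parents determine the node values. The sketch below does this by hand for X -> M -> Y, using hypothetical parameter values. It ignores the confounding between X and Y, which we expect make_data to respect. simulate_xmy, respond and the type shares are illustrative names and values, not package objects.

```r
set.seed(2)
types <- c("00", "10", "01", "11")   # digits: value when parent = 0, when parent = 1

# Value of a node with the given nodal type(s) when its parent takes value `parent`
respond <- function(type, parent) {
  parent <- rep_len(parent, length(type))
  as.integer(ifelse(parent == 0, substr(type, 1, 1), substr(type, 2, 2)))
}

simulate_xmy <- function(n, pi_x, lambda_m, lambda_y) {
  x      <- rbinom(n, 1, pi_x)                                 # exogenous X
  m_type <- sample(types, n, replace = TRUE, prob = lambda_m)  # M's nodal type per unit
  y_type <- sample(types, n, replace = TRUE, prob = lambda_y)  # Y's nodal type per unit
  m <- respond(m_type, x)   # M responds to X
  y <- respond(y_type, m)   # Y responds to M
  data.frame(X = x, M = m, Y = y)
}

data_sketch <- simulate_xmy(10, pi_x = 0.5,
                            lambda_m = c(0.1, 0.1, 0.6, 0.2),
                            lambda_y = c(0.1, 0.1, 0.6, 0.2))
data_sketch
```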
Updating is done like this:
updated_model <- update_model(model, data)
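update_model hands the computation to stan's MCMC sampler. The toy sketch below only illustrates what a posterior over type shares is; it does not mirror the package's sampler. It uses X -> Y with no confounding and swaps in sampling-importance-resampling. It draws Y-type shares from a Dirichlet prior, weights each draw by the likelihood of hypothetical grouped data, and resamples. The data, the 0.5 prior parameters and all function names are illustrative assumptions.

```r
set.seed(3)
types <- c("00", "10", "01", "11")   # Y's nodal types: Y when X = 0, Y when X = 1

# Draw n vectors from Dirichlet(alpha) by normalizing independent gamma draws
rdirichlet_sketch <- function(n, alpha) {
  g <- matrix(rgamma(n * length(alpha), shape = alpha), nrow = n, byrow = TRUE)
  g / rowSums(g)
}

# Hypothetical grouped data: number of units observed with each (X, Y) combination
data_xy <- data.frame(X = c(0, 0, 1, 1), Y = c(0, 1, 0, 1), n = c(10, 10, 2, 18))

# Log-likelihood of the data for one vector of type shares (ordered as in `types`)
log_lik <- function(lambda, data_xy) {
  p_y1  <- c(lambda[2] + lambda[4],   # P(Y = 1 | X = 0): types "10", "11"
             lambda[3] + lambda[4])   # P(Y = 1 | X = 1): types "01", "11"
  p     <- p_y1[data_xy$X + 1]
  p_obs <- ifelse(data_xy$Y == 1, p, 1 - p)
  sum(data_xy$n * log(p_obs))
}

S     <- 20000
prior <- rdirichlet_sketch(S, rep(0.5, 4))
ll    <- apply(prior, 1, log_lik, data_xy = data_xy)
w     <- exp(ll - max(ll))                              # stabilized importance weights
posterior <- prior[sample(S, S, replace = TRUE, prob = w), ]
colnames(posterior) <- types

colMeans(posterior)                           # posterior means of the type shares
mean(posterior[, "01"] - posterior[, "10"])   # posterior mean of the average effect
```

Importance resampling degrades quickly as data grow or models get larger, which is one reason to rely on an MCMC engine such as stan.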
Finally you can calculate an estimand of interest like this:
CoE <- query_distribution(
model = updated_model,
using = "posteriors",
query = "Y[X=0] == 0",
subset = "X==1 & Y==1"
)
This uses the posterior distribution and the model to assess the "causes of effects" estimand: the probability that X=1 was the cause of Y=1 in those cases in which X=1 and Y=1. The approach is to imagine a set of "do" operations on the model that control the level of X and to inquire about the level of Y given these operations. It then assesses how likely it is that Y would be 0 if X were fixed at 0, among units that naturally take on particular values of X and Y. By the same token, this posterior can be calculated conditional on observations of M, allowing an assessment of how data on mediators alters inference about the causes of effects.
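The same logic can be carried out by brute-force enumeration. The sketch below takes the model X -> M -> Y but drops the confounding to keep it small. It also imposes monotonicity (no negative effects) on both arrows and puts uniform hypothetical weights on the remaining type combinations. It evaluates Y under the operations do(X = 1) and do(X = 0) for every combination, then conditions on the observed values, first without and then with data on M. respond, grid and coe are illustrative names.

```r
types <- c("00", "01", "11")   # monotonic nodal types: value when parent = 0, when parent = 1
grid  <- expand.grid(M = types, Y = types, stringsAsFactors = FALSE)
grid$prob <- 1 / nrow(grid)    # hypothetical uniform beliefs over the 9 combinations

# Value of a node with the given nodal type(s) when its parent takes value `parent`
respond <- function(type, parent) {
  parent <- rep_len(parent, length(type))
  as.integer(ifelse(parent == 0, substr(type, 1, 1), substr(type, 2, 2)))
}

# Outcomes under do(X = 1) and do(X = 0)
m1 <- respond(grid$M, 1)
y1 <- respond(grid$Y, m1)
m0 <- respond(grid$M, 0)
y0 <- respond(grid$Y, m0)

# P(Y[X=0] == 0 | cases selected by `keep`); X = 1 is observed, so M = m1 and Y = y1
coe <- function(keep) sum(grid$prob[keep & y0 == 0]) / sum(grid$prob[keep])

coe(y1 == 1)             # X = 1, Y = 1: 0.2
coe(y1 == 1 & m1 == 1)   # ... and M = 1 observed: 0.25
coe(y1 == 1 & m1 == 0)   # ... and M = 0 observed: 0
```

Observing M = 0 rules out X having caused Y here. Under monotonicity, a unit with X = 1 and M = 0 has M fixed at 0, so X cannot affect Y through M.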
The approach used in CausalQueries is a generalization of the biqq models described in "Mixing Methods: A Bayesian Approach" (Humphreys and Jacobs, 2015, https://doi.org/10.1017/S0003055415000453). The conceptual extension makes use of work on probabilistic causal models described in Pearl's Causality (Pearl, 2009, https://doi.org/10.1017/CBO9780511803161). The approach to generating a generic stan function that can take data from arbitrary models was developed in key contributions by Jasper Cooper (http://jasper-cooper.com/) and Georgiy Syunyaev (http://gsyunyaev.com/). Lily Medina (https://lilymedina.github.io/) did the magical work of pulling it all together and developing approaches to characterizing confounding and defining estimands. Julio Solis has done wonders to simplify the specification of priors.