The goal of this vignette is to provide a quick overview of using the PanelMatch
package, highlight important features and functions, and help users get the most out of their experience with the package. It assumes that you have been able to install the package successfully and are already familiar with the basics of R.
We will be working with the dem
data set, which comes included with the package. Before doing any matching, estimation or further analysis, it is helpful to understand the distribution of the treatment variable within your data set. The package hopes to facilitate this with the `DisplayTreatment function.
library(PanelMatch)
DisplayTreatment(unit.id = "wbcode2",
time.id = "year", legend.position = "none",
xlab = "year", ylab = "Country Code",
treatment = "dem", data = dem)
In the plot, the x axis represents time, and the y axis displays the different units in your data set. Red tiles indicate periods where “treatment” is applied to a given unit and blue tiles indicate “control” periods. White spaces indicate missing data. In the above plot, we have a large number of unique units and time periods, so the axes become hard to read. The
DisplayTreatment
function uses ggplot2
to create this plot, so any custom styling can be applied easily using ggplot2
conventions. However, the function has some built in options to make cleaning up the plot a little easier. When the data set is particularly large, it can also help to use the dense.plot
option. There are many ways to customize these plots. We will return to some of these later, but consult the documentation for the full list and descriptions of the arguments.
DisplayTreatment(unit.id = "wbcode2",
time.id = "year", legend.position = "none",
xlab = "year", ylab = "Country Code",
treatment = "dem", data = dem,
hide.x.axis.label = TRUE, hide.y.axis.label = TRUE) # axis label options
DisplayTreatment(unit.id = "wbcode2",
time.id = "year", legend.position = "none",
xlab = "year", ylab = "Country Code",
treatment = "dem", data = dem,
hide.x.axis.label = TRUE, hide.y.axis.label = TRUE,
dense.plot = TRUE) # setting dense.plot to TRUE
Next, we will move to the PanelMatch
function. The primary purposes of this function are to 1) create sets matching treated units to control units and 2) determine weights for each control unit in a given matched set. These weights are then used in the estimation stage in an intuitive way: units with higher weights factor more heavily into the estimations.
is achieved by matching units that receive treatment after previously being untreated (ie. units that move from control
to treatment
at a certain time) to control units that have matching treatment histories in a specified lag window, while also remaining untreated during the same period that the treated unit receives treatment. For example, if unit 4 is a control unit until 1992, when it receives treatment, then, for a specified lag = 4, it will be matched with control units that share an identical treatment history with unit 4 from 1988-1991, while also remaining control units in 1992.
is achieved by defining which variables should be used for measuring similarity/distance between units, determining the most comparable control units to be included in the matched set, and assigning weights to control units as appropriate. There are many parameters that can be tuned for this step. Users must choose a refinement method (“mahalanobis”, “ps.match”, “CBPS.match”, “ps.weight”, “CBPS.weight”, “ps.msm.weight”, “CBPS.msm.weight”, or “none”). The “matching” or mahalanobis refinement methods will assign equal weights to the size.match
most similar control units in a matched set. The “weighting” methods will generate weights in such a way that control units more similar to treated units will be assigned higher weights.
Users must also define which covariates should be used in this process for defining similarity between units. This is set using the covs.formula
argument, which takes the form of a one side formula object. The variables defined on the right hand side of the formula are the variables used in these calculations. Users can included “lagged” versions of variables using I(lag(name.of.var, 0:n))
.
The first example sets refinement.method
to none
, meaning all control units will receive equal weights and no refinement is performed. This will also be helpful to refer back to when we are evaluating the impact of refinement.
The PanelMatch
function returns a PanelMatch
object, which contains a matched.set
object. Please consult the wiki page about matched set objects for more about them.
PM.results.none <- PanelMatch(lag = 4, time.id = "year", unit.id = "wbcode2",
treatment = "dem", refinement.method = "none",
data = dem, match.missing = TRUE,
size.match = 5, qoi = "att", outcome.var = "y",
lead = 0:4, forbid.treatment.reversal = FALSE,
use.diagonal.variance.matrix = TRUE)
Below, we will use the mahalanobis
option for refinement.method
and will use only contemporaneous values of the tradewb
to define similarity.
PM.results <- PanelMatch(lag = 4, time.id = "year", unit.id = "wbcode2",
treatment = "dem", refinement.method = "mahalanobis", # use Mahalanobis distance
data = dem, match.missing = TRUE,
covs.formula = ~ tradewb,
size.match = 5, qoi = "att" , outcome.var = "y",
lead = 0:4, forbid.treatment.reversal = FALSE,
use.diagonal.variance.matrix = TRUE)
Next, we will include 4 lags of the tradewb
variable and the outcome variable, excluding any contemporaneous values.
While there are no hard rules for how to set the many different PanelMatch
parameters, in general, you want to find a balance between having a good number of matched sets and having matched sets that are large enough in size. Having many small matched sets will lead to larger standard errors. You also want to create sets with good covariate balance, which will be discussed later.
PM.results <- PanelMatch(lag = 4, time.id = "year", unit.id = "wbcode2",
treatment = "dem", refinement.method = "mahalanobis",
data = dem, match.missing = TRUE,
covs.formula = ~ I(lag(tradewb, 1:4)) + I(lag(y, 1:4)), # lags
size.match = 5, qoi = "att", outcome.var = "y",
lead = 0:4, forbid.treatment.reversal = FALSE,
use.diagonal.variance.matrix = TRUE)
We can also apply listwise deletion of units for missing data.
PM.results1 <- PanelMatch(lag = 4, time.id = "year", unit.id = "wbcode2",
treatment = "dem", refinement.method = "mahalanobis",
data = dem, match.missing = FALSE, listwise.delete = TRUE, # listwise deletion used
covs.formula = ~ I(lag(tradewb, 1:4)) + I(lag(y, 1:4)),
size.match = 5, qoi = "att", outcome.var = "y",
lead = 0:4, forbid.treatment.reversal = FALSE,
use.diagonal.variance.matrix = TRUE)
Let’s try out a weighting method using propensity scores and then compare performance.
PM.results2 <- PanelMatch(lag = 4, time.id = "year", unit.id = "wbcode2",
treatment = "dem", refinement.method = "ps.weight",
data = dem, match.missing = FALSE, listwise.delete = TRUE,
covs.formula = ~ I(lag(tradewb, 1:4)) + I(lag(y, 1:4)),
size.match = 5, qoi = "att", outcome.var = "y",
lead = 0:4, forbid.treatment.reversal = FALSE,
use.diagonal.variance.matrix = TRUE)
Now that we’ve created matched sets, you might be interested in visualizing and evaluating a few aspects of these results. First, let’s make sure the results are sensible, by using DisplayTreatment
to see if our treated and control units meet the parameters outlined previously for one of our matched sets.
# extract the first matched set
mset <- PM.results.none$att[1]
DisplayTreatment(unit.id = "wbcode2",
time.id = "year", legend.position = "none",
xlab = "year", ylab = "Country Code",
treatment = "dem", data = dem,
matched.set = mset, # this way we highlight the particular set
show.set.only = TRUE)
We can also get an idea of the number and size of our matched sets by using the summary
and plot
methods for matched.set
objects. By default, the plot
method excludes empty matched sets (treated units that could not be matched to any control units) by default, showing the number of empty sets in a separate vertical bar at x = 0. This can be turned off by setting include.empty.sets
to TRUE
.
summary(PM.results.none$att)
#> $overview
#> wbcode2 year matched.set.size
#> 1 4 1992 74
#> 2 4 1997 2
#> 3 6 1973 63
#> 4 6 1983 73
#> 5 7 1991 81
#> 6 7 1998 1
#> 7 12 1992 74
#> 8 13 2003 58
#> 9 15 1991 81
#> 10 16 1977 63
#> 11 17 1991 81
#> 12 18 1991 81
#> 13 22 1991 81
#> 14 25 1982 72
#> 15 26 1985 76
#> 16 31 1993 65
#> 17 34 1990 79
#> 18 36 2000 57
#> 19 38 1992 74
#> 20 40 1990 79
#> 21 40 1996 1
#> 22 40 2002 1
#> 23 41 1991 81
#> 24 45 1993 65
#> 25 47 1999 59
#> 26 50 1978 63
#> 27 52 1979 60
#> 28 55 1978 63
#> 29 56 1992 74
#> 30 57 1995 57
#> 31 59 1990 1
#> 32 64 1995 57
#> 33 65 1970 58
#> 34 65 1979 60
#> 35 65 1996 57
#> 36 70 1994 58
#> 37 70 2005 1
#> 38 72 1975 61
#> 39 73 1984 74
#> 40 75 1966 46
#> 41 75 1986 77
#> 42 78 1992 74
#> 43 80 1982 72
#> 44 81 2000 57
#> 45 82 2006 49
#> 46 83 1990 79
#> 47 84 1999 59
#> 48 96 2002 58
#> 49 97 2005 54
#> 50 101 1988 80
#> 51 104 2005 54
#> 52 105 2004 57
#> 53 109 1993 65
#> 54 110 1993 65
#> 55 112 1993 65
#> 56 115 1994 58
#> 57 116 1993 65
#> 58 118 1997 58
#> 59 119 1991 81
#> 60 120 1992 74
#> 61 123 1993 65
#> 62 124 1994 58
#> 63 128 1994 58
#> 64 133 1991 81
#> 65 134 1979 60
#> 66 134 1999 59
#> 67 135 1990 79
#> 68 138 1991 81
#> 69 138 2006 49
#> 70 141 1972 63
#> 71 141 1988 80
#> 72 142 1994 58
#> 73 143 1980 61
#> 74 144 1987 78
#> 75 149 1976 63
#> 76 150 1993 65
#> 77 154 1990 79
#> 78 155 1993 65
#> 79 158 1965 40
#> 80 158 1986 77
#> 81 159 2000 57
#> 82 162 2004 57
#> 83 163 1996 57
#> 84 163 2001 59
#> 85 164 1982 72
#> 86 167 1988 80
#> 87 168 1993 65
#> 88 169 1992 74
#> 89 177 1974 62
#> 90 177 1978 1
#> 91 183 1983 2
#> 92 187 1994 58
#> 93 188 1985 76
#> 94 199 1994 58
#> 95 201 1991 81
#> 96 202 1978 63
#> 97 17 2009 0
#> 98 70 1999 0
#> 99 82 1994 0
#> 100 109 1999 0
#> 101 133 1999 0
#> 102 143 1993 0
#> 103 167 1991 0
#> 104 177 1992 0
#> 105 183 1973 0
#>
#> $set.size.summary
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.00 57.00 62.00 55.75 74.00 81.00
#>
#> $number.of.treated.units
#> [1] 105
#>
#> $num.units.empty.set
#> [1] 9
#>
#> $lag
#> [1] 4
plot(PM.results.none$att)
plot(PM.results.none$att, include.empty.sets = TRUE) # The red tiny bar that would otherwise indicate empty sets is now part of the grey bar
Now, let’s see the impact of performing refinement by comparing the balance of covariates before and after. See the documentation for the get_covariate_balance
for more about how the package calculates covariate balance. We hope to see low values (.2 is a helpful threshold value as a rule of thumb) in the results of our covariate balance calculations. We also hope to see an improvement in covariate balance after applying some method of refinement to our results.
We can use the get_covariate_balance
function to assess our results. To view the raw results, set plot = FALSE
.
# get covariate balance for sets that are unrefined
get_covariate_balance(PM.results.none$att,
data = dem,
covariates = c("tradewb", "y"),
plot = FALSE)
#> tradewb y
#> t_4 -0.07245466 0.291871990
#> t_3 -0.20263216 0.211112946
#> t_2 -0.24038804 0.109744256
#> t_1 -0.12564222 -0.004944938
#> t_0 -0.11943058 -0.015136766
# compare with sets that have had various refinements applied
get_covariate_balance(PM.results$att,
data = dem,
covariates = c("tradewb", "y"),
plot = FALSE)
#> tradewb y
#> t_4 0.062473658 0.10714884
#> t_3 -0.009804017 0.09712049
#> t_2 0.027631301 0.07957258
#> t_1 0.166663572 0.05650336
#> t_0 0.220957018 0.04910509
get_covariate_balance(PM.results1$att,
data = dem,
covariates = c("tradewb", "y"),
plot = FALSE)
#> tradewb y
#> t_4 0.02181957 0.034828426
#> t_3 -0.03882313 0.034841740
#> t_2 -0.02740226 0.012962544
#> t_1 0.04176720 0.003361670
#> t_0 0.03928411 0.002391157
get_covariate_balance(PM.results2$att,
data = dem,
covariates = c("tradewb", "y"),
plot = FALSE)
#> tradewb y
#> t_4 0.014362590 0.04035905
#> t_3 0.005320866 0.04199342
#> t_2 0.009085826 0.04225722
#> t_1 0.030401128 0.03994840
#> t_0 0.049106248 0.04209055
You can also check the balance of unrefined matched sets by setting the use.equal.weights
argument to TRUE
. From these examples we can see that different refinement methods have various levels of success in improving covariate balance. Refinement using mahalanobis distance and the use.diagonal.variance.matrix
argument set to TRUE
performed well. Refinement using propensity score weighting also helped improve covariate balance, relative to that of the unrefined matched sets.
The parameters and refinement methods that work best will depend on your data and application/context.
We can also create plots showing covariate balance throughout the lag window period by setting the plot
argument to TRUE
, as shown below.
get_covariate_balance(PM.results1$att,
data = dem,
covariates = c("tradewb", "y"),
plot = TRUE, # visualize by setting plot to TRUE
ylim = c(-.2, .2))
We can also evaluate our results using the balance_scatter
function:
balance_scatter(non_refined_set = PM.results.none$att,
refined_list = list(PM.results$att, PM.results2$att),
data = dem,
covariates = c("y", "tradewb"))
We now move to the next major part of the package: obtaining point estimates and standard errors using PanelEstimate
. The package uses bootstrapping to obtain standard errors. By default, 1000 bootstrap iterations are used, and .95 is the default confidence level.
The function returns a PanelEstimate
object, which behaves like a list. As such, you can access the various elements just as you would in a list.
PE.results <- PanelEstimate(sets = PM.results1, data = dem)
names(PE.results)
#> [1] "estimates" "bootstrapped.estimates" "bootstrap.iterations"
#> [4] "standard.error" "method" "lag"
#> [7] "lead" "confidence.level" "qoi"
#> [10] "matched.sets"
# View the point estimates
PE.results[["estimates"]]
#> t+0 t+1 t+2 t+3 t+4
#> -0.1182277 0.7773662 1.2014708 2.1068868 1.6472643
PanelEsimate
objects have custom summary
and plot
methods defined. The plot
method can be customized with all of the same arguments/operations as the regular plot
function in base R. This method shows the point estimates for the specified periods, along with the standard errors.
summary(PE.results)
#> Weighted Difference-in-Differences with Mahalanobis Distance
#> Matches created with 4 lags
#>
#> Standard errors computed with 1000 Weighted bootstrap samples
#>
#> Estimate of Average Treatment Effect on the Treated (ATT) by Period:
#> $summary
#> estimate std.error 2.5% 97.5%
#> t+0 -0.1182277 0.5138888 -1.208424 0.7728617
#> t+1 0.7773662 0.8794504 -1.159555 2.3421549
#> t+2 1.2014708 1.2376579 -1.385415 3.4444499
#> t+3 2.1068868 1.5600946 -1.161002 4.9924774
#> t+4 1.6472643 1.8950433 -2.230917 5.1789592
#>
#> $lag
#> [1] 4
#>
#> $iterations
#> [1] 1000
#>
#> $qoi
#> [1] "att"
plot(PE.results)
This last plot makes it clear that the effect of treatment on treated units (att) in this configuration is statistically insignificant.