plm tries to follow as close as possible the way models are fitted using lm
. This relies on the following steps, using the formula
-data
:
model.frame
by getting the relevant arguments (formula
, data
, subset
, weights
, na.action
and offset
) and the supplementary argument,model.frame
the response (with model.response
), the model matrix (with model.matrix
),plm.fit
with X
and y
as arguments.with some modifications.
Firstly, panel data has a special structure which is describe by an index
argument. This can be used in the pdata.frame
function which returns a pdata.frame
object which can be used as the data
argument of plm
. If the data
argument of plm
is an ordinary data.frame
, the index
argument can also be used as an argument of plm
. In this case, the pdata.frame
function is used internally to transform the data.
Next, the formula
, which is the first and mandatory argument of plm
is coerced to a Formula
object.
model.frame
is then called, but with the data
argument in the first position (a pdata.frame
object) and the formula
in the second position. This unusual order of the arguments enables to use a specific model.frame.pdata.frame
method defined in plm
.
As for the model.frame.formula
method, a data.frame
is returned, with a terms
attribute.
Next, the X
matrix is extracted using model.matrix
. The usual way to do so is to feed the function with to arguments, a formula
or a terms
object and a data.frame
created with model.frame
. lm
use something like model.matrix(terms(mf), mf)
where mf
is a data.frame
created with a model.frame
. Therefore, model.matrix
needs actually one argument and not two and we therefore wrote a model.matrix.pdata.frame
which does the job ; the method first check that the argument has a term
attribute, extract the terms
(actually the formula
) and then compute the model.matrix
.
The response y
is usually extracted using model.response
, with a data.frame
created with model.frame
as first argument, but it is not generic. We therefore create a generic called pmodel.response
and provide a pmodel.response.pdata.frame
method. We illustrate these features using a simplified (in terms of covariates) example with the SeatBelt
data set :
library("plm")
data("SeatBelt", package = "pder")
SeatBelt$occfat <- with(SeatBelt, log(farsocc / (vmtrural + vmturban)))
pSB <- pdata.frame(SeatBelt)
We start with an OLS (pooling
) specification.
formols <- occfat ~ log(usage) + log(percapin)
mfols <- model.frame(pSB, formols)
Xols <- model.matrix(mfols)
y <- pmodel.response(mfols)
coef(lm.fit(Xols, y))
## (Intercept) log(usage) log(percapin)
## 7.4193570 0.1657293 -1.1583712
which is equivalent with :
coef(plm(formols, SeatBelt, model = "pooling"))
## (Intercept) log(usage) log(percapin)
## 7.4193570 0.1657293 -1.1583712
Next we use an instrumental variables specification. usage
is endogenous and instrumented by three variables indicating the law context : ds
, dp
and dsp
.
The model is described using a two-parts formula, the first part of the RHS describing the covariates and the second part the instruments. The following two formulations can be used :
formiv1 <- occfat ~ log(usage) + log(percapin) | log(percapin) + ds + dp + dsp
formiv2 <- occfat ~ log(usage) + log(percapin) | . - log(usage) + ds + dp + dsp
The second formulation has two advantages :
-
sign in the second RHS part of the formula.The formula is coerced to a Formula
, using the Formula
package. model.matrix.pdata.frame
then internally calls the model.matrix.Formula
in order to extract the covariates and instruments model matrices :
mfSB1 <- model.frame(pSB, formiv1)
X1 <- model.matrix(mfSB1, rhs = 1)
W1 <- model.matrix(mfSB1, rhs = 2)
head(X1, 3) ; head(W1, 3)
## (Intercept) log(usage) log(percapin)
## 8 1 -0.7985077 9.955748
## 9 1 -0.4155154 9.975622
## 10 1 -0.4155154 10.002110
## (Intercept) log(percapin) ds dp dsp
## 8 1 9.955748 0 0 0
## 9 1 9.975622 1 0 0
## 10 1 10.002110 1 0 0
For the second (and preferred formulation), the dot
argument should be set and is passed to the Formula
methods. .
has actually two meanings :
which correspond respectively to dot = "seperate"
(the default) and dot = "previous"
. See the difference between :
library("Formula")
head(model.frame(Formula(formiv2), SeatBelt), 3)
## occfat log(usage) log(percapin) state year farsocc farsnocc usage
## 8 -3.788976 -0.7985077 9.955748 AK 1990 90 8 0.45
## 9 -3.904837 -0.4155154 9.975622 AK 1991 81 20 0.66
## 10 -3.699611 -0.4155154 10.002110 AK 1992 95 13 0.66
## percapin unemp meanage precentb precenth densurb densrur
## 8 21073 7.05 29.58628 0.04157167 0.03252657 1.099419 0.1906836
## 9 21496 8.75 29.82771 0.04077293 0.03280357 1.114670 0.1906712
## 10 22073 9.24 30.21070 0.04192957 0.03331731 1.114078 0.1672785
## viopcap proppcap vmtrural vmturban fueltax lim65 lim70p mlda21 bac08
## 8 0.0009482704 0.008367458 2276 1703 8 0 0 1 0
## 9 0.0010787370 0.008940661 2281 1740 8 0 0 1 0
## 10 0.0011257068 0.008366873 2005 1836 8 1 0 1 0
## ds dp dsp
## 8 0 0 0
## 9 1 0 0
## 10 1 0 0
head(model.frame(Formula(formiv2), SeatBelt, dot = "previous"), 3)
## occfat log(usage) log(percapin) ds dp dsp
## 8 -3.788976 -0.7985077 9.955748 0 0 0
## 9 -3.904837 -0.4155154 9.975622 1 0 0
## 10 -3.699611 -0.4155154 10.002110 1 0 0
In the first case, all the covariates are returned by model.frame
as the .
is understood by default as “everything”.
In plm
, the dot
argument is internally set to previous
so that the end-user doesn’t have to worry about these subtleties.
mfSB2 <- model.frame(pSB, formiv2)
X2 <- model.matrix(mfSB2, rhs = 1)
W2 <- model.matrix(mfSB2, rhs = 2)
head(X2, 3) ; head(W2, 3)
## (Intercept) log(usage) log(percapin)
## 8 1 -0.7985077 9.955748
## 9 1 -0.4155154 9.975622
## 10 1 -0.4155154 10.002110
## (Intercept) log(percapin) ds dp dsp
## 8 1 9.955748 0 0 0
## 9 1 9.975622 1 0 0
## 10 1 10.002110 1 0 0
The iv estimator can then be obtained as a 2SLS estimator : first regress the covariates on the instruments and get the fitted values :
HX1 <- lm.fit(W1, X1)$fitted.values
head(HX1, 3)
## (Intercept) log(usage) log(percapin)
## 8 1 -1.0224257 9.955748
## 9 1 -0.5435055 9.975622
## 10 1 -0.5213364 10.002110
Next regress the response on these fitted values :
coef(lm.fit(HX1, y))
## (Intercept) log(usage) log(percapin)
## 7.5641209 0.1768576 -1.1722590
Or using the formula
-data
interface with the ivreg
function :
coef(AER::ivreg(formiv1, data = SeatBelt))
## (Intercept) log(usage) log(percapin)
## 7.5641209 0.1768576 -1.1722590
or plm
:
coef(plm(formiv1, SeatBelt, model = "pooling"))
## (Intercept) log(usage) log(percapin)
## 7.5641209 0.1768576 -1.1722590