The random parameters (or mixed) logit model

Derivation of the model

A mixed logit model or random parameters logit model is a logit model for which the parameters are assumed to vary from one individual to another. It is therefore a model that takes the heterogeneity of the population into account.

The probabilities

For the standard logit model, the probability that individual \(i\) choose alternative \(j\) is: \[ P_{il}=\frac{e^{\beta'x_{il}}}{\sum_j e^{\beta'x_{ij}}}. \]

Suppose now that the coefficients are individual-specific. The probabilities are then:

\[ P_{il}=\frac{e^{\beta_i'x_{il}}}{\sum_j e^{\beta_i'x_{ij}}}. \]

A first approach consists on estimating the parameters for every individual. However, these parameters are identified and can be consistently estimated only if a large number of choice situations per individual is available, which is scarcely the case.

A more appealing approach consists on considering the \(\beta_i\)’s as random draws from a distribution whose parameters are estimated, which leads to the mixed logit model. The probability that individual \(i\) will choose alternative \(l\), for a given value of \(\beta_i\) is:

\[ P_{il} \mid \beta_i =\frac{e^{\beta_i'x_{il}}}{\sum_j e^{\beta_i'x_{ij}}}. \]

To get the unconditional probability, we have to integrate out this conditional probability, using the density function of \(\beta\). Suppose that \(V_{il}=\alpha+\beta_i x_{il}\), i.e., there is only one individual-specific coefficient and that the density of \(\beta_i\) is \(f(\beta,\theta)\), \(\theta\) being the vector of the parameters of the distribution of \(\beta\). The unconditional probability is then:

\[ P_{il}= \mbox{E}(P_{il} \mid \beta_i) = \int_{\beta}(P_{il} \mid \beta)f(\beta,\theta)d\beta = \int_{\beta}\frac{e^{\beta_i'x_{il}}}{\sum_j e^{\beta_i'x_{ij}}}f(\beta,\theta)d\beta, \]

which is a one-dimensional integral that can be efficiently estimated by quadrature methods. If \(V_{il}=\beta_i^{\top} x_{il}\) where \(\beta_i\) is a vector of length \(K\) and \(f(\beta,\theta)\) is the joint density of the \(K\) individual-specific coefficients, the unconditional probability is:

\[ P_{il}= \mbox{E}(P_{il} \mid \beta_i) = \int_{\beta_1}\int_{\beta_2}...\int_{\beta_K}(P_{il} \mid \beta)f(\beta,\theta)d\beta_1d\beta_2... d\beta_K. \]

This is a \(K\)-dimensional integral which cannot easily be estimated by quadrature methods. The only practical method is then to use simulations. More precisely, \(R\) draws of the parameters are taken from the distribution of \(\beta\), the probability is computed for every draw and the unconditional probability, which is the expected value of the conditional probabilities is estimated by the average of the \(R\) probabilities.

Individual parameters

The expected value of a random coefficient (\(\mbox{E}(\beta)\)) is simply estimated by the mean of the \(R\) draws on its distribution: \(\bar{\beta}=\sum_{r=1}^R \beta_r\). Individual parameters are obtained by first computing the probabilities of the observed choice of \(i\) for every value of \(\beta_r\):

\[ P_{ir}=\frac{\sum_j y_{ij} e^{\beta_r^{'}x_{ij}}}{\sum_j e^{\beta_r^{'}x_{ij}}}, \]

where \(y_{ij}\) is a dummy equal to one if \(i\) has chosen alternative \(j\). The expected value of the parameter for an individual is then estimated by using these probabilities to weight the \(R\) \(\beta\) values:

\[ \hat{\beta}_i = \frac{\sum_r P_{ir} \beta_r}{\sum_r P_{ir}}. \]

Panel data

If there are repeated observations for the same individuals, the longitudinal dimension of the data can be taken into account in the mixed logit model, assuming that the random parameters of individual \(i\) are the same for all his choice situations. Denoting \(y_{itl}\) a dummy equal to 1 if \(i\) choose alternative \(l\) for the \(t^{th}\) choice situation, the probability of the observed choice is:

\[P_{it}=\prod_j \frac{\sum_j y_{itj}e^{\beta_i x_{itl}}}{\sum_j e^{\beta_i x_{itj}}}. \]

The joint probability for the \(T\) observations of individual \(i\) is then:

\[P_{i}=\prod_t \prod_j \frac{\sum_jy_{itj}e^{\beta_i x_{itj}}}{\sum_j e^{\beta_i x_{itj}}},\]

and the log-likelihood is simply \(\sum_i \ln P_i\).

Application

The random parameter logit model is estimated by providing a rpar argument to mlogit. This argument is a named vector, the names being the random coefficients and the values the name of the law of distribution. Currently, the normal ("n"), log-normal ("ln"), zero-censored normal ("cn"), uniform ("u") and triangular ("t") distributions are available. For these distributions, two parameters are estimated which are, for normal related distributions, the mean and the standard-deviation of the underlying normal distribution and for the uniform and triangular distribution, the mean and the half range of the distribution. For these last two distributions, zero-bounded variants are also provided ("zbt" and "zbu"). These two distributions are defined by only one parameter (the mean) and their definition domain varies from 0 to twice the mean.

Several considerations may lead to the choice of a specific distribution:

R is the number of draws, halton indicates whether halton draws (see Train 2009, chap. 9) should be used (NA and NULL indicate respectively that default halton draws are used and that pseudo-random numbers are used), panel is a boolean which indicates if the panel data version of the log-likelihood should be used.

Correlations between random parameters can be introduced only for normal-related distributed random parameters, using the correlation argument. If TRUE, all the normal-related random parameters are correlated. The correlation argument can also be a character vector indicating the random parameters that are assumed to be correlated.

Train

We use the Train data set, previously coerced to a dfidx object called Tr. We first estimate the multinomial model: both alternatives being virtual train trips, it is relevant to use only generic coefficients and to remove the intercept:

library("mlogit")
data("Train", package = "mlogit")
Train$choiceid <- 1:nrow(Train)
Tr <- dfidx(Train, choice = "choice", varying = 4:11, sep = "_",
            opposite = c("price", "comfort", "time", "change"),
            idx = list(c("choiceid", "id")), idnames = c("chid", "alt"))
Tr$price <- Tr$price / 100 * 2.20371
Tr$time <- Tr$time / 60
Train.ml <- mlogit(choice ~ price + time + change + comfort | - 1, Tr)
coef(summary(Train.ml))
##           Estimate  Std. Error   z-value     Pr(>|z|)
## price   0.06735804 0.003393252 19.850585 0.000000e+00
## time    1.72055142 0.160351702 10.729861 0.000000e+00
## change  0.32634094 0.059489152  5.485722 4.117843e-08
## comfort 0.94572555 0.064945464 14.561842 0.000000e+00

All the coefficients are highly significant and have the predicted positive sign (remind than an increase in the variable comfort implies using a less comfortable class). The coefficients can’t be directly interpreted, but dividing them by the price coefficient, we get monetary values:

coef(Train.ml)[- 1] / coef(Train.ml)[1]
##     time   change  comfort 
## 25.54337  4.84487 14.04028

We obtain the value of 26 euros for an hour of traveling, 5 euros for a change and 14 euros to travel in a more comfortable class. We then estimate a model with three random parameters, time, change and comfort. We first estimate the uncorrelated mixed logit model:

Train.mxlu <- mlogit(choice ~ price + time + change + comfort | - 1, Tr,
panel = TRUE, rpar = c(time = "n", change = "n", comfort = "n"), R = 100,
correlation = FALSE, halton = NA, method = "bhhh")
names(coef(Train.mxlu))
## [1] "price"      "time"       "change"     "comfort"   
## [5] "sd.time"    "sd.change"  "sd.comfort"

Compared to the multinomial logit model, there are now three more coefficients which are the standard deviations of the distribution of the three random parameters. The correlated model is obtained by setting the correlation argument to TRUE.

Train.mxlc <- update(Train.mxlu, correlation = TRUE)
names(coef(Train.mxlc))
##  [1] "price"                "time"                
##  [3] "change"               "comfort"             
##  [5] "chol.time:time"       "chol.time:change"    
##  [7] "chol.change:change"   "chol.time:comfort"   
##  [9] "chol.change:comfort"  "chol.comfort:comfort"

There are now 6 parameters which are the elements of the Choleski decomposition of the covariance matrix of the three random parameters.

These 6 parameters are therefore the elements of the following matrix

\[ C= \left( \begin{array}{ccc} c_{11} & 0 & 0 \\ c_{12} & c_{22} & 0 \\ c_{13} & c_{23} & c_{33} \end{array} \right) \]

such that:

\[ CC^{\top}= \left( \begin{array}{ccc} c_{11}^2 & c_{11} c_{12} & c_{11}c_{13} \\ c_{11}c_{12} & c_{12}^2 + c_{22}^2 & c_{12}c_{23}+c_{22}c_{23} \\ c_{11}c_{13} & c_{12}c_{3} + c_{22}c_{23} & c_{13}^2 + c_{23}^2 c_{33}^2 \end{array} \right) = \left( \begin{array}{ccc} \sigma_{1}^2 & \sigma_{12} & \sigma_{13} \\ \sigma_{12} & \sigma_{2}^2 & \sigma_{23} \\ \sigma_{13} & \sigma_{23} & \sigma_{3}^2 \end{array} \right) \]

where \(\sigma_i^2\) and \(\sigma_{ij}\) are respectively the variance of the random parameter \(i\) and the covariance between two random parameters \(i\) and \(j\). Therefore, the first estimated parameter can be simply interpreted as the standard deviation of the first random parameter, but the five other can’t be interpreted easily.

Random parameters may be extracted using the function rpar which take as first argument a mlogit object and as second argument par the parameter(s) to be extracted. This function returns a rpar object and a summary method is provided to describe it:

marg.ut.time <- rpar(Train.mxlc, "time")
summary(marg.ut.time)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##     -Inf 1.283749 4.893752 4.893752 8.503756      Inf

The estimated random parameter is in the “preference space”, which means that it is the marginal utility of time.

Note that summary(marg.ut.time) displays the unconditional distribution of the marginal utility of time.

Parameters in the “willingness to pay” (WTP) space are more easy to interpret. They can be estimated directly (a feature not supported by mlogit) or can be obtained from the marginal utility by dividing it by the coefficient of a covariate expressed in monetary value (a price for example), taken as a non random parameter. The ratio can then be interpreted as a monetary value (or willingness to pay). To obtain the distribution of the random parameters in the WTP space, one can use the norm argument of rpar:

wtp.time <- rpar(Train.mxlc, "time", norm = "price")
summary(wtp.time)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##      -Inf  8.753119 33.367588 33.367588 57.982056       Inf

The median value (and the mean value as the distribution is symmetric) of transport time is about 33 euros. Several methods/functions are provided to extract the individual statistics (mean, med and stdev respectively for the mean, the median and the standard deviation):

mean(rpar(Train.mxlc, "time", norm = "price"))
## [1] 33.36759
med(rpar(Train.mxlc, "time", norm = "price"))
## [1] 33.36759
stdev(rpar(Train.mxlc, "time", norm = "price"))
## [1] 36.49347

In case of correlated random parameters, as the estimated parameters can’t be directly interpreted, a vcov method for mlogit objects is provided. It has a what argument which default value is coefficient. In this case the usual covariance matrix of the coefficients is return. If what = "rpar", the covariance matrix of the correlated random parameters is returned if type = "cov" (the default) and the correlation matrix (with standard deviations on the diagonal) is returned if type = "cor". The object is of class vcov.mlogit and a summary method for this object is provided which computes, using the delta method, the standard errors of the parameters of the covariance or the correlation matrix.

vcov(Train.mxlc, what = "rpar")
##               time     change  comfort
## time    28.6460389 -0.2787999 5.557933
## change  -0.2787999  3.1047367 1.232467
## comfort  5.5579334  1.2324667 7.895535
vcov(Train.mxlc, what = "rpar", type = "cor")
##                time      change   comfort
## time     5.35219945 -0.02956296 0.3695645
## change  -0.02956296  1.76202630 0.2489270
## comfort  0.36956453  0.24892701 2.8098994
summary(vcov(Train.mxlc, what = "rpar", type = "cor"))
##                     Estimate Std. Error z-value  Pr(>|z|)    
## sd.time             5.352199   0.381135 14.0428 < 2.2e-16 ***
## sd.change           1.762026   0.144592 12.1862 < 2.2e-16 ***
## sd.comfort          2.809899   0.178295 15.7599 < 2.2e-16 ***
## cor.time:change    -0.029563   0.232414 -0.1272  0.898782    
## cor.time:comfort    0.369565   0.114068  3.2399  0.001196 ** 
## cor.change:comfort  0.248927   0.110321  2.2564  0.024047 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(vcov(Train.mxlc, what = "rpar", type = "cov"))
##                    Estimate Std. Error z-value  Pr(>|z|)    
## var.time           28.64604    4.07982  7.0214 2.197e-12 ***
## var.change          3.10474    0.50955  6.0931 1.107e-09 ***
## var.comfort         7.89553    1.00198  7.8799 3.276e-15 ***
## cov.time:change    -0.27880    0.51550 -0.5408    0.5886    
## cov.time:comfort    5.55793    0.89161  6.2336 4.559e-10 ***
## cov.change:comfort  1.23247    0.30131  4.0903 4.308e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In case of correlated random parameters, as the estimated parameters are not directly interpretable, further functions are provided to analyze the correlation of the coefficients:

cor.mlogit(Train.mxlc)
##                time      change   comfort
## time     1.00000000 -0.02956296 0.3695645
## change  -0.02956296  1.00000000 0.2489270
## comfort  0.36956453  0.24892701 1.0000000
cov.mlogit(Train.mxlc)
##               time     change  comfort
## time    28.6460389 -0.2787999 5.557933
## change  -0.2787999  3.1047367 1.232467
## comfort  5.5579334  1.2324667 7.895535
stdev(Train.mxlc)
##     time   change  comfort 
## 5.352199 1.762026 2.809899

The correlation can be restricted to a subset of random parameters by filling the correlation argument with a character vector indicating the corresponding covariates:

Train.mxlc2 <- update(Train.mxlc, correlation = c("time", "comfort"))
vcov(Train.mxlc2, what = "rpar", type = "cor")
##              time   comfort
## time    5.5726158 0.3909467
## comfort 0.3909467 3.0631462

The presence of random coefficients and their correlation can be investigated using any of the three tests. Actually, three nested models can be considered, a model with no random effects, a model with random but uncorrelated effects and a model with random and correlated effects. We first present the three tests of no correlated random effects:

lr.mxc <- lrtest(Train.mxlc, Train.ml)
wd.mxc <- waldtest(Train.mxlc)
library("car")
lh.mxc <- linearHypothesis(Train.mxlc, c("chol.time:time = 0",
"chol.time:change =  0", "chol.time:comfort = 0", "chol.change:change = 0",
"chol.change:comfort = 0", "chol.comfort:comfort = 0"))
sc.mxc <- scoretest(Train.ml, rpar = c(time = "n", change = "n",
comfort = "n"), R = 100, correlation = TRUE, halton = NA, panel = TRUE)
sapply(list(wald = wd.mxc, lh = lh.mxc, score = sc.mxc, lr = lr.mxc),
statpval)
##            wald      lh   score      lr
## stat    288.287 288.287 208.765 388.057
## p-value   0.000   0.000   0.000   0.000

The hypothesis of no correlated random parameters is strongly rejected. We then present the three tests of no correlation, the existence of random parameters being maintained.

lr.corr <- lrtest(Train.mxlc, Train.mxlu)
lh.corr <- linearHypothesis(Train.mxlc, c("chol.time:change = 0",
"chol.time:comfort = 0", "chol.change:comfort = 0"))
wd.corr <- waldtest(Train.mxlc, correlation = FALSE)
#YC
sc.corr <- scoretest(Train.mxlu, correlation = TRUE)
sapply(list(wald = wd.corr, lh = lh.corr, score = sc.corr, lr = lr.corr),
statpval)
##            wald      lh  score     lr
## stat    103.195 103.195 10.483 42.621
## p-value   0.000   0.000  0.015  0.000

The hypothesis of no correlation is strongly reject with the Wald and the likelihood ratio test, only at the 5% level for the score test.

RiskyTransport

The second example is a study by León and Miguel (2017) who consider a mode-choice model for transit from Freetown’s airport (Sierra-Leone) to downtown. Four alternatives are available: ferry, helicopter, water-taxi and hovercraft. A striking characteristic of their study is that all these alternatives experienced fatal accidents in recent years, so that the fatality risk is non-negligible and differs much from an alternative to another. For example, the probabilities of dying using the water taxi and the helicopter are respectively of 2.55 and 18.41 out of 100,000 passenger-trips. This feature enables the authors to estimate the value of a statistical life. For an individual \(i\), the utility of choosing alternative \(j\) is:

\[ U_{ij}=\beta_{il} (1 - p_j) + \beta_{ic} (c_j + w_i t_j)+\epsilon_{ij}, \]

where \(p_j\) is the probability of dying while using alternative \(j\), \(c_j\) and \(t_j\) the monetary cost and the transport time of alternative \(j\) and \(w_i\) the wage rate of individual \(i\) (which is supposed to be his valuation of transportation time). \(C_{ij} = c_j + w_i t_j\) is therefore the individual specific generalized cost for alternative \(j\). \(\beta_{il}\) and \(\beta_{ic}\) are the (individual specific) marginal utility of surviving and of expense. The value of the statistical life (VSL) is then defined by:

\[ \mbox{VSL}_i = -\frac{\beta_{il}}{\beta_{îc}} = \frac{\Delta C_{ij}}{\Delta (1-p_j)}. \]

The two covariates of interest are cost (the generalized cost in $PPP) and risk (mortality per 100,000 passenger-trips). The risk variable being purely alternative specific, intercepts for the alternatives cannot therefore be estimated. To avoid endogeneity problems, the authors introduce as covariates marks the individuals gave to 5 attributes of the alternatives: comfort, noise level, crowdedness, convenience and transfer location and the “quality” of the clientele. We first estimate a multinomial logit model.

data("RiskyTransport", package = "mlogit")
RT <- dfidx(RiskyTransport, choice = "choice", idx = list(c("chid", "id"), "mode"),
            idnames = c("chid", "alt"))
ml.rt <- mlogit(choice ~ cost + risk  + seats + noise + crowdness +
convloc + clientele | 0, data = RT, weights = weight)

Note the use of the weights argument in order to set weights to the observations, as done in the original study.

coef(ml.rt)[c("risk", "cost")]
##         risk         cost 
## -0.093907630 -0.009540895

The ratio of the coefficients of risk and of cost is r round(coef(ml.rt)['risk'] / coef(ml.rt)['cost'], 2) (hundred of thousands of $), which means that the estimated value of the statistical life is a bit less than one million $. We next consider a mixed logit model. The coefficients of cost and risk are assumed to be random, following a zero-bounded triangular distribution.

mx.rt <- mlogit(choice ~ cost + risk  + seats + noise + crowdness +
convloc + clientele | 0, data = RT, weights = weight,
rpar = c(cost = 'zbt', risk = 'zbt'), R = 100, halton = NA, panel = TRUE)

The results are presented in the following table:

library("texreg")
htmlreg(list('Multinomial logit' = ml.rt, 'Mixed logit' = mx.rt),
digits = 3, float.pos = "hbt", label = "tab:risktr", single.row = TRUE,
caption = "Transportation choices.")
Transportation choices.
Multinomial logit Mixed logit
cost -0.010 (0.001)*** -0.019 (0.001)***
risk -0.094 (0.011)*** -0.103 (0.016)***
seats 0.152 (0.244) 0.108 (0.233)
noise -0.029 (0.265) 0.142 (0.229)
crowdness -0.919 (0.244)*** -0.716 (0.223)**
convloc -0.377 (0.202) -0.150 (0.197)
clientele -0.257 (0.265) -0.331 (0.254)
AIC 3250.747 3177.250
Log Likelihood -1618.374 -1581.625
Num. obs. 1793 1793
p < 0.001, p < 0.01, p < 0.05

Not that the log-likelihood is much larger for the mixed effect logit. Individual-level parameters can be extracted using the fitted method, with the type argument set to parameters.

indpar <- fitted(mx.rt, type = "parameters")
head(indpar)
##        id        cost        risk
## 1 8020605 -0.02096705 -0.10105817
## 2 8260102 -0.01666475 -0.11211057
## 3 8260104 -0.01728864 -0.08302831
## 4 8260106 -0.01494848 -0.07319789
## 5 8260108 -0.01687051 -0.06501337
## 6 8260201 -0.02133116 -0.10285434

We can then compute the VSL for every individual and analyse their distribution, using quantiles and plotting on figure 1 the empirical density of VSL for African and non-African travelers (as done in León and Miguel 2017 Table 4, p.219 and Figure 5, p.223).2

indpar$VSL <- with(indpar, risk / cost * 100)
quantile(indpar$VSL, c(0.025, 0.975))
##      2.5%     97.5% 
##  432.4199 1054.3428
mean(indpar$VSL)
## [1] 608.94

Note that computing the VSL as the ratio of two random parameters which can take zero value can lead to extremely high values if the individual parameter for cost is close to 0.3

max(indpar$cost)
## [1] -0.002924437
max(indpar$VSL)
## [1] 3131.825

This is not the case here as (absolute) minimum value of cost is \(-0.003\) which leads to a maximum value of VSL of $ \(3131\).

library("ggplot2")
RT$id <- RT$id
indpar <- merge(unique(subset(as.data.frame(RT),
                              select = c("id", "african"))),
                indpar)
ggplot(indpar) + geom_density(aes(x = VSL, linetype = african)) + 
    scale_x_continuous(limits = c(200, 1200))

Bibliography

Daly, Andrew, Stephane Hess, and Kenneth Train. 2012. “Assuring Finite Moments for Willingness to Pay in Random Coefficient Models.” Transportation 39 (1): 19–31. doi:10.1007/s11116-011-9331-3.

León, Gianmarco, and Edward Miguel. 2017. “Risky Transportation Choices and the Value of a Statistical Life.” American Economic Journal: Applied Economics 9 (1): 202–28. doi:10.1257/app.20160140.

Train, Kenneth. 2009. Discrete Choice Methods with Simulation. Cambridge University Press. https://EconPapers.repec.org/RePEc:cup:cbooks:9780521766555.


  1. See vignette formula/data.

  2. Note that individual-specific parameters should be interpreted with caution, as they are consistent estimates of individual parameters only if the number of choice situations for every individual is large (see Train 2009, 266).

  3. See Daly, Hess, and Train (2012) for a discussion of the specifications of mixed logit models which assure finite moments of the distribution of willingness to pay.