Apollo is an R package designed for estimation and analysis of choice models (Train, 2008). This package allows estimating Multinomial logit (MNL), Nested logit (NL), cross-nested logit (CNL), exploded logit (EL), ordered logit (OL), Integrated Choice and Latent Variable (ICLV, Ben-Akiva et al. 2002), Multiple Discrete-Continuous Extreme Value (MDCEV, Bhat 2008), nested MDCEV (MDCNEV, Pinjari and Bhat 2010), and Decision Field Theory (DFT, Hancock and Hess 2018) models. All models support both continuous and discrete mixing (e.g. continuous random parameters, latent classes or finite mixtures), both between and within individuals (i.e at the individual and observation level). Different models can be easily combined for joint estimation. The package allows for classical estimation (i.e. maximum likelihood) as well as Bayesian estimation (i.e. hierarchical Bayes, through package RSGHB).
All functionalities are described in the manual, available at www.ApolloChoiceModelling.com. Examples can also be found on the website.
The recommended way to use Apollo is to create a different R script for each model to estimate. These scripts are refer to as model files and contain all the necessary information to estimate the model (except for the data), and can also include post-estimation analysis of the results.
A model file should be structured around the following sections:
Definition of core settings: In this section the Apollo library is loaded, and global variables such as the name of the model are defined.
Data loading: In this section the database is loaded. Any transformation to the data that does not depend on parameters to be estimated (e.g. averages of explanatory variables) should be performed here and stored inside the database object.
Parameter definition: In this section, a vector containing the names and starting values of parameters is defined, as well as details of their mixing distributions if present.
Input validation: In this section, all inputs prepared in the previous sections are validated and stored together.
Likelihood definition: In this section, the user must write a function that calculates the model likelihood.
Model estimation and reporting: In this section, the model is estimated and the results are displayed.
Postprocessing of results: This is an optional section where the user can do forecasting and other post-processing of the estrimated model.
In this document, we present two example model files: a multinomial logit (MNL) model, and a mixed multinomial logit (MMNL) model. More examples, discussed in more detail are available at www.ApolloChoiceModelling.com.
In this section, we present code to estimate an MNL model using the synthetic data included in the package. In the postprocessing, we predict the impact on mode share of a 10% increase in the cost of the train fare. The utility functions of alternatives are defined as follows.
\[U_{nsi} = asc_i + \beta_{tt}tt_{nsi} + \beta_{c}*cost_{nsi} + \varepsilon_{nsi}\]
Where n indexes individuals, s choice scenarios, and i alternatives. \(asc_i\) is the alternative specific constant, \(tt_{nsi}\) is the travel time and \(cost_{nsi}\) is the cost. \(\varepsilon_{nsi}\) is an independent identically distributed standard Gumbel error term. \(asc_i\), \(\beta_{tt}\) and \(\beta_{c}\) are parameters to be estimated.
The likelihood function of this model for individual \(n\) is as follows.
\[L_{n}=\prod_{s}P_{nsi}\]
Where \[P_{nsi}=\frac{e^{V_{nsi}}}{\sum_{j}e^{V_{nsj}}}\] And \(V_{nsi}=U_{nsi}-\varepsilon_{nsi}\), i.e. the deterministic part of the utility.
The likelihood function of the MNL model is pre-coded in Apollo
, so we do not need to type it ourselves. However, if the user prefers to write the likelihood themselves, they can also do it. The pre-coded MNL likelihood function (apollo_mnl
) requires a series of inputs defined inside the mnl_settings
object.
# ####################################################### #
#### 1. Definition of core settings
# ####################################################### #
### Clear memory
rm(list = ls())
### Load libraries
library(apollo)
#> Apollo 0.1.0
#> www.ApolloChoiceModelling.com
#> See url for a detailed manual, examples and a help forum.
#> Sign up to our mailing list for updates on new releases.
### Initialise code
apollo_initialise()
### Set core controls
apollo_control = list(
modelName ="MNL",
modelDescr ="Simple MNL model on mode choice SP data",
indivID ="ID"
)
# ####################################################### #
#### 2. Data loading ####
# ####################################################### #
data("apollo_modeChoiceData")
database = apollo_modeChoiceData
rm(apollo_modeChoiceData)
### Use only SP data
database = subset(database,database$SP==1)
### Create new variable with average income
database$mean_income = mean(database$income)
# ####################################################### #
#### 3. Parameter definition ####
# ####################################################### #
### Vector of parameters, including any that are kept fixed
### during estimation
apollo_beta=c(asc_car = 0,
asc_bus = 0,
asc_air = 0,
asc_rail = 0,
b_tt_car = 0,
b_tt_bus = 0,
b_tt_air = 0,
b_tt_rail= 0,
b_c = 0)
### Vector with names (in quotes) of parameters to be
### kept fixed at their starting value in apollo_beta.
### Use apollo_beta_fixed = c() for no fixed parameters.
apollo_fixed = c("asc_car")
# ####################################################### #
#### 4. Input validation ####
# ####################################################### #
apollo_inputs = apollo_validateInputs()
#> Missing setting for mixing, set to default of FALSE
#> Missing setting for nCores, set to default of 1
#> Missing setting for workInLogs, set to default of FALSE
#> Missing setting for seed, set to default of 13
#> Missing setting for HB, set to default of FALSE
#> Several observations per individual detected based on the value of ID.
#> Setting panelData set to TRUE.
#> All checks on apollo_control completed.
#> All checks on data completed.
# ####################################################### #
#### 5. Likelihood definition ####
# ####################################################### #
apollo_probabilities=function(apollo_beta, apollo_inputs,
functionality="estimate"){
### Attach inputs and detach after function exit
apollo_attach(apollo_beta, apollo_inputs)
on.exit(apollo_detach(apollo_beta, apollo_inputs))
### Create list of probabilities P
P = list()
### List of utilities: these must use the same names as
### in mnl_settings, order is irrelevant.
V = list()
V[['car']] = asc_car + b_tt_car *time_car + b_c*cost_car
V[['bus']] = asc_bus + b_tt_bus *time_bus + b_c*cost_bus
V[['air']] = asc_air + b_tt_air *time_air + b_c*cost_air
V[['rail']]= asc_rail+ b_tt_rail*time_rail+ b_c*cost_rail
### Define settings for MNL model component
mnl_settings = list(
alternatives = c(car=1, bus=2, air=3, rail=4),
avail = list(car=av_car, bus=av_bus,
air=av_air, rail=av_rail),
choiceVar = choice,
V = V
)
### Compute probabilities using MNL model
P[['model']] = apollo_mnl(mnl_settings, functionality)
### Take product across observation for same individual
P = apollo_panelProd(P, apollo_inputs, functionality)
### Prepare and return outputs of function
P = apollo_prepareProb(P, apollo_inputs, functionality)
return(P)
}
# ####################################################### #
#### 6. Model estimation and reporting ####
# ####################################################### #
model = apollo_estimate(apollo_beta, apollo_fixed,
apollo_probabilities,
apollo_inputs)
#> Validating inputs of likelihood function (apollo_probabilities)
#>
#> Overview of choices for model component "MNL"
#> car bus air rail
#> Times available 5446.00 6314.00 5264.00 6118.00
#> Times chosen 1946.00 358.00 1522.00 3174.00
#> Percentage chosen overall 27.80 5.11 21.74 45.34
#> Percentage chosen when available 35.73 5.67 28.91 51.88
#>
#> Testing likelihood function ..........Done.
#>
#>
#> Starting main estimation
#> Initial function value: -8196.021
#> Initial gradient value:
#> asc_bus asc_air asc_rail b_tt_car b_tt_bus b_tt_air
#> -1599.667 -39.000 1302.667 92447.083 -597397.501 -1820.833
#> b_tt_rail b_c
#> 189332.083 8862.500
#> initial value 8196.020532
#> iter 2 value 7456.688617
#> iter 3 value 7375.154805
#> iter 4 value 7260.844459
#> iter 5 value 6670.816191
#> iter 6 value 6618.448416
#> iter 7 value 6471.224867
#> iter 8 value 6429.069520
#> iter 9 value 6216.357930
#> iter 10 value 6052.604904
#> iter 11 value 5838.718110
#> iter 12 value 5815.462115
#> iter 13 value 5804.515622
#> iter 14 value 5802.299705
#> iter 15 value 5802.031103
#> iter 16 value 5802.023289
#> iter 17 value 5802.022828
#> iter 17 value 5802.022826
#> iter 17 value 5802.022826
#> final value 5802.022826
#> converged
#>
#> Estimated parameters:
#> [,1]
#> asc_car 0.0000
#> asc_bus 0.0115
#> asc_air -0.6495
#> asc_rail -1.2357
#> b_tt_car -0.0101
#> b_tt_bus -0.0164
#> b_tt_air -0.0118
#> b_tt_rail -0.0048
#> b_c -0.0529
#>
#> Computing covariance matrix using numDeriv package.
#> (this may take a while)
#> 0%....25%....50%....75%....100%
#> Hessian calculated with numDeriv will be used.
#> Min abs eigenvalue of hessian : 3.304525
#> Calculating LL(0)... Done.
#> Calculating LL of each model component... Done.
apollo_modelOutput(model)
#> Model run using Apollo for R, version 0.1.0
#> www.ApolloChoiceModelling.com
#>
#> Model name : MNL
#> Model description : Simple MNL model on mode choice SP data
#> Model run at : 2020-03-16 11:06:31
#> Estimation method : bfgs
#> Model diagnosis : successful convergence
#> Number of individuals : 500
#> Number of observations : 7000
#>
#> Number of cores used : 1
#> Model without mixing
#>
#> LL(start) : -8196.021
#> LL(0) : -8196.021
#> LL(final) : -5802.023
#> Rho-square (0) : 0.2921
#> Adj.Rho-square (0) : 0.2911
#> AIC : 11620.05
#> BIC : 11674.87
#> Estimated parameters : 8
#> Time taken (hh:mm:ss) : 00:00:3.27
#> Iterations : 19
#> Min abs eigenvalue of hessian : 3.304525
#>
#> Estimates:
#> Estimate Std.err. t.ratio(0) Rob.std.err. Rob.t.ratio(0)
#> asc_car 0.0000 NA NA NA NA
#> asc_bus 0.0115 0.5408 0.02 0.5411 0.02
#> asc_air -0.6495 0.2699 -2.41 0.2666 -2.44
#> asc_rail -1.2357 0.3218 -3.84 0.3125 -3.95
#> b_tt_car -0.0101 0.0006 -15.75 0.0007 -15.30
#> b_tt_bus -0.0164 0.0015 -11.30 0.0015 -11.11
#> b_tt_air -0.0118 0.0024 -4.93 0.0024 -5.00
#> b_tt_rail -0.0048 0.0016 -2.90 0.0016 -2.95
#> b_c -0.0529 0.0014 -37.21 0.0017 -31.14
#>
#> Overview of choices for model component "MNL"
#> car bus air rail
#> Times available 5446.00 6314.00 5264.00 6118.00
#> Times chosen 1946.00 358.00 1522.00 3174.00
#> Percentage chosen overall 27.80 5.11 21.74 45.34
#> Percentage chosen when available 35.73 5.67 28.91 51.88
apollo_saveOutput(model)
#> Estimates saved to MNL_estimates.csv
#> Classical covariance matrix saved to MNL_covar.csv
#> Robust covariance matrix saved to MNL_robcovar.csv
#> Classical correlation matrix saved to MNL_covar.csv
#> Robust correlation matrix saved to MNL_robcorr.csv
#> Model object saved to MNL.rds
# ####################################################### #
#### 7. Postprocessing of results ####
# ####################################################### #
### Use the estimated model to make predictions
predictions_base = apollo_prediction(model,
apollo_probabilities,
apollo_inputs)
#> Updating inputs...Done.
#>
#> Running predictions from model using parameter estimates Done.
#>
#> Predicted demand at model estimates
#> car bus air rail
#> 1946.0001 357.9993 1521.9986 3174.0020
### Now imagine the cost for rail increases by 10%
### and predict again
database$cost_rail = 1.1*database$cost_rail
predictions_new = apollo_prediction(model,
apollo_probabilities,
apollo_inputs)
#> Updating inputs...Done.
#>
#> Running predictions from model using parameter estimates Done.
#>
#> Predicted demand at model estimates
#> car bus air rail
#> 2132.5870 399.3251 1645.7480 2822.3399
### Compare predictions
change=(predictions_new-predictions_base)/predictions_base
### Not interested in chosen alternative now,
### so drop last column
change=change[,-ncol(change)]
### Summary of changes (possible presence of NAs due to
### unavailable alternatives)
summary(change)
#> ID Choice situation car bus
#> Min. :0 Min. :0 Min. :0.0000 Min. :0.0000
#> 1st Qu.:0 1st Qu.:0 1st Qu.:0.0725 1st Qu.:0.0738
#> Median :0 Median :0 Median :0.1168 Median :0.1218
#> Mean :0 Mean :0 Mean :0.1105 Mean :0.1225
#> 3rd Qu.:0 3rd Qu.:0 3rd Qu.:0.1509 3rd Qu.:0.1674
#> Max. :0 Max. :0 Max. :0.2339 Max. :0.4326
#> NA's :1554 NA's :686
#> air rail
#> Min. :0.0000 Min. :-0.3028
#> 1st Qu.:0.0704 1st Qu.:-0.2060
#> Median :0.1100 Median :-0.1340
#> Mean :0.1121 Mean :-0.1434
#> 3rd Qu.:0.1517 3rd Qu.:-0.0723
#> Max. :0.3677 Max. :-0.0022
#> NA's :1736 NA's :882
In this section, we present code to estimate a mixed MNL model (MMNL) using the synthetic data included in the package. After estimation, we predict the effect of a 10% increase in the train fares. The utility function of the model remains the same than in the previous example, i.e.:
\[U_{nsi} = asc_i + \beta_{tt}tt_{nsi} + \beta_{c}*cost_{nsi} + \varepsilon_{nsi}\]
Where n indexes individuals, s choice scenarios, and i alternatives. \(asc_i\) is the alternative specific constant, \(tt_{nsi}\) is the travel time and \(cost_{nsi}\) is the cost. \(\varepsilon_{nsi}\) is an independent identically distributed standard Gumbel error term. \(\beta_{tt}\) follows a log-normal distribution with the underlying normal having a mean \(\mu_{tt}\) and standard deviation \(\sigma_{tt}\). \(asc_i\), \(\beta_{c}\), \(\mu_{tt}\) and \(\sigma_{tt}\) are parameters to be estimated.
The likelihood function of this model for individual \(n\) is as follows.
\[L_{n}=\int_{\beta_{tt}}\prod_{s}P_{nsi}f(\beta_{tt})d\beta_{tt}\]
Where \(P_{nsi}=\frac{e^{V_{nsi}}}{\sum_{j}e^{V_{nsj}}}\), \(V_{nsi}=U_{nsi}-\varepsilon_{nsi}\), and \(f\) is the probability density function of \(\beta_{tt}\). As this function does not have an analytical closed form, we estimate it using Monte Carlo integration, i.e.:
\[L_{n}\approx\frac{1}{R}\sum_{\beta_{tt}^r}\prod_{s}P_{nsi}^r\] Where \(P_{nsi}^r=P_{nsi}(\beta_{tt}^r)\), with \(\beta_{tt}^r\) a random draw of \(\beta_{tt}\) from its distribution \(f\), and R is a big number.
The code is very similar to the previous example, with only sections 1 and 3 changing. * In section 1 we set mixing = TRUE
inside apollo_control, and we set nCores = 2
to speed up estimation by using two computing threads (this is not mandatory). * In section 3 we define the mean (b_tt_mu
) and standar deviation (b_tt_sigma
) of the underlying normal distribution. We then define the type, name and number of draws used. Finally, we construct the random coefficient \(\beta_{tt}\) inside a function called apollo_randCoeff
. We use 500 inter-individual draws that come from a standard normal distribution, which we later transform into log-normals inside apollo_randCoeff
.
Even though in this case we only use inter-individual draws, note that inter and intra-individuals draws can be used simultaneously. Inter-individual draws capture variability between individuals, while intra-individual draws capture variability within individuals. In terms of the Monte Carlo integration, inter-individual draws are common for all observations from the same individual, while intra-individual draws are different for each observations. In terms of the likelihood function, the use of intra-individual draws would lead to \(L_{n}\approx\prod_{s}\frac{1}{R}\sum_{\beta_{tt}^r}P_{nsi}\), which is not the case in this model.
Estimation of models with mixing is computationally more demanding than models without mixing. Furthermore, using both inter and intra-individual requires large amounts of memory, which can further slow the estimation process. For this reason, this example is not run automatically. Yet, the users may copy and paste the code in a script, and run it themselves.
# ####################################################### #
#### 1. Definition of core settings
# ####################################################### #
### Clear memory
rm(list = ls())
### Load libraries
library(apollo)
### Initialise code
apollo_initialise()
### Set core controls
apollo_control = list(
modelName ="MMNL",
modelDescr ="Simple MMNL model on mode choice SP data",
indivID ="ID",
mixing = TRUE,
nCores = 2
)
# ####################################################### #
#### 2. Data loading ####
# ####################################################### #
data("apollo_modeChoiceData")
database = apollo_modeChoiceData
rm(apollo_modeChoiceData)
### Use only SP data
database = subset(database,database$SP==1)
### Create new variable with average income
database$mean_income = mean(database$income)
# ####################################################### #
#### 3. Parameter definition ####
# ####################################################### #
### Vector of parameters, including any that are kept fixed
### during estimation
apollo_beta=c(asc_car = 0,
asc_bus = 0,
asc_air = 0,
asc_rail = 0,
mu_tt = 0,
sigma_tt = 1,
b_c = 0)
### Vector with names (in quotes) of parameters to be
### kept fixed at their starting value in apollo_beta.
### Use apollo_beta_fixed = c() for no fixed parameters.
apollo_fixed = c("asc_car")
### Set parameters for generating draws
apollo_draws = list(
interDrawsType = "halton",
interNDraws = 500,
interUnifDraws = c(),
interNormDraws = c("draws_tt"),
intraDrawsType = "halton",
intraNDraws = 0,
intraUnifDraws = c(),
intraNormDraws = c()
)
### Create random parameters
apollo_randCoeff = function(apollo_beta, apollo_inputs){
randcoeff = list()
randcoeff[["b_tt"]] = -exp(mu_tt + sigma_tt*draws_tt)
return(randcoeff)
}
# ####################################################### #
#### 4. Input validation ####
# ####################################################### #
apollo_inputs = apollo_validateInputs()
# ####################################################### #
#### 5. Likelihood definition ####
# ####################################################### #
apollo_probabilities=function(apollo_beta, apollo_inputs,
functionality="estimate"){
### Attach inputs and detach after function exit
apollo_attach(apollo_beta, apollo_inputs)
on.exit(apollo_detach(apollo_beta, apollo_inputs))
### Create list of probabilities P
P = list()
### List of utilities: these must use the same names as
### in mnl_settings, order is irrelevant.
V = list()
V[['car']] = asc_car + b_tt*time_car + b_c*cost_car
V[['bus']] = asc_bus + b_tt*time_bus + b_c*cost_bus
V[['air']] = asc_air + b_tt*time_air + b_c*cost_air
V[['rail']] = asc_rail + b_tt*time_rail + b_c*cost_rail
### Define settings for MNL model component
mnl_settings = list(
alternatives = c(car=1, bus=2, air=3, rail=4),
avail = list(car=av_car, bus=av_bus,
air=av_air, rail=av_rail),
choiceVar = choice,
V = V
)
### Compute probabilities using MNL model
P[['model']] = apollo_mnl(mnl_settings, functionality)
### Take product across observation for same individual
P = apollo_panelProd(P, apollo_inputs, functionality)
### Average draws
P = apollo_avgInterDraws(P, apollo_inputs, functionality)
### Prepare and return outputs of function
P = apollo_prepareProb(P, apollo_inputs, functionality)
return(P)
}
# ####################################################### #
#### 6. Model estimation and reporting ####
# ####################################################### #
model = apollo_estimate(apollo_beta, apollo_fixed,
apollo_probabilities,
apollo_inputs)
apollo_modelOutput(model)
apollo_saveOutput(model)
# ####################################################### #
#### 7. Postprocessing of results ####
# ####################################################### #
### Use the estimated model to make predictions
predictions_base = apollo_prediction(model,
apollo_probabilities,
apollo_inputs)
### Now imagine the cost for rail increases by 10%
### and predict again
database$cost_rail = 1.1*database$cost_rail
predictions_new = apollo_prediction(model,
apollo_probabilities,
apollo_inputs)
### Compare predictions
change=(predictions_new-predictions_base)/predictions_base
### Not interested in chosen alternative now,
### so drop last column
change=change[,-ncol(change)]
### Summary of changes (possible presence of NAs due to
### unavailable alternatives)
summary(change)