DirichletReg – Article Sources

Marco J. Maier

2020-05-28

Full R-Code for:

Maier, M. J. (2014). DirichletReg: Dirichlet Regression for Compositional Data in R. Research Report Series/Department of Statistics and Mathematics, 125. WU Vienna University of Economics and Business, Vienna. http://epub.wu.ac.at/4077/

4. Application examples

4.1. The Arctic lake (common parametrization)

Code for Fig. 1 (left):
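
The figure code chunks are not reproduced in this extract. A minimal sketch for the ternary plot, assuming the composition is first converted with DR_data() (plotting a three-part DR_data object yields a ternary diagram; the object AL is also used in the model fits below):

library("DirichletReg")

AL <- DR_data(ArcticLake[, 1:3])   # sand, silt, clay as a compositional response
plot(AL)                           # three components => ternary plot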

Figure 1: Arctic lake: Ternary plot and depth vs. composition. (left)

Code for Fig. 1 (right):
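
Again only a sketch: a depth-versus-composition scatter plot of the raw proportions (plotting symbols, colours, and labels are assumptions, not the original settings):

matplot(ArcticLake$depth, ArcticLake[, 1:3], pch = 16, col = 1:3,
        xlab = "Depth (m)", ylab = "Proportion")
legend("topright", legend = c("Sand", "Silt", "Clay"), pch = 16, col = 1:3, bty = "n")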

Figure 1: Arctic lake: Ternary plot and depth vs. composition. (right)

lake1 <- DirichReg(AL ~ depth, ArcticLake)
lake1
## Call:
## DirichReg(formula = AL ~ depth, data = ArcticLake)
## using the common parametrization
## 
## Log-likelihood: 101.4 on 6 df (104 BFGS + 1 NR Iterations)
## 
## -----------------------------------------
## Coefficients for variable no. 1: sand
## (Intercept)        depth  
##     0.11662      0.02335  
## -----------------------------------------
## Coefficients for variable no. 2: silt
## (Intercept)        depth  
##    -0.31060      0.05557  
## -----------------------------------------
## Coefficients for variable no. 3: clay
## (Intercept)        depth  
##     -1.1520       0.0643  
## -----------------------------------------

coef(lake1)
## $sand
## (Intercept)       depth 
##  0.11662480  0.02335114 
## 
## $silt
## (Intercept)       depth 
## -0.31059591  0.05556745 
## 
## $clay
## (Intercept)       depth 
## -1.15195642  0.06430175

lake2 <- update(lake1, . ~ . + I(depth^2) | . + I(depth^2) | . + I(depth^2))
anova(lake1, lake2)
## Analysis of Deviance Table
## 
## Model 1: DirichReg(formula = AL ~ depth, data = ArcticLake)
## Model 2: DirichReg(formula = AL ~ depth + I(depth^2) | depth + I(depth^2) |
##   depth + I(depth^2), data = ArcticLake)
## 
##         Deviance N. par Difference df Pr(>Chi)   
## Model 1  -202.74      6                          
## Model 2  -217.99      9     15.254  3 0.001612 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

summary(lake2)
## Call:
## DirichReg(formula = AL ~ depth + I(depth^2) | depth + I(depth^2) | depth +
## I(depth^2), data = ArcticLake)
## 
## Standardized Residuals:
##           Min       1Q   Median      3Q     Max
## sand  -1.7647  -0.7080  -0.1786  0.9598  3.0460
## silt  -1.1379  -0.5330  -0.1546  0.2788  1.5604
## clay  -1.7661  -0.6583  -0.0454  0.6584  2.0152
## 
## ------------------------------------------------------------------
## Beta-Coefficients for variable no. 1: sand
##               Estimate Std. Error z value Pr(>|z|)  
## (Intercept)  1.4361967  0.8026814   1.789   0.0736 .
## depth       -0.0072382  0.0329433  -0.220   0.8261  
## I(depth^2)   0.0001324  0.0002761   0.480   0.6315  
## ------------------------------------------------------------------
## Beta-Coefficients for variable no. 2: silt
##               Estimate Std. Error z value Pr(>|z|)  
## (Intercept) -0.0259705  0.7598827  -0.034   0.9727  
## depth        0.0717450  0.0343089   2.091   0.0365 *
## I(depth^2)  -0.0002679  0.0003088  -0.867   0.3857  
## ------------------------------------------------------------------
## Beta-Coefficients for variable no. 3: clay
##               Estimate Std. Error z value Pr(>|z|)   
## (Intercept) -1.7931487  0.7362293  -2.436  0.01487 * 
## depth        0.1107906  0.0357705   3.097  0.00195 **
## I(depth^2)  -0.0004872  0.0003308  -1.473  0.14079   
## ------------------------------------------------------------------
## Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log-likelihood: 109 on 9 df (162 BFGS + 2 NR Iterations)
## AIC: -200, BIC: -185
## Number of Observations: 39
## Link: Log
## Parametrization: common

Code for Fig. 2:
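
The original chunk is omitted; a sketch of how the fitted curves of the quadratic model lake2 could be drawn (the prediction grid is an assumption; predict() returns the expected proportions here):

depth_grid <- data.frame(depth = seq(min(ArcticLake$depth), max(ArcticLake$depth),
                                     length.out = 200))
fitted_mu  <- predict(lake2, depth_grid)          # matrix of expected proportions

matplot(ArcticLake$depth, ArcticLake[, 1:3], pch = 16, col = 1:3,
        xlab = "Depth (m)", ylab = "Proportion")  # observed compositions
matlines(depth_grid$depth, fitted_mu, col = 1:3, lty = 1, lwd = 2)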

Figure 2: Arctic lake: Fitted values of the quadratic model.

Code for Fig. 3:
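
The OLS specification used for the comparison is not shown in this extract; one plausible sketch fits a quadratic least-squares model to each component and overlays both sets of predictions (reusing depth_grid and fitted_mu from the Fig. 2 sketch above):

ols_fit  <- lm(cbind(sand, silt, clay) ~ depth + I(depth^2), data = ArcticLake)
ols_pred <- predict(ols_fit, depth_grid)

matplot(depth_grid$depth, fitted_mu, type = "l", col = 1:3, lty = 1, lwd = 2,
        xlab = "Depth (m)", ylab = "Proportion")                   # Dirichlet regression (solid)
matlines(depth_grid$depth, ols_pred, col = 1:3, lty = 2, lwd = 2)  # OLS (dashed)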

Figure 3: Arctic lake: OLS (dashed) vs. Dirichlet regression (solid) predictions.

4.2. Blood samples (alternative parametrization)

Bld <- BloodSamples
Bld$Smp <- DR_data(Bld[, 1:4])
## Warning in DR_data(Bld[, 1:4]): not all rows sum up to 1 => normalization forced

blood1 <- DirichReg(Smp ~ Disease | 1, Bld, model = "alternative", base = 3)
blood2 <- DirichReg(Smp ~ Disease | Disease, Bld, model = "alternative", base = 3)
anova(blood1, blood2)
## Analysis of Deviance Table
## 
## Model 1: DirichReg(formula = Smp ~ Disease | 1, data = Bld, model =
##   "alternative", base = 3)
## Model 2: DirichReg(formula = Smp ~ Disease | Disease, data = Bld, model =
##   "alternative", base = 3)
## 
##         Deviance N. par Difference df Pr(>Chi)
## Model 1  -303.86      7                       
## Model 2  -304.61      8     0.7587  1   0.3837

summary(blood1)
## Call:
## DirichReg(formula = Smp ~ Disease | 1, data = Bld, model = "alternative", base
## = 3)
## 
## Standardized Residuals:
##                  Min       1Q   Median      3Q     Max
## Albumin      -2.1310  -0.9307  -0.1234  0.8149  2.8429
## Pre.Albumin  -1.0687  -0.4054  -0.0789  0.1947  1.5691
## Globulin.A   -2.0503  -1.0392   0.1938  0.7927  2.2393
## Globulin.B   -1.8176  -0.5347   0.1488  0.5115  1.3284
## 
## MEAN MODELS:
## ------------------------------------------------------------------
## Coefficients for variable no. 1: Albumin
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  1.11639    0.09935  11.237   <2e-16 ***
## DiseaseB    -0.07002    0.13604  -0.515    0.607    
## ------------------------------------------------------------------
## Coefficients for variable no. 2: Pre.Albumin
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   0.5490     0.1082   5.076 3.86e-07 ***
## DiseaseB     -0.1276     0.1493  -0.855    0.393    
## ------------------------------------------------------------------
## Coefficients for variable no. 3: Globulin.A
## - variable omitted (reference category) -
## ------------------------------------------------------------------
## Coefficients for variable no. 4: Globulin.B
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   0.4863     0.1094   4.445  8.8e-06 ***
## DiseaseB      0.1819     0.1472   1.236    0.216    
## ------------------------------------------------------------------
## 
## PRECISION MODEL:
## ------------------------------------------------------------------
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   4.2227     0.1475   28.64   <2e-16 ***
## ------------------------------------------------------------------
## Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log-likelihood: 151.9 on 7 df (44 BFGS + 1 NR Iterations)
## AIC: -289.9, BIC: -280
## Number of Observations: 30
## Links: Logit (Means) and Log (Precision)
## Parametrization: alternative

Code for the figure below:
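
The original chunk is omitted; a minimal sketch for a single component (Albumin), assuming the fitted group means are obtained from blood1 via predict() (the original figure shows one panel per component):

comp <- as.matrix(Bld[, 1:4]) / rowSums(Bld[, 1:4])          # normalized compositions
fit_groups <- predict(blood1, data.frame(Disease = factor(c("A", "B"))))

boxplot(comp[, "Albumin"] ~ Bld$Disease, xlab = "Disease", ylab = "Albumin (proportion)")
abline(h = fit_groups[, 1], lty = 2)                         # fitted group means for Albumin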

Blood samples: Box plots and fitted values (dashed lines indicate the fitted values for each group).

Code for the figure below:
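
The exact content of this figure cannot be reconstructed from this extract; a generic observed-versus-predicted sketch for the classified samples under blood1 would be:

use  <- !is.na(Bld$Disease)                                 # classified samples only
obs  <- as.matrix(Bld[use, 1:4]) / rowSums(Bld[use, 1:4])   # normalized observed compositions
pred <- predict(blood1, Bld[use, ])                         # expected compositions under blood1

matplot(obs, pred, pch = 16, xlab = "Observed proportion", ylab = "Predicted proportion")
abline(0, 1, lty = 2)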

Blood samples: Observed values and predictions.

4.3. Reading skills data (alternative parametrization)

Code for the figure below:
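
The plotting code below uses the objects RS and rs2 that are created earlier in the article. Based on the model call shown in summary(rs2) further below, they can be reconstructed roughly as follows (a sketch, not the article's original preprocessing code):

library("DirichletReg")

RS <- ReadingSkills
RS$acc <- DR_data(RS$accuracy)   # two-column composition: (1 - accuracy, accuracy)
rs2 <- DirichReg(acc ~ dyslexia * iq | dyslexia + iq, RS, model = "alternative")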

g.ind <- as.numeric(RS$dyslexia)
g1 <- g.ind == 1  # normal
g2 <- g.ind != 1  # dyslexia
par(mar = c(4, 4, 4, 4) + 0.25)
plot(accuracy ~ iq, RS, pch = 21, bg = c("#E495A5", "#39BEB1")[3 - g.ind], cex = 1.5, 
    main = "Dyslexic (Red) vs. Control (Green) Group", xlab = "IQ Score", ylab = "Reading Accuracy", 
    xlim = range(ReadingSkills$iq))

x1 <- seq(min(RS$iq[g1]), max(RS$iq[g1]), length.out = 200)
x2 <- seq(min(RS$iq[g2]), max(RS$iq[g2]), length.out = 200)
n <- length(x1)
X <- data.frame(dyslexia = factor(rep(0:1, each = n), levels = 0:1, labels = c("no", 
    "yes")), iq = c(x1, x2))
pv <- predict(rs2, X, TRUE, TRUE, TRUE)  # request mu, alpha, and phi; returned as a list

lines(x1, pv$mu[1:n, 2], col = c("#E495A5", "#39BEB1")[2], lwd = 3)
lines(x2, pv$mu[(n + 1):(2 * n), 2], col = c("#E495A5", "#39BEB1")[1], lwd = 3)

a <- RS$accuracy
logRa_a <- log(a/(1 - a))               # logit-transformed accuracy
rlr <- lm(logRa_a ~ dyslexia * iq, RS)  # OLS comparison model on the logit scale
ols <- 1/(1 + exp(-predict(rlr, X)))    # back-transform OLS predictions to (0, 1)
lines(x1, ols[1:n], col = c("#AD6071", "#00897D")[2], lwd = 3, lty = 2)
lines(x2, ols[(n + 1):(2 * n)], col = c("#AD6071", "#00897D")[1], lwd = 3, lty = 2)

### precision plot
par(new = TRUE)
plot(x1, pv$phi[1:n], col = c("#6E1D34", "#004E42")[2], lty = "11", type = "l", ylim = c(0, 
    max(pv$phi)), axes = FALSE, ann = FALSE, lwd = 2, xlim = range(RS$iq))
lines(x2, pv$phi[(n + 1):(2 * n)], col = c("#6E1D34", "#004E42")[1], lty = "11", 
    type = "l", lwd = 2)
axis(4)
mtext(expression(paste("Precision (", phi, ")", sep = "")), 4, line = 3)
legend("topleft", legend = c(expression(hat(mu)), expression(hat(phi)), "OLS"), lty = c(1, 
    3, 2), lwd = c(3, 2, 3), bty = "n")

Reading skills: Predicted values of Dirichlet regression and OLS regression.

a <- RS$accuracy
logRa_a <- log(a/(1 - a))               # logit-transformed accuracy
rlr <- lm(logRa_a ~ dyslexia * iq, RS)  # OLS comparison model
summary(rlr)
## 
## Call:
## lm(formula = logRa_a ~ dyslexia * iq, data = RS)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.66405 -0.37966  0.03687  0.40887  2.50345 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      2.8067     0.2822   9.944 2.27e-12 ***
## dyslexiayes     -2.4113     0.4517  -5.338 4.01e-06 ***
## iq               0.7823     0.2992   2.615   0.0125 *  
## dyslexiayes:iq  -0.8457     0.4510  -1.875   0.0681 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.2 on 40 degrees of freedom
## Multiple R-squared:  0.6151, Adjusted R-squared:  0.5862 
## F-statistic: 21.31 on 3 and 40 DF,  p-value: 2.083e-08

summary(rs2)
## Call:
## DirichReg(formula = acc ~ dyslexia * iq | dyslexia + iq, data = RS, model =
## "alternative")
## 
## Standardized Residuals:
##                   Min       1Q   Median      3Q     Max
## 1 - accuracy  -1.5661  -0.8204  -0.5112  0.5211  3.4334
## accuracy      -3.4334  -0.5211   0.5112  0.8204  1.5661
## 
## MEAN MODELS:
## ------------------------------------------------------------------
## Coefficients for variable no. 1: 1 - accuracy
## - variable omitted (reference category) -
## ------------------------------------------------------------------
## Coefficients for variable no. 2: accuracy
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      1.8649     0.2991   6.235 4.52e-10 ***
## dyslexiayes     -1.4833     0.3029  -4.897 9.74e-07 ***
## iq               1.0676     0.3359   3.178 0.001482 ** 
## dyslexiayes:iq  -1.1625     0.3452  -3.368 0.000757 ***
## ------------------------------------------------------------------
## 
## PRECISION MODEL:
## ------------------------------------------------------------------
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   1.5579     0.3336   4.670 3.01e-06 ***
## dyslexiayes   3.4931     0.5880   5.941 2.83e-09 ***
## iq            1.2291     0.4596   2.674  0.00749 ** 
## ------------------------------------------------------------------
## Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log-likelihood: 65.9 on 7 df (56 BFGS + 2 NR Iterations)
## AIC: -117.8, BIC: -105.3
## Number of Observations: 44
## Links: Logit (Means) and Log (Precision)
## Parametrization: alternative

confint(rs2)
## 
## 95% Confidence Intervals (original form)
## 
## - Beta-Parameters:
## Variable: 1 - accuracy
##   variable omitted
## 
## Variable: accuracy
##                   2.5%   Est.   97.5%
## (Intercept)      1.279   1.86   2.451
## dyslexiayes     -2.077  -1.48  -0.890
## iq               0.409   1.07   1.726
## dyslexiayes:iq  -1.839  -1.16  -0.486
## 
## - Gamma-Parameters
##               2.5%  Est.  97.5%
## (Intercept)  0.904  1.56   2.21
## dyslexiayes  2.341  3.49   4.65
## iq           0.328  1.23   2.13

confint(rs2, exp = TRUE)
## 
## 95% Confidence Intervals (exponentiated)
## 
## - Beta-Parameters:
## Variable: 1 - accuracy
##   variable omitted
## 
## Variable: accuracy
##                  2.5%  exp(Est.)   97.5%
## (Intercept)     3.592      6.455  11.601
## dyslexiayes     0.125      0.227   0.411
## iq              1.506      2.908   5.618
## dyslexiayes:iq  0.159      0.313   0.615
## 
## - Gamma-Parameters
##               2.5%  exp(Est.)   97.5%
## (Intercept)   2.47       4.75    9.13
## dyslexiayes  10.39      32.89  104.12
## iq            1.39       3.42    8.41

Code for the figure below:
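
The original chunk is omitted; a sketch that plots the OLS residuals next to the standardized Dirichlet-regression residuals against IQ (assuming residuals(rs2, type = "standardized") returns the residuals reported in summary(rs2), with the accuracy component in column 2):

res_ols    <- residuals(rlr)                              # OLS residuals
res_dirich <- residuals(rs2, type = "standardized")[, 2]  # standardized residuals (accuracy)

par(mfrow = c(1, 2))
plot(RS$iq, res_ols, pch = 16, xlab = "IQ Score", ylab = "OLS residuals")
abline(h = 0, lty = 2)
plot(RS$iq, res_dirich, pch = 16, xlab = "IQ Score", ylab = "Standardized residuals (DirichReg)")
abline(h = 0, lty = 2)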

Reading skills: Residual plots of OLS and Dirichlet regression models.