conquer

Convolution-type smoothed quantile regression

Description

The conquer library performs fast and accurate convolution-type smoothed quantile regression (Fernandes, Guerre and Horta, 2019), implemented via Barzilai-Borwein gradient descent (Barzilai and Borwein, 1988) with a Huber regression warm start. The package can also construct confidence intervals for the regression coefficients using the multiplier bootstrap.
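To make the description concrete, here is a minimal base-R sketch of the two ideas, not the package's actual implementation. It assumes a Gaussian kernel, for which the gradient of the smoothed quantile loss has the closed form x * (pnorm(-r / h) - tau) for a residual r. The function names (smoothed_grad, sqr_bb_sketch, sqr_boot_ci_sketch), the fixed bandwidth h = 0.5, the zero starting value (instead of a Huber warm start), the step-size cap, and the exponential bootstrap weights are all assumptions made for illustration.

```r
# Gradient of the weighted, Gaussian-kernel smoothed quantile loss at beta.
# Z is the n x (p + 1) design matrix with a leading column of ones.
smoothed_grad = function(beta, Z, Y, tau, h, w) {
  res = as.numeric(Y - Z %*% beta)
  as.numeric(crossprod(Z, w * (pnorm(-res / h) - tau))) / nrow(Z)
}

# Gradient descent with Barzilai-Borwein step sizes, started from zero
# (hypothetical helper; the package itself uses a Huber warm start).
sqr_bb_sketch = function(X, Y, tau = 0.5, h = 0.5, w = rep(1, nrow(X)),
                         tol = 1e-6, max_iter = 500, step_max = 100) {
  Z = cbind(1, X)
  beta = rep(0, ncol(Z))
  grad = smoothed_grad(beta, Z, Y, tau, h, w)
  step = 1
  for (iter in seq_len(max_iter)) {
    if (max(abs(grad)) < tol) break
    beta_new = beta - step * grad
    grad_new = smoothed_grad(beta_new, Z, Y, tau, h, w)
    d_beta = beta_new - beta
    d_grad = grad_new - grad
    curv = sum(d_beta * d_grad)
    # BB step <d_beta, d_beta> / <d_beta, d_grad>, capped as a safeguard.
    step = if (curv > 0) min(sum(d_beta^2) / curv, step_max) else 1
    beta = beta_new
    grad = grad_new
  }
  beta
}

# Multiplier bootstrap: refit with random weights (mean 1, variance 1)
# and take percentile intervals of the refitted coefficients.
sqr_boot_ci_sketch = function(X, Y, tau = 0.5, h = 0.5, B = 200, alpha = 0.05) {
  boot = replicate(B, sqr_bb_sketch(X, Y, tau, h, w = rexp(nrow(X))))
  t(apply(boot, 1, quantile, probs = c(alpha / 2, 1 - alpha / 2)))
}

# Usage on a small simulated data set.
set.seed(1)
X_demo = matrix(rnorm(1000 * 2), 1000, 2)
Y_demo = as.numeric(1 + X_demo %*% c(1, 1) + rnorm(1000))
sqr_bb_sketch(X_demo, Y_demo, tau = 0.5)
sqr_boot_ci_sketch(X_demo, Y_demo, tau = 0.5, B = 100)
```

The sketch is meant only to show the structure of the computation. For real use, call conquer itself, which runs compiled code and chooses the bandwidth for you.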

Installation

conquer is available on CRAN and can be installed from within R:

install.packages("conquer")

Main function

The main function of this library is conquer, which fits convolution-type smoothed quantile regression.

Examples

Let us illustrate conquer with a simple example. For sample size n = 5000 and dimension p = 70, we generate data from a linear model yi = β0 + <xi, β> + εi, for i = 1, 2, …, n. Here we set β0 = 1, β is a p-dimensional vector with every entry equal to 1, xi follows a p-dimensional standard multivariate normal distribution (generated with mvrnorm from the library MASS), and εi follows a t distribution with 2 degrees of freedom.

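library(MASS)      # mvrnorm
library(quantreg)  # rq
library(conquer)
n = 5000
p = 70
# True coefficients: intercept followed by p slopes, all equal to 1
beta = rep(1, p + 1)
set.seed(2020)
# Covariates from a p-dimensional standard normal distribution
X = mvrnorm(n, rep(0, p), diag(p))
# Heavy-tailed errors from a t distribution with 2 degrees of freedom
err = rt(n, 2)
Y = cbind(1, X) %*% beta + err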

We then fit the model to the generated data in two ways: standard quantile regression using the package quantreg, with a Frisch-Newton approach after preprocessing (Portnoy and Koenker, 1997), and conquer (with a Gaussian kernel). The quantile level τ is fixed at 0.5.

tau = 0.5
# Standard quantile regression: record the runtime and the l2 estimation error
start = Sys.time()
fit.qr = rq(Y ~ X, tau = tau, method = "pfn")
end = Sys.time()
time.qr = as.numeric(difftime(end, start, units = "secs"))
est.qr = norm(as.numeric(fit.qr$coefficients) - beta, "2")

# conquer: same measurements
start = Sys.time()
fit.conquer = conquer(X, Y, tau = tau)
end = Sys.time()
time.conquer = as.numeric(difftime(end, start, units = "secs"))
est.conquer = norm(fit.conquer$coeff - beta, "2")

Standard quantile regression takes 0.1955 seconds to run, whereas conquer takes only 0.0255 seconds. Meanwhile, the estimation error (in l2 norm) is 0.1799 for quantile regression and 0.1685 for conquer. For reference, these runtimes were recorded on a MacBook Pro with a 2.3 GHz 8-core Intel Core i9 processor and 16 GB of 2667 MHz DDR4 memory.
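A single wall-clock measurement can be noisy, so runtimes on your machine may differ from the figures above. The following sketch repeats a fit several times and reports the median elapsed time. The helper name median_runtime and the default of 5 repetitions are assumptions for illustration and are not part of conquer or quantreg.

```r
# Median elapsed time (in seconds) of a fitting function over several runs.
# fit_fun is a function with no arguments that performs one fit.
median_runtime = function(fit_fun, reps = 5) {
  times = replicate(reps, system.time(fit_fun())[["elapsed"]])
  median(times)
}

# Usage, assuming X, Y and tau from the example above and both packages loaded:
# median_runtime(function() rq(Y ~ X, tau = tau, method = "pfn"))
# median_runtime(function() conquer(X, Y, tau = tau))
```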

Getting help

Help on the functions can be accessed by typing ? followed by the function name at the R command prompt.

For example, ?conquer displays detailed documentation for the function conquer, including its inputs, outputs and examples.

License

GPL-3.0

System requirements

C++11

Authors

Xuming He xmhe@umich.edu, Xiaoou Pan xip024@ucsd.edu, Kean Ming Tan keanming@umich.edu and Wen-Xin Zhou wez243@ucsd.edu

Maintainer

Xiaoou Pan xip024@ucsd.edu

References

Barzilai, J. and Borwein, J. M. (1988). Two-point step size gradient methods. IMA J. Numer. Anal. 8 141–148.

Fernandes, M., Guerre, E. and Horta, E. (2019). Smoothing quantile regressions. J. Bus. Econ. Statist., in press.

He, X., Pan, X., Tan, K. M. and Zhou, W.-X. (2020). Smoothed quantile regression for large-scale inference. Preprint.

Horowitz, J. L. (1998). Bootstrap methods for median regression models. Econometrica 66 1327–1351.

Koenker, R. (2005). Quantile Regression. Cambridge Univ. Press, Cambridge.

Koenker, R. (2019). Package “quantreg”, version 5.54. Available on CRAN.

Koenker, R. and Bassett, G. (1978). Regression quantiles. Econometrica 46 33–50.

Portnoy, S. and Koenker, R. (1997). The Gaussian hare and the Laplacian tortoise: Computability of squared-error versus absolute-error estimators. Statist. Sci. 12 279–300.

Sanderson, C. and Curtin, R. (2016). Armadillo: A template-based C++ library for linear algebra. J. Open Source Softw. 1 26.