Coefficient of Variation: cv_versatile

Maani Beigy

February 18, 2019

Coefficient of Variation

Coefficient of variation (\(CV\)) is a measure of relative dispersion: it expresses the degree of variability relative to the mean (Albatineh, Kibria, Wilcox, & Zogheib, 2014). Because \(CV\) is unitless, it is useful for comparing variables measured in different units (Albatineh et al., 2014). It is also a measure of homogeneity. The population coefficient of variation is:
\[CV = \frac{\sigma}{\mu},\] where \(\sigma\) is the population standard deviation and \(\mu\) is the population mean. In practice, we almost always analyze sample data and want to generalize the result to the population parameter (Albatineh et al., 2014). The sample estimate is:
\[cv = \frac{sd}{\bar{X}}\]
where \(sd\) is the sample standard deviation, the square root of the unbiased estimator of population variance, and \(\bar{X}\) is the sample mean. The corrected cv, which accounts for the sample size, is:
\[ cv_{corr} = cv \times \biggl(1 - \frac{1}{4(n-1)} + \frac{1}{n}cv^2 + \frac{1}{2 (n-1)^2} \biggr) \]
There are various methods for calculating confidence intervals (CI) for cv, each with its own use cases. Some of them are model-based, so their usage depends on the assumptions made about the distribution of the data. For the sake of versatility, the cvcqv package covers almost all of these methods. Here, we explain them along with some examples:
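
The sample estimate and its small-sample correction can be computed directly in base R. The following is a minimal sketch rather than the package's own code; the vector `x` is an illustrative sample assumed for this example.

```r
# Illustrative sample (an assumption for this sketch)
x <- c(0.2, 0.5, 1.1, 1.4, 1.8, 2.3, 2.5, 2.7, 3.5, 4.4,
       4.6, 5.4, 5.4, 5.7, 5.8, 5.9, 6.0, 6.6, 7.1, 7.9)

n  <- length(x)
cv <- sd(x) / mean(x)   # sd() uses the n - 1 denominator

# Small-sample correction, term by term as in the formula for cv_corr
cv_corr <- cv * (1 - 1 / (4 * (n - 1)) + cv^2 / n + 1 / (2 * (n - 1)^2))

# Report both estimates as percentages
round(100 * c(cv = cv, cv_corr = cv_corr), 3)
```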

Kelley Confidence Interval

When the parent population of the scores is normally distributed, the statistic \(\hat{\lambda} = \sqrt{n}/cv\) follows a noncentral t distribution with \(v\) degrees of freedom, where \(v = n - 1\), and noncentrality parameter:
\[ \lambda = \frac{\sqrt{n}}{CV} \] Let \(1 - \alpha\) be the CI coverage with \(\alpha_L + \alpha_U = \alpha\), in which \(\alpha_L\) is the proportion of times that \(CV\) will be less than the lower confidence bound and \(\alpha_U\) the proportion of times that \(CV\) will be greater than the upper confidence bound in the CI procedure (Kelley, 2007). The lower confidence limit for \(\lambda\) is the noncentrality parameter \(\lambda_L\) that satisfies \(t_{(1-\alpha_L,v,\lambda_L)}=\hat{\lambda}\), and the upper confidence limit for \(\lambda\) is the noncentrality parameter \(\lambda_U\) that satisfies \(t_{(\alpha_U,v,\lambda_U)}=\hat{\lambda}\). Here \(t_{(1-\alpha_L,v,\lambda_L)}\) is the \(1-\alpha_L\) quantile of the noncentral t distribution with noncentrality parameter \(\lambda_L\), and \(t_{(\alpha_U,v,\lambda_U)}\) is the \(\alpha_U\) quantile of the noncentral t distribution with noncentrality parameter \(\lambda_U\) (Kelley, 2007).
Afterwards, we transform the confidence limits for \(\lambda\) by dividing them by \(\sqrt{n}\) and then inverting them, which gives the CI limits for \(CV\):
\[ p\left[\biggl(\frac{\lambda_U}{\sqrt{n}}\biggr)^{-1} \le CV \le \biggl(\frac{\lambda_L}{\sqrt{n}}\biggr)^{-1}\right] = 1-\alpha \] where \(p\) stands for probability. The confidence limits for the noncentrality parameter of a t distribution are computed with conf.limits.nct from the MBESS package (Kelley, 2018); the interval for \(cv\) is then obtained as:

## $method
## [1] "Corrected cv with Kelley 95% CI"
## 
## $statistics
##      est  lower upper
##   58.058 41.467 98.51
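
The inversion described for Kelley's interval can be sketched in base R by solving for the two noncentrality parameters with `uniroot` and `pt(..., ncp = )`. This is an illustration, not the MBESS implementation; it assumes equal tails (\(\alpha_L = \alpha_U = \alpha/2\)), the illustrative sample `x`, and a \(\hat{\lambda}\) small enough that the search stays within the range of `ncp` that `pt` supports.

```r
# Illustrative sample (an assumption for this sketch)
x <- c(0.2, 0.5, 1.1, 1.4, 1.8, 2.3, 2.5, 2.7, 3.5, 4.4,
       4.6, 5.4, 5.4, 5.7, 5.8, 5.9, 6.0, 6.6, 7.1, 7.9)

n     <- length(x)
v     <- n - 1
alpha <- 0.05
cv    <- sd(x) / mean(x)
lambda_hat <- sqrt(n) / cv          # observed value of the noncentral t statistic

# lambda_L: noncentrality for which lambda_hat is the 1 - alpha/2 quantile
lambda_L <- uniroot(function(l) pt(lambda_hat, v, ncp = l) - (1 - alpha / 2),
                    interval = c(0, lambda_hat), tol = 1e-10)$root

# lambda_U: noncentrality for which lambda_hat is the alpha/2 quantile
lambda_U <- uniroot(function(l) pt(lambda_hat, v, ncp = l) - alpha / 2,
                    interval = c(lambda_hat, 3 * lambda_hat), tol = 1e-10)$root

# Divide by sqrt(n) and invert; the larger lambda gives the lower CV limit
ci <- c(lower = sqrt(n) / lambda_U, upper = sqrt(n) / lambda_L)
round(100 * ci, 3)
```

The upper search bound `3 * lambda_hat` is a convenience choice for this sample; other data may need a wider bracket.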

McKay Confidence Interval

McKay (McKay, 1932) introduced the following CI for \(cv\), where \(u_1 = \chi_{v,1-\alpha/2}^2\) and \(u_2 = \chi_{v,\alpha/2}^2\) are the \(100(1-\alpha/2)\)th and \(100(\alpha/2)\)th percentiles of the \(\chi^2\) distribution with \(v = n-1\) degrees of freedom, respectively (Albatineh et al., 2014):
\[ \biggl(cv\left[\biggl(\frac{u_1}{v+1}-1\biggr)(cv)^{2}+\frac{u_1}{v}\right]^{-1/2} \le CV \le cv \left[\biggl(\frac{u_2}{v+1}-1\biggr)(cv)^{2}+\frac{u_2}{v}\right]^{-1/2}\biggr) \] Let us calculate the 95% CI for our variable \(x\) according to McKay’s method (McKay, 1932):

## $method
## [1] "Corrected cv with McKay 95% CI"
## 
## $statistics
##      est  lower   upper
##   58.058 41.622 109.367
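
McKay's limits need only two \(\chi^2\) quantiles, so they can be sketched in a few lines of base R. The sample `x` is an illustrative assumption, and the sketch uses \(v + 1\) in the denominator of the first bracketed term, the form in which McKay's approximation is usually stated.

```r
# Illustrative sample (an assumption for this sketch)
x <- c(0.2, 0.5, 1.1, 1.4, 1.8, 2.3, 2.5, 2.7, 3.5, 4.4,
       4.6, 5.4, 5.4, 5.7, 5.8, 5.9, 6.0, 6.6, 7.1, 7.9)

n     <- length(x)
v     <- n - 1
alpha <- 0.05
cv    <- sd(x) / mean(x)

u1 <- qchisq(1 - alpha / 2, df = v)   # upper percentile, gives the lower limit
u2 <- qchisq(alpha / 2, df = v)       # lower percentile, gives the upper limit

# One limit of McKay's interval for a given chi-square percentile u
mckay_limit <- function(u) cv / sqrt((u / (v + 1) - 1) * cv^2 + u / v)

ci <- c(lower = mckay_limit(u1), upper = mckay_limit(u2))
round(100 * ci, 3)
```

For large cv the bracketed term for `u2` can become negative, in which case the upper limit is undefined (`NaN`).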

Miller Confidence Interval

Miller (Edward Miller, 1991) introduced the following CI for \(cv\), where \(Z_{\alpha/2}\) is the \(100(1-\alpha/2)\)th percentile of the standard normal distribution (Albatineh et al., 2014): \[ \biggl(cv - Z_{\alpha/2}\sqrt{ \biggl(\frac{cv^2}{v}\biggr)\biggl(\frac{1}{2}+cv^2\biggr)} \le CV \le cv + Z_{\alpha/2}\sqrt{ \biggl(\frac{cv^2}{v}\biggr)\biggl(\frac{1}{2}+cv^2\biggr)} \biggr) \] where \(v = n-1\) is the degrees of freedom.
Let us calculate the 95% CI for \(x\) according to Miller’s method (Edward Miller, 1991):

## $method
## [1] "Corrected cv with Miller 95% CI"
## 
## $statistics
##      est  lower  upper
##   58.058 34.173 81.942
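
Miller's interval is symmetric around cv, so a sketch only needs the normal quantile and the asymptotic standard error. The sample `x` is an illustrative assumption.

```r
# Illustrative sample (an assumption for this sketch)
x <- c(0.2, 0.5, 1.1, 1.4, 1.8, 2.3, 2.5, 2.7, 3.5, 4.4,
       4.6, 5.4, 5.4, 5.7, 5.8, 5.9, 6.0, 6.6, 7.1, 7.9)

n     <- length(x)
v     <- n - 1
alpha <- 0.05
cv    <- sd(x) / mean(x)

z  <- qnorm(1 - alpha / 2)                 # Z_{alpha/2} in the formula
se <- sqrt((cv^2 / v) * (0.5 + cv^2))      # asymptotic standard error of cv

ci <- c(lower = cv - z * se, upper = cv + z * se)
round(100 * ci, 3)
```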

Vangel Confidence Interval

Vangel (Vangel, 1996) proposed the following CI for \(cv\), which is a modification of McKay’s CI and uses the same percentiles \(u_1 = \chi_{v,1-\alpha/2}^2\) and \(u_2 = \chi_{v,\alpha/2}^2\): \[ \biggl(cv\left[\biggl(\frac{u_1+1}{v+1}-1\biggr)(cv)^{2}+\frac{u_1}{v}\right]^{-1/2} \le CV \le cv \left[\biggl(\frac{u_2+1}{v+1}-1\biggr)(cv)^{2}+\frac{u_2}{v}\right]^{-1/2}\biggr) \] Let us calculate the 95% CI for \(x\) according to Vangel’s method (Vangel, 1996):

## $method
## [1] "Corrected cv with Vangel 95% CI"
## 
## $statistics
##      est  lower   upper
##   58.058 41.443 106.237
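
The modified interval differs from McKay's only in the first bracketed term, which the sketch below makes explicit. It evaluates \(cv\,[((u+1)/(v+1) - 1)\,cv^2 + u/v]^{-1/2}\); these constants are an assumption of this sketch and should be checked against Vangel (1996) before being relied on. The sample `x` is also illustrative.

```r
# Illustrative sample (an assumption for this sketch)
x <- c(0.2, 0.5, 1.1, 1.4, 1.8, 2.3, 2.5, 2.7, 3.5, 4.4,
       4.6, 5.4, 5.4, 5.7, 5.8, 5.9, 6.0, 6.6, 7.1, 7.9)

n     <- length(x)
v     <- n - 1
alpha <- 0.05
cv    <- sd(x) / mean(x)

u1 <- qchisq(1 - alpha / 2, df = v)
u2 <- qchisq(alpha / 2, df = v)

# Same shape as McKay's limit, with (u + 1) in the first bracketed term
# (constants assumed for this sketch)
vangel_limit <- function(u) cv / sqrt(((u + 1) / (v + 1) - 1) * cv^2 + u / v)

ci <- c(lower = vangel_limit(u1), upper = vangel_limit(u2))
round(100 * ci, 3)
```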

Mahmoudvand-Hassani Confidence Interval

Mahmoudvand and Hassani (Mahmoudvand & Hassani, 2009) proposed the following CI for \(cv\), which is obtained using ranked set sampling (RSS):
\[ \biggl(\frac{cv}{2-C_n+Z_{1-\alpha/2}\sqrt{1-C_n^2}} \le CV \le \frac{cv}{2-C_n-Z_{1-\alpha/2}\sqrt{1-C_n^2}} \biggr) \] where \[ C_n=\sqrt{\frac{2}{n-1}}\frac{\Gamma{(n/2)}}{\Gamma{((n-1)/2)}} \] and \(\Gamma\) is the gamma function, with \(\Gamma(n)=(n-1)!\) for positive integers \(n\). Let us now calculate the 95% CI for \(x\) according to Mahmoudvand-Hassani’s method (Mahmoudvand & Hassani, 2009):

## $method
## [1] "Corrected cv with Mahmoudvand-Hassani 95% CI"
## 
## $statistics
##      est lower  upper
##   58.058 43.69 83.264
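
The constant \(C_n\) is a ratio of gamma functions, which is best evaluated on the log scale to avoid overflow for larger \(n\). The following base R sketch does that with `lgamma`; the sample `x` is an illustrative assumption.

```r
# Illustrative sample (an assumption for this sketch)
x <- c(0.2, 0.5, 1.1, 1.4, 1.8, 2.3, 2.5, 2.7, 3.5, 4.4,
       4.6, 5.4, 5.4, 5.7, 5.8, 5.9, 6.0, 6.6, 7.1, 7.9)

n     <- length(x)
alpha <- 0.05
cv    <- sd(x) / mean(x)

# C_n = sqrt(2 / (n - 1)) * Gamma(n / 2) / Gamma((n - 1) / 2), via log-gamma
C_n  <- sqrt(2 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))
half <- qnorm(1 - alpha / 2) * sqrt(1 - C_n^2)

ci <- c(lower = cv / (2 - C_n + half), upper = cv / (2 - C_n - half))
round(100 * ci, 3)
```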

Normal Approximation Confidence Interval

Panichkitkosolkul (Panichkitkosolkul, 2013) proposed the following CI for \(cv\), which is a normal approximation: \[ \biggl(\frac{cv}{C_{n+1}+Z_{1-\alpha/2}\sqrt{1-C_{n+1}^2}} \le CV \le \frac{cv}{C_{n+1}-Z_{1-\alpha/2}\sqrt{1-C_{n+1}^2}} \biggr) \] where \(C_{n+1}=\sqrt{1-\frac{1}{2n}}\).
Now we calculate the normal approximation 95% CI for \(x\) according to Panichkitkosolkul (Panichkitkosolkul, 2013):

## $method
## [1] "Corrected cv with Normal Approximation 95% CI"
## 
## $statistics
##      est  lower  upper
##   58.058 44.752 85.691
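
The normal approximation interval has the same structure as the previous one, with the closed-form constant \(C_{n+1}=\sqrt{1-1/(2n)}\) in place of the gamma ratio. A minimal base R sketch, using an illustrative sample `x` assumed for this example:

```r
# Illustrative sample (an assumption for this sketch)
x <- c(0.2, 0.5, 1.1, 1.4, 1.8, 2.3, 2.5, 2.7, 3.5, 4.4,
       4.6, 5.4, 5.4, 5.7, 5.8, 5.9, 6.0, 6.6, 7.1, 7.9)

n     <- length(x)
alpha <- 0.05
cv    <- sd(x) / mean(x)

C_n1 <- sqrt(1 - 1 / (2 * n))                     # C_{n+1} in the formula
half <- qnorm(1 - alpha / 2) * sqrt(1 - C_n1^2)

ci <- c(lower = cv / (C_n1 + half), upper = cv / (C_n1 - half))
round(100 * ci, 3)
```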

Shortest-Length Confidence Interval

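Panichkitkosolkul (Panichkitkosolkul, 2013) has also introduced the following CI for \(cv\):
\[ \biggl(\frac{cv\sqrt{v}}{\sqrt{b}} \le CV \le \frac{cv\sqrt{v}}{\sqrt{a}} \biggr) \] with \(v = n-1\) degrees of freedom, where \(a < b\) are positive constants from the \(\chi^2\) distribution with \(v\) degrees of freedom, chosen so that the interval has coverage \(1-\alpha\) and the shortest possible length. The shortest-length 95% CI for \(x\) is then: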

## $method
## [1] "Corrected cv with Shortest-Length 95% CI"
## 
## $statistics
##      est  lower  upper
##   58.058 42.221 81.411
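
One way to find the constants \(a\) and \(b\) numerically is sketched below. It rests on an assumption not spelled out in the text: that \(a\) and \(b\) satisfy \(F_v(b) - F_v(a) = 1-\alpha\) together with \(a^{3/2} f_v(a) = b^{3/2} f_v(b)\), where \(F_v\) and \(f_v\) are the \(\chi^2_v\) distribution and density functions. The second condition is what minimizes \(1/\sqrt{a} - 1/\sqrt{b}\) under the coverage constraint. The sample `x` is illustrative.

```r
# Illustrative sample (an assumption for this sketch)
x <- c(0.2, 0.5, 1.1, 1.4, 1.8, 2.3, 2.5, 2.7, 3.5, 4.4,
       4.6, 5.4, 5.4, 5.7, 5.8, 5.9, 6.0, 6.6, 7.1, 7.9)

n     <- length(x)
v     <- n - 1
alpha <- 0.05
cv    <- sd(x) / mean(x)

# For a given a, the b that keeps chi-square coverage at 1 - alpha
b_of_a <- function(a) qchisq(pchisq(a, v) + 1 - alpha, v)

# Shortest-length condition on the log scale: a^(3/2) f(a) = b^(3/2) f(b)
g <- function(a) {
  b <- b_of_a(a)
  (1.5 * log(a) + dchisq(a, v, log = TRUE)) -
    (1.5 * log(b) + dchisq(b, v, log = TRUE))
}

# a must lie below the alpha quantile, otherwise no b can restore coverage
a <- uniroot(g, interval = c(1e-8, qchisq(alpha, v) * (1 - 1e-6)),
             tol = 1e-10)$root
b <- b_of_a(a)

ci <- c(lower = cv * sqrt(v) / sqrt(b), upper = cv * sqrt(v) / sqrt(a))
round(100 * ci, 3)
```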

Equal-Tailed Confidence Interval

The \(100(1-\alpha)\%\) equal-tailed CI for \(cv\) can be obtained as: \[ \biggl(\frac{cv\sqrt{v}}{\sqrt{\chi_{v,1-\alpha/2}^2}} \le CV \le \frac{cv\sqrt{v}}{\sqrt{\chi_{v,\alpha/2}^2}} \biggr) \] where \(\chi_{v,\alpha/2}^2\) and \(\chi_{v,1-\alpha/2}^2\) are the \(100(\alpha/2)\) and \(100(1-\alpha/2)\) percentiles of the central \(\chi^2\) distribution with \(v\) degrees of freedom, respectively (Panichkitkosolkul, 2013).
Then, the equal-tailed 95% CI for \(x\) is:

## $method
## [1] "Corrected cv with Equal-Tailed 95% CI"
## 
## $statistics
##      est  lower  upper
##   58.058 44.152 84.797
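
The equal-tailed limits follow directly from two \(\chi^2\) quantiles, as this base R sketch shows. The sample `x` is an illustrative assumption.

```r
# Illustrative sample (an assumption for this sketch)
x <- c(0.2, 0.5, 1.1, 1.4, 1.8, 2.3, 2.5, 2.7, 3.5, 4.4,
       4.6, 5.4, 5.4, 5.7, 5.8, 5.9, 6.0, 6.6, 7.1, 7.9)

n     <- length(x)
v     <- n - 1
alpha <- 0.05
cv    <- sd(x) / mean(x)

# The upper chi-square percentile gives the lower limit, and vice versa
ci <- c(lower = cv * sqrt(v) / sqrt(qchisq(1 - alpha / 2, df = v)),
        upper = cv * sqrt(v) / sqrt(qchisq(alpha / 2, df = v)))
round(100 * ci, 3)
```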

Bootstrap Confidence Intervals

With the boot package (Canty & Ripley, 2017), we can obtain bootstrap CIs around \(cv\):

## $method
## [1] "Corrected cv with Basic Bootstrap 95% CI"
## 
## $statistics
##      est  lower  upper
##   58.058 37.866 78.014
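
A basic bootstrap interval can be sketched by calling `boot()` and `boot.ci()` from the boot package directly. The sample `x`, the seed, and the number of resamples `R = 1000` are assumptions of this example, and the limits will vary with the seed.

```r
library(boot)

# Illustrative sample (an assumption for this sketch)
x <- c(0.2, 0.5, 1.1, 1.4, 1.8, 2.3, 2.5, 2.7, 3.5, 4.4,
       4.6, 5.4, 5.4, 5.7, 5.8, 5.9, 6.0, 6.6, 7.1, 7.9)

# boot() calls the statistic with the data and a vector of resampled indices
cv_stat <- function(data, idx) {
  d <- data[idx]
  sd(d) / mean(d)
}

set.seed(1)                                   # for reproducible resampling
bt <- boot(data = x, statistic = cv_stat, R = 1000)
ci <- boot.ci(bt, conf = 0.95, type = "basic")

# Columns 4 and 5 of the "basic" matrix hold the lower and upper limits
limits <- ci$basic[1, 4:5]
round(100 * limits, 3)
```

Passing `type = "norm"` instead requests the normal approximation bootstrap interval, whose limits are returned in `ci$normal`.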

All Available Methods

In conclusion, we can observe CIs calculated by all available methods:

## $method
## [1] "All methods"
## 
## $statistics
##                         est  lower   upper
## kelley               57.774 41.467  98.510
## mckay                57.774 41.441 108.482
## miller               57.774 34.053  81.494
## vangel               57.774 41.264 105.424
## mahmoudvand_hassani  57.774 43.476  82.857
## equal_tailed         57.774 43.936  84.382
## shortest_length      57.774 42.014  81.012
## normal_approximation 57.774 44.533  85.272
## norm                 57.774 39.136  78.276
## basic                57.774 38.897  77.648
##                                                        description
## kelley                                       cv with Kelley 95% CI
## mckay                                         cv with McKay 95% CI
## miller                                       cv with Miller 95% CI
## vangel                                       cv with Vangel 95% CI
## mahmoudvand_hassani             cv with Mahmoudvand-Hassani 95% CI
## equal_tailed                           cv with Equal-Tailed 95% CI
## shortest_length                     cv with Shortest-Length 95% CI
## normal_approximation           cv with Normal Approximation 95% CI
## norm                 cv with Normal Approximation Bootstrap 95% CI
## basic                               cv with Basic Bootstrap 95% CI

References

Albatineh, A. N., Kibria, B. M., Wilcox, M. L., & Zogheib, B. (2014). Confidence interval estimation for the population coefficient of variation using ranked set sampling: A simulation study. Journal of Applied Statistics, 41(4), 733–751. https://doi.org/10.1080/02664763.2013.847405

Canty, A., & Ripley, B. (2017). boot: Bootstrap R (S-Plus) Functions. R package version 1.3-20.

Edward Miller, G. (1991). Asymptotic test statistics for coefficients of variation. Communications in Statistics-Theory and Methods, 20(10), 3351–3363.

Kelley, K. (2007). Sample size planning for the coefficient of variation from the accuracy in parameter estimation approach. Behavior Research Methods, 39(4), 755–766. https://doi.org/10.3758/BF03192966

Kelley, K. (2018). MBESS: The MBESS R Package. R package version 4.4.3. Retrieved from https://cran.r-project.org/package=MBESS

Mahmoudvand, R., & Hassani, H. (2009). Two new confidence intervals for the coefficient of variation in a normal distribution. Journal of Applied Statistics, 36(4), 429–442.

McKay, A. T. (1932). Distribution of the Coefficient of Variation and the Extended "t" Distribution. Journal of the Royal Statistical Society, 95(4), 695–698.

Panichkitkosolkul, W. (2013). Confidence Intervals for the Coefficient of Variation in a Normal Distribution with a Known Population Mean. Journal of Probability and Statistics, 2013, 1–11. https://doi.org/10.1155/2013/324940

Vangel, M. G. (1996). Confidence intervals for a normal coefficient of variation. The American Statistician, 50(1), 21–26.