The coefficient of variation (\(CV\)) is a measure of relative dispersion that expresses variability relative to the mean (Albatineh, Kibria, Wilcox, & Zogheib, 2014). Because the \(CV\) is unitless, it is useful for comparing variables measured in different units (Albatineh et al., 2014). It is also a measure of homogeneity. The population coefficient of variation is:
\[CV = \frac{\sigma}{\mu},\] where \(\sigma\) is the population standard deviation and \(\mu\) is the population mean. In practice, we almost always analyze sample data and use them to estimate the population parameter (Albatineh et al., 2014). The sample estimate of \(CV\) is:
\[cv = \frac{sd}{\bar{X}}\]
where \(sd\) is the sample standard deviation (the square root of the unbiased estimator of the population variance) and \(\bar{X}\) is the sample mean. The cv corrected for the sample size \(n\) is: \[
cv_{corr} = cv \times \biggl(1 - \frac{1}{4(n-1)}
+ \frac{1}{n}cv^2
+ \frac{1}{2 (n-1)^2} \biggr)
\] There are various methods for calculating confidence intervals (CI) for cv, each with its own use cases. Some of them are model-based, so their validity depends on the assumptions made about the distribution of the data. For the sake of versatility, the cvcqv package covers almost all of these methods. We explain them below, along with some examples.
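As a minimal sketch of the two point estimates defined above, the following base R code computes cv and the sample-size-corrected cv directly from the formulas. The variable names `cv` and `cv_corr` are illustrative and are not taken from the cvcqv package; `x` is the example vector used throughout this document.

```r
# Example data (the vector used in the examples of this document)
x <- c(
  0.2, 0.5, 1.1, 1.4, 1.8, 2.3, 2.5, 2.7, 3.5, 4.4,
  4.6, 5.4, 5.4, 5.7, 5.8, 5.9, 6.0, 6.6, 7.1, 7.9
)
n <- length(x)

# Sample cv: sd() uses the n - 1 denominator, as in the definition above
cv <- sd(x) / mean(x)

# Corrected cv, term by term as in the formula for cv_corr
cv_corr <- cv * (1 - 1 / (4 * (n - 1)) + cv^2 / n + 1 / (2 * (n - 1)^2))

# Expressed as percentages, the scale used in the outputs of this document
round(100 * c(cv = cv, cv_corr = cv_corr), 3)
```

The results are multiplied by 100 only to match the percentage scale of the printed outputs; the formulas themselves are on the ratio scale.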
Assume that the parent population of the scores is normally distributed. Then \(\sqrt{n}/cv\) follows a noncentral t distribution with noncentrality parameter \(\lambda = \sqrt{n}/CV\), which is estimated by:
\[
\hat{\lambda} = \frac{\sqrt{n}}{cv}
\] with \(v\) degrees of freedom, where \(v = n - 1\). Let \(1 - \alpha\) be the CI coverage, with \(\alpha_L + \alpha_U = \alpha\), where \(\alpha_L\) is the proportion of times that \(CV\) will be less than the lower confidence bound and \(\alpha_U\) is the proportion of times that \(CV\) will be greater than the upper confidence bound in the CI procedure (Kelley, 2007). The lower confidence limit for \(\lambda\) is the noncentrality parameter \(\lambda_L\) that satisfies \(t_{(1-\alpha_L,v,\lambda_L)}=\hat{\lambda}\), and the upper confidence limit for \(\lambda\) is the noncentrality parameter \(\lambda_U\) that satisfies \(t_{(\alpha_U,v,\lambda_U)}=\hat{\lambda}\). Here \(t_{(1-\alpha_L,v,\lambda_L)}\) is the \(1-\alpha_L\) quantile of the noncentral t distribution with noncentrality parameter \(\lambda_L\), and \(t_{(\alpha_U,v,\lambda_U)}\) is the \(\alpha_U\) quantile of the noncentral t distribution with noncentrality parameter \(\lambda_U\) (Kelley, 2007).
We then transform the limits of the confidence interval for \(\lambda\) by dividing them by \(\sqrt{n}\) and inverting the results, which yields the CI limits for \(CV\):
\[
p\left[\biggl(\frac{\lambda_U}{\sqrt{n}}\biggr)^{-1}
\le CV \le \biggl(\frac{\lambda_L}{\sqrt{n}}\biggr)^{-1}\right] = 1-\alpha
\] where \(p\) stands for probability. Using the conf.limits.nct function from the MBESS package (Kelley, 2018) to compute the confidence limits for the noncentrality parameter of a t distribution, the CI for \(cv\) is obtained as:
library(cvcqv)  # provides cv_versatile()

x <- c(
  0.2, 0.5, 1.1, 1.4, 1.8, 2.3, 2.5, 2.7, 3.5, 4.4,
  4.6, 5.4, 5.4, 5.7, 5.8, 5.9, 6.0, 6.6, 7.1, 7.9
)
cv_versatile(
  x,
  na.rm = TRUE,
  digits = 3,
  method = "kelley",
  correction = TRUE,
  alpha = 0.05
)
## $method
## [1] "Corrected cv with Kelley 95% CI"
##
## $statistics
## est lower upper
## 58.058 41.467 98.51
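The inversion described above can be sketched in base R without MBESS, as a rough illustration rather than the package's actual implementation. The sketch assumes an equal split \(\alpha_L = \alpha_U = \alpha/2\) and assumes that both limits for \(\lambda\) lie in the search range [0, 37] (R's `pt()` supports noncentrality parameters up to about 37.62 in absolute value). The names `p_below`, `lambda_L`, `lambda_U` and `kelley_ci` are illustrative.

```r
x <- c(
  0.2, 0.5, 1.1, 1.4, 1.8, 2.3, 2.5, 2.7, 3.5, 4.4,
  4.6, 5.4, 5.4, 5.7, 5.8, 5.9, 6.0, 6.6, 7.1, 7.9
)
n <- length(x)
v <- n - 1
alpha <- 0.05  # assumption: alpha_L = alpha_U = alpha / 2

cv <- sd(x) / mean(x)
lambda_hat <- sqrt(n) / cv

# P(T <= lambda_hat) for a noncentral t with v df and noncentrality ncp;
# this probability decreases as ncp increases
p_below <- function(ncp) pt(lambda_hat, df = v, ncp = ncp)

# lambda_L: noncentrality whose (1 - alpha/2) quantile equals lambda_hat
lambda_L <- uniroot(function(l) p_below(l) - (1 - alpha / 2),
                    interval = c(0, 37), tol = 1e-9)$root
# lambda_U: noncentrality whose (alpha/2) quantile equals lambda_hat
lambda_U <- uniroot(function(l) p_below(l) - alpha / 2,
                    interval = c(0, 37), tol = 1e-9)$root

# Divide by sqrt(n) and invert; the limits swap roles when inverted
kelley_ci <- c(lower = sqrt(n) / lambda_U, upper = sqrt(n) / lambda_L)
round(100 * kelley_ci, 3)
```

Because the limits are built from the uncorrected cv only, this sketch has no counterpart to the `correction` argument.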
McKay (McKay, 1932) introduced the following CI for \(cv\), where \(u_1 = \chi_{v,1-\alpha/2}^2\) and \(u_2 = \chi_{v,\alpha/2}^2\) are the \(100(1-\alpha/2)\)th and \(100(\alpha/2)\)th percentiles of the \(\chi^2\) distribution with \(v = n-1\) degrees of freedom, respectively (Albatineh et al., 2014):
\[
\biggl(cv\left[\biggl(\frac{u_1}{v}-1\biggr)(cv)^{2}+\frac{u_1}{v}\right]^{-1/2}
\le CV \le cv
\left[\biggl(\frac{u_2}{v}-1\biggr)(cv)^{2}+\frac{u_2}{v}\right]^{-1/2}\biggr)
\] Let us calculate the 95% CI for our variable \(x\) according to McKay’s method (McKay, 1932):
## $method
## [1] "Corrected cv with McKay 95% CI"
##
## $statistics
## est lower upper
## 58.058 41.622 109.367
Miller (Edward Miller, 1991) introduced the following CI for \(cv\), where \(Z_{\alpha/2}\) is the \(100(1-\alpha/2)\)th percentile of the standard normal distribution (Albatineh et al., 2014): \[
\biggl(cv - Z_{\alpha/2}\sqrt{
\biggl(\frac{cv^2}{v}\biggr)\biggl(\frac{1}{2}+cv^2\biggr)} \le
CV \le cv + Z_{\alpha/2}\sqrt{
\biggl(\frac{cv^2}{v}\biggr)\biggl(\frac{1}{2}+cv^2\biggr)}
\biggr)
\] where \(v = n-1\) is the number of degrees of freedom.
Let us calculate the 95% CI for \(x\) according to Miller’s method (Edward Miller, 1991):
## $method
## [1] "Corrected cv with Miller 95% CI"
##
## $statistics
## est lower upper
## 58.058 34.173 81.942
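Miller's interval is simple enough to compute by hand. The following base R sketch applies the formula above to the uncorrected cv, so its limits are expected to correspond to the uncorrected results (the `miller` row of the summary table at the end of this document) rather than to the corrected output shown here. The names `half_width` and `miller_ci` are illustrative.

```r
x <- c(
  0.2, 0.5, 1.1, 1.4, 1.8, 2.3, 2.5, 2.7, 3.5, 4.4,
  4.6, 5.4, 5.4, 5.7, 5.8, 5.9, 6.0, 6.6, 7.1, 7.9
)
n <- length(x)
v <- n - 1
alpha <- 0.05

cv <- sd(x) / mean(x)
z <- qnorm(1 - alpha / 2)  # upper alpha/2 point of the standard normal

# Half-width of the interval: z times the asymptotic standard error of cv
half_width <- z * sqrt((cv^2 / v) * (0.5 + cv^2))

miller_ci <- c(lower = cv - half_width, upper = cv + half_width)
round(100 * miller_ci, 3)
```

The interval is symmetric around cv by construction, which is why its lower limit can fall noticeably below those of the chi-square based methods.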
Vangel (Vangel, 1996) proposed the following CI for \(cv\), which is a modification of McKay’s CI:
\[
\biggl(cv\left[\biggl(\frac{u_1+1}{v}-1\biggr)(cv)^{2}+\frac{u_1}{v}\right]^{-1/2}
\le CV \le cv
\left[\biggl(\frac{u_2+1}{v}-1\biggr)(cv)^{2}+\frac{u_2}{v}\right]^{-1/2}\biggr)
\] Let us calculate the 95% CI for \(x\) according to Vangel’s method (Vangel, 1996):
## $method
## [1] "Corrected cv with Vangel 95% CI"
##
## $statistics
## est lower upper
## 58.058 41.443 106.237
Mahmoudvand and Hassani (Mahmoudvand & Hassani, 2009) proposed the following CI for \(cv\), which is obtained using ranked set sampling (RSS):
\[
\biggl(\frac{cv}{2-C_n+Z_{1-\alpha/2}\sqrt{1-C_n^2}}
\le CV \le
\frac{cv}{2-C_n-Z_{1-\alpha/2}\sqrt{1-C_n^2}}
\biggr)
\] where \[
C_n=\sqrt{\frac{2}{n-1}}\,\frac{\Gamma{(n/2)}}{\Gamma{((n-1)/2)}},
\qquad \Gamma(n)=(n-1)! \text{ for positive integer } n
\] Let us now calculate the 95% CI for \(x\) according to Mahmoudvand-Hassani’s method (Mahmoudvand & Hassani, 2009):
cv_versatile(
  x,
  na.rm = TRUE,
  digits = 3,
  method = "mahmoudvand_hassani",
  correction = TRUE,
  alpha = 0.05
)
## $method
## [1] "Corrected cv with Mahmoudvand-Hassani 95% CI"
##
## $statistics
## est lower upper
## 58.058 43.69 83.264
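The constant \(C_n\) and the resulting limits can be sketched in base R as follows. The sketch uses the uncorrected cv, so it is expected to correspond to the uncorrected results (the `mahmoudvand_hassani` row of the summary table at the end of this document) rather than to the corrected output shown here. Using `lgamma()` instead of `gamma()` is a choice made for this sketch to avoid overflow for large \(n\); the names `c_n`, `spread` and `mh_ci` are illustrative.

```r
x <- c(
  0.2, 0.5, 1.1, 1.4, 1.8, 2.3, 2.5, 2.7, 3.5, 4.4,
  4.6, 5.4, 5.4, 5.7, 5.8, 5.9, 6.0, 6.6, 7.1, 7.9
)
n <- length(x)
alpha <- 0.05

cv <- sd(x) / mean(x)
z <- qnorm(1 - alpha / 2)

# C_n = sqrt(2 / (n - 1)) * Gamma(n / 2) / Gamma((n - 1) / 2),
# computed on the log scale for numerical stability
c_n <- sqrt(2 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))

spread <- z * sqrt(1 - c_n^2)
mh_ci <- c(
  lower = cv / (2 - c_n + spread),
  upper = cv / (2 - c_n - spread)
)
round(100 * mh_ci, 3)
```

Note that the gamma function is evaluated at half-integer arguments whenever \(n\) is even, which is why a general gamma routine is needed rather than a factorial.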
Panichkitkosolkul (Panichkitkosolkul, 2013) proposed the following CI for \(cv\), which is based on a normal approximation: \[
\biggl(\frac{cv}{C_{n+1}+Z_{1-\alpha/2}\sqrt{1-C_{n+1}^2}}
\le CV \le
\frac{cv}{C_{n+1}-Z_{1-\alpha/2}\sqrt{1-C_{n+1}^2}}
\biggr)
\] where \(C_{n+1}=\sqrt{1-\frac{1}{2n}}\).
Now we calculate the normal approximation 95% CI for \(x\) according to Panichkitkosolkul (Panichkitkosolkul, 2013):
cv_versatile(
  x,
  na.rm = TRUE,
  digits = 3,
  method = "normal_approximation",
  correction = TRUE,
  alpha = 0.05
)
## $method
## [1] "Corrected cv with Normal Approximation 95% CI"
##
## $statistics
## est lower upper
## 58.058 44.752 85.691
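A base R sketch of the normal approximation interval follows, again using the uncorrected cv, so its limits are expected to correspond to the `normal_approximation` row of the summary table at the end of this document rather than to the corrected output shown here. The names `c_n1`, `spread` and `na_ci` are illustrative.

```r
x <- c(
  0.2, 0.5, 1.1, 1.4, 1.8, 2.3, 2.5, 2.7, 3.5, 4.4,
  4.6, 5.4, 5.4, 5.7, 5.8, 5.9, 6.0, 6.6, 7.1, 7.9
)
n <- length(x)
alpha <- 0.05

cv <- sd(x) / mean(x)
z <- qnorm(1 - alpha / 2)

# C_{n+1} = sqrt(1 - 1 / (2n)), so 1 - C_{n+1}^2 reduces to 1 / (2n)
c_n1 <- sqrt(1 - 1 / (2 * n))

spread <- z * sqrt(1 - c_n1^2)
na_ci <- c(
  lower = cv / (c_n1 + spread),
  upper = cv / (c_n1 - spread)
)
round(100 * na_ci, 3)
```

The upper limit is only defined while `c_n1 - spread` is positive, which holds here but can fail for very small samples.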
Panichkitkosolkul (Panichkitkosolkul, 2013) has also introduced the following CI for \(cv\):
\[
\biggl(\frac{cv\sqrt{v}}{\sqrt{b}}
\le CV \le
\frac{cv\sqrt{v}}{\sqrt{a}}
\biggr)
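\] with \(v = n-1\) degrees of freedom, where \(a\) and \(b\) are constants chosen so that the interval has coverage \(1-\alpha\) and the shortest possible length (Panichkitkosolkul, 2013). The shortest-length 95% CI for \(x\) is then: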
cv_versatile(
  x,
  na.rm = TRUE,
  digits = 3,
  method = "shortest_length",
  correction = TRUE,
  alpha = 0.05
)
## $method
## [1] "Corrected cv with Shortest-Length 95% CI"
##
## $statistics
## est lower upper
## 58.058 42.221 81.411
The \(100(1-\alpha)\%\) equal-tailed CI for \(cv\) can be obtained as: \[
\biggl(\frac{cv\sqrt{v}}{\sqrt{\chi_{v,1-\alpha/2}^2}}
\le CV \le
\frac{cv\sqrt{v}}{\sqrt{\chi_{v,\alpha/2}^2}}
\biggr)
\] where \(\chi_{v,\alpha/2}^2\) and \(\chi_{v,1-\alpha/2}^2\) are the \(100(\alpha/2)\) and \(100(1-\alpha/2)\) percentiles of the central \(\chi^2\) distribution with \(v\) degrees of freedom, respectively (Panichkitkosolkul, 2013).
Then the equal-tailed 95% CI for \(x\) is:
cv_versatile(
  x,
  na.rm = TRUE,
  digits = 3,
  method = "equal_tailed",
  correction = TRUE,
  alpha = 0.05
)
## $method
## [1] "Corrected cv with Equal-Tailed 95% CI"
##
## $statistics
## est lower upper
## 58.058 44.152 84.797
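The equal-tailed interval needs only two chi-square quantiles, so it can be sketched in a few lines of base R. As with the other sketches, the uncorrected cv is used, so the limits are expected to correspond to the `equal_tailed` row of the summary table at the end of this document rather than to the corrected output shown here. The name `et_ci` is illustrative.

```r
x <- c(
  0.2, 0.5, 1.1, 1.4, 1.8, 2.3, 2.5, 2.7, 3.5, 4.4,
  4.6, 5.4, 5.4, 5.7, 5.8, 5.9, 6.0, 6.6, 7.1, 7.9
)
n <- length(x)
v <- n - 1
alpha <- 0.05

cv <- sd(x) / mean(x)

# Percentiles of the central chi-square distribution with v degrees of freedom
chi_upper <- qchisq(1 - alpha / 2, df = v)
chi_lower <- qchisq(alpha / 2, df = v)

# The larger percentile gives the lower limit and vice versa
et_ci <- c(
  lower = cv * sqrt(v) / sqrt(chi_upper),
  upper = cv * sqrt(v) / sqrt(chi_lower)
)
round(100 * et_ci, 3)
```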
Using the boot package (Canty & Ripley, 2017), we can obtain bootstrap CIs for \(cv\):
## $method
## [1] "Corrected cv with Basic Bootstrap 95% CI"
##
## $statistics
## est lower upper
## 58.058 37.866 78.014
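For readers who want to see the bootstrap step on its own, the following is a minimal sketch that calls the boot package directly. It is not necessarily how cvcqv obtains its interval: the statistic function `cv_stat`, the number of resamples `R = 2000` and the seed are all assumptions made for this illustration, and the limits will vary with the seed and with `R`.

```r
library(boot)

x <- c(
  0.2, 0.5, 1.1, 1.4, 1.8, 2.3, 2.5, 2.7, 3.5, 4.4,
  4.6, 5.4, 5.4, 5.7, 5.8, 5.9, 6.0, 6.6, 7.1, 7.9
)

# Statistic in the form boot() expects: data plus a vector of resampled indices
cv_stat <- function(data, idx) {
  d <- data[idx]
  100 * sd(d) / mean(d)  # cv as a percentage (uncorrected)
}

set.seed(123)  # arbitrary seed, only for reproducibility of this sketch
boot_out <- boot(data = x, statistic = cv_stat, R = 2000)

# Basic bootstrap interval; columns 4 and 5 of $basic hold the limits
boot_ci <- boot.ci(boot_out, conf = 0.95, type = "basic")
basic_limits <- boot_ci$basic[4:5]
round(basic_limits, 3)
```

Other interval types supported by `boot.ci()`, such as `"norm"`, can be requested through the `type` argument in the same way.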
Finally, we can compare the CIs calculated by all available methods:
## $method
## [1] "All methods"
##
## $statistics
## est lower upper
## kelley 57.774 41.467 98.510
## mckay 57.774 41.441 108.482
## miller 57.774 34.053 81.494
## vangel 57.774 41.264 105.424
## mahmoudvand_hassani 57.774 43.476 82.857
## equal_tailed 57.774 43.936 84.382
## shortest_length 57.774 42.014 81.012
## normal_approximation 57.774 44.533 85.272
## norm 57.774 39.136 78.276
## basic 57.774 38.897 77.648
## description
## kelley cv with Kelley 95% CI
## mckay cv with McKay 95% CI
## miller cv with Miller 95% CI
## vangel cv with Vangel 95% CI
## mahmoudvand_hassani cv with Mahmoudvand-Hassani 95% CI
## equal_tailed cv with Equal-Tailed 95% CI
## shortest_length cv with Shortest-Length 95% CI
## normal_approximation cv with Normal Approximation 95% CI
## norm cv with Normal Approximation Bootstrap 95% CI
## basic cv with Basic Bootstrap 95% CI
Albatineh, A. N., Kibria, B. M., Wilcox, M. L., & Zogheib, B. (2014). Confidence interval estimation for the population coefficient of variation using ranked set sampling: A simulation study. Journal of Applied Statistics, 41(4), 733–751. https://doi.org/10.1080/02664763.2013.847405
Canty, A., & Ripley, B. (2017). boot: Bootstrap R (S-Plus) Functions. R package version 1.3-20.
Edward Miller, G. (1991). Asymptotic test statistics for coefficients of variation. Communications in Statistics-Theory and Methods, 20(10), 3351–3363.
Kelley, K. (2007). Sample size planning for the coefficient of variation from the accuracy in parameter estimation approach. Behavior Research Methods, 39(4), 755–766. https://doi.org/10.3758/BF03192966
Kelley, K. (2018). MBESS: The MBESS R Package. R package version 4.4.3. Retrieved from https://cran.r-project.org/package=MBESS
Mahmoudvand, R., & Hassani, H. (2009). Two new confidence intervals for the coefficient of variation in a normal distribution. Journal of Applied Statistics, 36(4), 429–442.
McKay, A. T. (1932). Distribution of the Coefficient of Variation and the Extended "t" Distribution. Journal of the Royal Statistical Society, 95(4), 695–698.
Panichkitkosolkul, W. (2013). Confidence Intervals for the Coefficient of Variation in a Normal Distribution with a Known Population Mean. Journal of Probability and Statistics, 2013, 1–11. https://doi.org/10.1155/2013/324940
Vangel, M. G. (1996). Confidence intervals for a normal coefficient of variation. The American Statistician, 50(1), 21–26.