This vignette file conveys certain ideas behind the generalized Poisson distribution and some examples of applying the functions in this package (RNGforGPD).
GenUniGpois
We choose different data generation methods according to different parameter values because restrictions apply when the rate parameter or the dispersion parameter of the generalized Poisson is within certain ranges. For example, the normal approximation method does not work well for theta < 10.
GenUniGpois(2, 0.9, 100, method = "Branching")
#> [1] "Specified theta is 2, empirical theta is 1.954999, specified lambda is 0.9, empirical lambda is 0.907913."
GenUniGpois(5, -0.4, 100, method = "Inversion")
#> [1] "Specified theta is 5, empirical theta is 4.978908, specified lambda is -0.4, empirical lambda is -0.390756."
GenUniGpois(12, 0.5, 100, method = "Normal-Approximation")
#> [1] "Specified theta is 12, empirical theta is 10.287302, specified lambda is 0.5, empirical lambda is 0.562056."
data = GenUniGpois(10, 0.4, 10, method = "Chop-Down", details = FALSE)
data = GenUniGpois(3, 0.9, 10000, method = "Build-Up", details = FALSE)
ComputeCorrGpois
From a practical perspective, correlation bounds among variables are typically narrower than between −1 and 1 (the theoretical maximum and minimum correlation bounds) because different correlation upper and lower bounds may be imposed by the marginal distributions. A simple sorting technique can be used to obtain approximate correlation bounds and this approach works regardless of the data type or the distributional assumption (Demirtas, Hedeker 2011).
Using the sorting technique, we wrote the function ValidCorrGpois
that computes the lower and upper correlation bounds between a pair of generalized Poisson variables. Besides, this function serves as an integral part of the ValidCorrGpois
function, which examines whether values of pairwise correlation matrices fall within the limits imposed by the marginal distributions.
ComputeCorrGpois(c(3,2,5,4),c(0.3,0.2,0.5,0.6))
#> ............
#> $min
#> [,1] [,2] [,3] [,4]
#> [1,] NA -0.8434126 -0.8519714 -0.8042731
#> [2,] -0.8434126 NA -0.8372274 -0.7873092
#> [3,] -0.8519714 -0.8372274 NA -0.7977650
#> [4,] -0.8042731 -0.7873092 -0.7977650 NA
#>
#> $max
#> [,1] [,2] [,3] [,4]
#> [1,] NA 0.9836401 0.9936699 0.9871801
#> [2,] 0.9836401 NA 0.9869372 0.9816854
#> [3,] 0.9936699 0.9869372 NA 0.9941634
#> [4,] 0.9871801 0.9816854 0.9941634 NA
ComputeCorrGpois(c(4,5),c(-0.45,-0.11))
#> ..
#> $min
#> [,1] [,2]
#> [1,] NA -0.948794
#> [2,] -0.948794 NA
#>
#> $max
#> [,1] [,2]
#> [1,] NA 0.9544138
#> [2,] 0.9544138 NA
ValidCorrGpois
This function checks the required conditions of the values of pairwise correlations, which include positive definiteness, symmetry, correctness of dimensions, and whether the correlations fall within the correlation bounds. ValidCorrGpois
ensures that the supplied correlation matrix is valid for simulating multivariate generalized Poisson distributions using GenMVGpois
.
# ValidCorrGpois(matrix(c(1, 0.9, 0.9, 1), byrow = TRUE, nrow = 2), c(0.5, 0.5), c(0.1, 0.105))
# ValidCorrGpois(matrix(c(1, 0.9, 0.9, 1), byrow = TRUE, nrow = 2), c(3, 2), c(-0.3, -0.2))
QuantileGpois
This function computes quantiles for generalized Poisson distribution. We guarantee that there will be at least five classes if lambda is negative by forcing \(m \geq 4\).
QuantileGpois(0.98,1,-0.2,details = TRUE)
#> x = 0, P(X = x) = 0.3678794 ,P(X <= x) = 0.3678794
#> x = 0, P(X = x) = 0.3678794 , P(X <= x) = 0.3678794
#> x= 1 , P(X = x) = 0.449329 , P(X <= x) = 0.8172084
#> x= 2 , P(X = x) = 0.1646435 , P(X <= x) = 0.9818519
#> When lambda is negative, we need to account for truncation error
#> The adjusted CDF are: 0.3746792 0.8323133 1
#> [1] 2
QuantileGpois(0.80,2,0.025,details = FALSE)
#> [1] 3
CorrNNGpois
This function applies the method proposed by Yahav, Shmueli 2011. Yahav and Shmueli found that the relationship between the desired correlation matrix and the actual correlation matrix of a generalized Poisson distribution can be approximated by an exponential function. Following their simple and empirically based approximation method we can correct our actual correlation matrix to the desired correlation matrix. Note that some desired correlations might be infeasible.
CorrNNGpois = function(theta.vec, lambda.vec, r) {
if(abs(r) > 1) stop("The desired correlation r has to be within (-1,1)")
samples = 1e+05
u = GenUniGpois(theta.vec[1], lambda.vec[1], samples, method = "Inversion", details = FALSE)$data
v = GenUniGpois(theta.vec[2], lambda.vec[2], samples, method = "Inversion", details = FALSE)$data
maxcor = cor(sort(u), sort(v))
mincor = cor(sort(u), sort(v, decreasing=TRUE))
a = -maxcor * mincor/(maxcor + mincor)
b = log((maxcor + a)/a)
c = -a
corrected = log( (r + a)/a )/b
if( abs(corrected) > 1 ) cat("The actual correlation, ",corrected,", is not feasible!", sep = "")
else{ return(corrected) }
}
CorrNNGpois(c(0.1,10), c(0.1, 0.2),0.5)
#> Warning in vglm.fitter(x = x, y = y, w = w, offset = offset, Xm2 =
#> Xm2, : some quantities such as z, residuals, SEs may be inaccurate due to
#> convergence at a half-step
#> [1] 0.8005848
CorrNNGpois(c(0.1,10), c(-0.01, -0.02),0.5)
#> [1] 0.8266772
CorrNNGpois(c(4,2.3), c(-0.32,-0.3),0.7)
#> [1] 0.7515858
CorrNNGpois(c(14,10), c(-0.8, -0.3),0.99)
#> The actual correlation, 1.01913, is not feasible!
CmatStarGpois
This function computes the intermediate correlation values for Poisson-Poisson and Poisson-Normal pairs, and constructs an overall intermediate correlation matrix. It takes the target correlation matrix and returns the intermediate matrix of pairwise correlations.
The output of the cmat.star
function is important because it is one of the input arguments for the main data generating function: GenMVGpois
. The intermediate correlation matrix will lead to the target correlation matrix using inverse CDF transformation of the samples generated from a multivariate normal distribution.
lambda.vec = c(-0.2, 0.2, -0.3)
theta.vec = c(1, 3, 4)
M = c(0.352, 0.265, 0.342)
N = diag(3)
N[lower.tri(N)] = M
TV = N + t(N)
diag(TV) = 1
cstar = CmatStarGpois(TV, theta.vec, lambda.vec)
#> ......
#> ......
#> ......
#> .........
cstar
#> [,1] [,2] [,3]
#> [1,] 1.0000000 0.3945659 0.2946112
#> [2,] 0.3945659 1.0000000 0.3597725
#> [3,] 0.2946112 0.3597725 1.0000000
GenMVGpois
GenMVGPois
(the engine function) is the most important function in this package (RNGforGPD). It depends on all the other functions in this package and three external packages: mvtnorm, corpcor, and VGAM. The major difference between the univariate generalized Poisson variables generating function and the multivariate one is the consideration of pairwise correlations between variables. These correlations can be verified using ValidCorrGpois
and corrected by CorrNNGpois
.
lambda.vec = c(-0.2, 0.2, -0.3)
theta.vec = c(1, 3, 4)
M = c(0.352, 0.265, 0.342)
N = diag(3)
N[lower.tri(N)] = M
TV = N + t(N)
diag(TV) = 1
cstar = CmatStarGpois(TV, theta.vec, lambda.vec)
#> ......
#> ......
#> ......
#> .........
sample.size = 10000; no.gpois = 3
data = GenMVGpois(sample.size, no.gpois, cstar, theta.vec, lambda.vec, details = FALSE)
apply(data, 2, mean) # empirical means
#> [1] 0.8274 3.7080 3.0682
theta.vec / (1 - lambda.vec) # theoretical means
#> [1] 0.8333333 3.7500000 3.0769231
apply(data, 2, var) # empirical variances
#> [1] 0.5666659 5.6993059 1.8251313
theta.vec / (1 - lambda.vec)^3 # theoretical variances
#> [1] 0.5787037 5.8593750 1.8206645
cor(data) # empirical correlation matrix
#> [,1] [,2] [,3]
#> [1,] 1.0000000 0.3394127 0.2619511
#> [2,] 0.3394127 1.0000000 0.3333804
#> [3,] 0.2619511 0.3333804 1.0000000
TV # specified correlation matrix
#> [,1] [,2] [,3]
#> [1,] 1.000 0.352 0.265
#> [2,] 0.352 1.000 0.342
#> [3,] 0.265 0.342 1.000
Amatya, A. and Demirtas, H. (2015). Simultaneous generation of multivariate mixed data with Poisson and normal marginals. , , 3129-3139.
Amatya, A. and Demirtas, H. (2017). PoisNor: An R package for generation of multivariate data with Poisson and normal marginals. , , 2241-2253.
Demirtas, H. (2017). On accurate and precise generation of generalized Poisson variates. , , 489-499.
Demirtas, H. and Hedeker, D. (2011). A practical way for computing approximate lower and upper correlation bounds. , , 104-109.
Yahav, I. and Shmueli, G. (2012). On generating multivariate Poisson data in management science applications. , , 91-102.