This document covers the topic of finding combinations or permutations that meet a specific set of criteria. For example, retrieving all combinations of a vector that have a product between two bounds.
There are 5 compiled constraint functions that can be utilized efficiently to test a given result.
They are passed as strings to the constraintFun
parameter. When these are employed without any other parameters being set, an additional column is added that represents the result of applying the given function to that combination/permutation. You can also set keepResults = TRUE
(more on this later).
library(RcppAlgos)
## base R using combn and FUN
combnSum <- combn(20, 10, sum)
algosSum <- comboGeneral(20, 10, constraintFun = "sum")
## Notice the additional column (i.e. the 11th column)
head(algosSum)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,] 1 2 3 4 5 6 7 8 9 10 55
[2,] 1 2 3 4 5 6 7 8 9 11 56
[3,] 1 2 3 4 5 6 7 8 9 12 57
[4,] 1 2 3 4 5 6 7 8 9 13 58
[5,] 1 2 3 4 5 6 7 8 9 14 59
[6,] 1 2 3 4 5 6 7 8 9 15 60
identical(as.integer(combnSum), algosSum[,11])
[1] TRUE
## Using parallel
paralSum <- comboGeneral(20, 10, constraintFun = "sum", Parallel = TRUE)
identical(paralSum, algosSum)
[1] TRUE
library(microbenchmark)
microbenchmark(serial = comboGeneral(20, 10, constraintFun = "sum"),
parallel = comboGeneral(20, 10, constraintFun = "sum", Parallel = TRUE),
combnSum = combn(20, 10, sum))
Unit: milliseconds
expr min lq mean median uq max neval
serial 3.244201 3.829026 4.263844 3.977368 4.083782 8.391110 100
parallel 1.108852 1.261750 1.501602 1.316356 1.362703 3.678798 100
combnSum 227.889743 233.444273 239.778906 236.130796 239.313303 297.253419 100
rowSums
and rowMeans
Finding row sums or row means is even faster than simply applying the highly efficient rowSums
/rowMeans
after the combinations have already been generated:
## Pre-generate combinations
combs <- comboGeneral(25, 10)
## Testing rowSums alone against generating combinations as well as summing
microbenchmark(serial = comboGeneral(25, 10, constraintFun = "sum"),
parallel = comboGeneral(25, 10, constraintFun = "sum", Parallel = TRUE),
rowsums = rowSums(combs))
Unit: milliseconds
expr min lq mean median uq max neval
serial 104.54241 116.17949 120.57489 119.61571 123.86698 174.2058 100
parallel 37.70034 46.51309 48.82646 48.39089 51.16243 112.6135 100
rowsums 98.50969 100.68145 105.76113 102.10867 107.19118 135.9790 100
all.equal(rowSums(combs),
comboGeneral(25, 10,
constraintFun = "sum",
Parallel = TRUE)[,11])
[1] TRUE
## Testing rowMeans alone against generating combinations as well as obtain row means
microbenchmark(serial = comboGeneral(25, 10, constraintFun = "mean"),
parallel = comboGeneral(25, 10, constraintFun = "mean", Parallel = TRUE),
rowmeans = rowMeans(combs))
Unit: milliseconds
expr min lq mean median uq max neval
serial 171.22701 182.2575 198.14229 198.29388 213.14787 240.1016 100
parallel 54.41817 59.2010 73.12819 74.61183 77.93679 119.8481 100
rowmeans 102.34084 103.2962 115.09863 107.12878 120.19177 174.8596 100
all.equal(rowMeans(combs),
comboGeneral(25, 10,
constraintFun = "mean",
Parallel = TRUE)[,11])
[1] TRUE
In both cases above, RcppAlgos
is doing double the work nearly twice as fast!!!
limitConstraints
The standard 5 comparison operators (i.e. "<"
, ">"
, "<="
, ">="
, & "=="
) can be used in a variety of ways. In order for them to have any effect, they must be used in conjunction with constraintFun
as well as limitConstraints
. The latter is the value(s) that will be used for comparison. It can be passed as a single value or a vector of two numerical values. This is useful when one wants to find results that are between (or outside) of a given range.
First we will look at cases with only one comparison and one value for the limitConstraint
.
## Generate some random data. N.B. Using R >= 3.6.0
set.seed(101)
myNums <- sample(500, 20)
myNums
[1] 187 22 354 327 124 149 289 165 307 269 432 346 358 454 222 287 398 109 199 19
## Find all 5-tuples combinations without repetition of myNums
## (defined above) such that the sum is equal to 1176.
p1 <- comboGeneral(v = myNums, m = 5,
constraintFun = "sum",
comparisonFun = "==",
limitConstraints = 1176)
tail(p1)
[,1] [,2] [,3] [,4] [,5]
[28,] 124 149 287 289 327
[29,] 124 165 187 346 354
[30,] 124 165 222 307 358
[31,] 124 187 222 289 354
[32,] 124 187 269 289 307
[33,] 149 187 199 287 354
## Authenticate with brute force
allCombs <- comboGeneral(sort(myNums), 5)
identical(p1, allCombs[which(rowSums(allCombs) == 1176), ])
[1] TRUE
## How about finding combinations with repetition
## whose mean is less than or equal to 150.
p2 <- comboGeneral(v = myNums, m = 5, TRUE,
constraintFun = "mean",
comparisonFun = "<=",
limitConstraints = 150)
## Again, we authenticate with brute force
allCombs <- comboGeneral(sort(myNums), 5, TRUE)
identical(p2, allCombs[which(rowMeans(allCombs) <= 150), ])
[1] FALSE ### <<--- What? They should be the same
## N.B.
class(p2[1, ])
[1] "numeric"
class(allCombs[1, ])
[1] "integer"
## When mean is employed or it can be determined that integral
## values will not suffice for the comparison, we fall back to
## numeric types, thus all.equal should return TRUE
all.equal(p2, allCombs[which(rowMeans(allCombs) <= 150), ])
[1] TRUE
Sometimes, we need to generate combinations/permutations such that when we apply a constraint function, the results are between (or outside) a given range. There is a natural two step process when finding results outside a range, however for finding results between a range, this two step approach could become computationally demanding. The underlying algorithms in RcppAlgos
are optimized for both cases and avoids adding results that will eventually be removed.
Using two comparisons is easy. The first comparison operator is applied to the first limit and the second operator is applied to the second limit.
Note that in the examples below, we have keepResults = TRUE
. This means an additional column will be added to the output that is the result of applying constraintFun
to that particular combination.
## Get combinations such that the product is
## strictly between 3600 and 4000
comboGeneral(5, 7, TRUE, constraintFun = "prod",
comparisonFun = c(">","<"), ## Find results > 3600 and < 4000
limitConstraints = c(3600, 4000),
keepResults = TRUE)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 2 3 5 5 5 5 3750
[2,] 1 3 4 4 4 4 5 3840
[3,] 2 2 3 4 4 4 5 3840
[4,] 3 3 3 3 3 3 5 3645
[5,] 3 3 3 3 3 4 4 3888
# ## The above is the same as doing the following:
# comboGeneral(5, 7, TRUE, constraintFun = "prod",
# comparisonFun = c("<",">"), ## Note that the comparison vector
# limitConstraints = c(4000, 3600), ## and the limits have flipped
# keepResults = TRUE)
## What about finding combinations outside a range
outside <- comboGeneral(5, 7, TRUE, constraintFun = "prod",
comparisonFun = c("<=",">="),
limitConstraints = c(3600, 4000),
keepResults = TRUE)
all(apply(outside[, -8], 1, prod) <= 3600
| apply(outside[, -8], 1, prod) >= 4000)
[1] TRUE
dim(outside)
[1] 325 8
## Note that we obtained 5 results when searching "between"
## 3600 and 4000. Thus we have: 325 + 5 = 330
comboCount(5, 7, T)
[1] 330
tolerance
When the underlying type is numeric
, round-off errors can occur. As stated in floating-point error mitigation:
“By definition, floating-point error cannot be eliminated, and, at best, can only be managed.”
Here is a great stackoverflow post that further illuminates this tricky topic:
For these reasons, the argument tolerance
can be utilized to refine a given constraint. It is added to the upper limit and subtracted from the lower limit. The default value is sqrt(.Machine$double.eps) ~= 0.00000001490116
.
This default value is good and bad.
For the good side:
dim(comboGeneral(seq(0,0.5,0.05), 6, TRUE,
constraintFun = "sum",
comparisonFun = "==",
limitConstraints = 1))
[1] 199 6
## Confirm with integers and brute force
allCbs <- comboGeneral(seq(0L,50L,5L), 6, TRUE, constraintFun = "sum")
sum(allCbs[, 7] == 100L)
[1] 199
If we had a tolerance of zero, we would have obtained an incorrect result:
## We miss 29 combinations that add up to 1
dim(comboGeneral(seq(0,0.5,0.05), 6, TRUE,
constraintFun = "sum",
comparisonFun = "==",
limitConstraints = 1, tolerance = 0))
[1] 170 6
And now for a less desirable result. The example below appears to give incorrect results. That is, we shouldn’t return any combination with a mean of 4 or 5.
comboGeneral(c(2, 3, 5, 7), 3, T,
constraintFun = "mean", comparisonFun = c("<", ">"),
limitConstraints = c(5, 4), keepResults = TRUE)
[,1] [,2] [,3] [,4]
[1,] 2 3 7 4.000000
[2,] 2 5 5 4.000000
[3,] 2 5 7 4.666667
[4,] 3 3 7 4.333333
[5,] 3 5 5 4.333333
[6,] 3 5 7 5.000000
[7,] 5 5 5 5.000000
In the above example, the range that is actually tested against is c(3.99999998509884, 5.00000001490116)
. To obtain the intended result, we use a more restrictive tolerance.
permuteGeneral
Typically, when we call permuteGeneral
, the output is in lexicographical order, however when we apply a constraint, the underlying algorithm checks against combinations only, as this is more efficient. If a particular combination meets a constraint, then all permutations of that vector also meet that constraint, so there is no need to check them. For this reason, the output isn’t in order. Observe:
permuteGeneral(c(2, 3, 5, 7), 3, freqs = rep(2, 4),
constraintFun = "mean", comparisonFun = c(">", "<"),
limitConstraints = c(4, 5), keepResults = TRUE, tolerance = 0)
[,1] [,2] [,3] [,4]
[1,] 2 5 7 4.666667 <-- First combination that meets the criteria
[2,] 2 7 5 4.666667
[3,] 5 2 7 4.666667
[4,] 5 7 2 4.666667
[5,] 7 2 5 4.666667
[6,] 7 5 2 4.666667
[7,] 3 3 7 4.333333 <-- Second combination that meets the criteria
[8,] 3 7 3 4.333333
[9,] 7 3 3 4.333333
[10,] 3 5 5 4.333333 <-- Third combination that meets the criteria
[11,] 5 3 5 4.333333
[12,] 5 5 3 4.333333
As you can see, the 2nd through the 6th entries are simply permutations of the 1st entry. Similarly, entries 8 and 9 are permutations of the 7th and entries 11 and 12 are permutations of the 10th.
Rcpp::checkUserInterrupt
Some of these operations can take some time, especially when you are in the exploratory phase and you don’t have that much information about what type of solution you will obtain. For this reason, we have added the ability to interrupt execution as of version 2.4.0
. Under the hood, we call Rcpp::checkUserInterrupt()
once every second to check if the user has requested for the process to be interrupted. Note that we only check for user interruptions when we cannot determine the number of results up front.
In prior versions, the example below would have executed indefinitely and the only way to stop it would have been to force quit R
. This is not a very good solution.
With version 2.4.0
, if we initiate a process that will take a long time or exhaust all of the available memory (e.g. we forget to put an upper limit on the number of results, relax the tolerance, etc.), we can simply hit Ctrl + c
, or esc
if using RStudio
, to stop execution.
set.seed(123)
s <- rnorm(1000)
## Oops!! We forgot to limit the output/put a loose tolerance
## There are as.numeric(comboCount(s, 20, T)) ~= 4.964324e+41
## This will either take a long long time, or all of your
## memory will be consumed!!!
##
## No problem... simply hit Ctrl + c or if in RStudio, hit esc
## or hit the "Stop" button
system.time(testInterrupt <- comboGeneral(s, 20, TRUE,
constraintFun = "sum",
comparisonFun = "==",
limitConstraints = 0))
Timing stopped at: 2.982 0.011 3.002
## Whew, that was close! Let's try again.
system.time(takeTwo <- comboGeneral(s, 20, TRUE,
constraintFun = "sum",
comparisonFun = "==",
limitConstraints = 0,
upper = 10, tolerance = 0.00001))
user system elapsed
0.056 0.000 0.058
dim(takeTwo)
[1] 10 20
## Tighten the tolerance
system.time(takeThree <- comboGeneral(s, 20, TRUE,
constraintFun = "sum",
comparisonFun = "==",
limitConstraints = 0,
upper = 10, tolerance = 0.0000001))
user system elapsed
9.194 0.011 9.212
dim(takeThree)
[1] 10 20
## We are getting closer... tighten the tolerance and repeat
Specialized algorithms are employed when it can be determined that we are looking for integer partitions.
We need v = 0:N
, repetition = TRUE
, constraintFun = "sum"
, comparisonFun = "=="
, and limitConstraints = N
. When we leave m = NULL
, m
is internally set to the length of the longest non-zero combination (this is true for all cases below).
Everything is the same as above except for explicitly setting the desired length and deciding whether to include zero or not.
## Including zero
comboGeneral(0:5, 3, repetition = TRUE,
constraintFun = "sum",
comparisonFun = "==", limitConstraints = 5)
[,1] [,2] [,3]
[1,] 0 0 5
[2,] 0 1 4
[3,] 0 2 3
[4,] 1 1 3
[5,] 1 2 2
## Zero not included
comboGeneral(5, 3, repetition = TRUE,
constraintFun = "sum",
comparisonFun = "==", limitConstraints = 5)
[,1] [,2] [,3]
[1,] 1 1 3
[2,] 1 2 2
Same as Case 1 & 2
except now we have repetition = FALSE
.
comboGeneral(0:10, constraintFun = "sum",
comparisonFun = "==", limitConstraints = 10)
[,1] [,2] [,3] [,4]
[1,] 0 1 2 7
[2,] 0 1 3 6
[3,] 0 1 4 5
[4,] 0 2 3 5
[5,] 1 2 3 4
## Zero not included and restrict the length
comboGeneral(10, 3, constraintFun = "sum",
comparisonFun = "==", limitConstraints = 10)
[,1] [,2] [,3]
[1,] 1 2 7
[2,] 1 3 6
[3,] 1 4 5
[4,] 2 3 5
## Include zero and restrict the length
comboGeneral(0:10, 3, constraintFun = "sum",
comparisonFun = "==", limitConstraints = 10)
[,1] [,2] [,3]
[1,] 0 1 9
[2,] 0 2 8
[3,] 0 3 7
[4,] 0 4 6
[5,] 1 2 7
[6,] 1 3 6
[7,] 1 4 5
[8,] 2 3 5
## partitions of 10 into distinct parts of every length
lapply(1:4, function(x) {
comboGeneral(10, x, constraintFun = "sum",
comparisonFun = "==", limitConstraints = 10)
})
[[1]]
[,1]
[1,] 10
[[2]]
[,1] [,2]
[1,] 1 9
[2,] 2 8
[3,] 3 7
[4,] 4 6
[[3]]
[,1] [,2] [,3]
[1,] 1 2 7
[2,] 1 3 6
[3,] 1 4 5
[4,] 2 3 5
[[4]]
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
freqs
to Refine LengthWe can utilize the freqs
argument to obtain more distinct partitions by allowing for repeated zeros. The super optimized algorithm will only be carried out if zero is included and the number of repetitions for every number except zero is one.
For example, given v = 0:N
and J >= 1
, if freqs = c(J, rep(1, N))
, then the super optimized algorithm will be used, however if freqs = c(J, 2, rep(1, N - 1))
, the general algorithm will be used. It should be noted that the general algorithm is still highly optimized so one should not fear using it.
A pattern that is guaranteed to retrieve all distinct partitions of N is to set v = 0:N
and freqs = c(N, rep(1, N))
(the extra zeros will be left off).
## Obtain all distinct partitions of 10
comboGeneral(0:10, freqs = c(10, rep(1, 10)), ## Same as c(3, rep(1, 10))
constraintFun = "sum",
comparisonFun = "==", limitConstraints = 10)
[,1] [,2] [,3] [,4]
[1,] 0 0 0 10
[2,] 0 0 1 9
[3,] 0 0 2 8
[4,] 0 0 3 7
[5,] 0 0 4 6
[6,] 0 1 2 7
[7,] 0 1 3 6
[8,] 0 1 4 5
[9,] 0 2 3 5
[10,] 1 2 3 4
freqs
As noted in Case 1
, if m = NULL
, the length of the output will be determined by the longest non-zero combination that sums to N.
## m is NOT NULL and output has at most 2 zeros
comboGeneral(0:10, 3, freqs = c(2, rep(1, 10)),
constraintFun = "sum",
comparisonFun = "==", limitConstraints = 10)
[,1] [,2] [,3]
[1,] 0 0 10
[2,] 0 1 9
[3,] 0 2 8
[4,] 0 3 7
[5,] 0 4 6
[6,] 1 2 7
[7,] 1 3 6
[8,] 1 4 5
[9,] 2 3 5
## m is NULL and output has at most 2 zeros
comboGeneral(0:10, freqs = c(2, rep(1, 10)),
constraintFun = "sum",
comparisonFun = "==", limitConstraints = 10)
[,1] [,2] [,3] [,4]
[1,] 0 0 1 9
[2,] 0 0 2 8
[3,] 0 0 3 7
[4,] 0 0 4 6
[5,] 0 1 2 7
[6,] 0 1 3 6
[7,] 0 1 4 5
[8,] 0 2 3 5
[9,] 1 2 3 4
Compositions are related to integer partitions, however order matters. With RcppAlgos
, we generate compositions in a similar manner to generating partitions, but instead of using comboGeneral
, we use permuteGeneral
.
As noted above, when we call permuteGeneral
with constraints, the output is not in lexicographical order. This holds for generating compositions as well, so if order is important to you, there are a couple of great packages that take care to preserve order when generating compositions (the partitions
packages and the arrangements
package).
With that in mind, generating compositions with RcppAlgos
is easy, flexible, and quite efficient. One thing to keep in mind is that when we explicitly give m
and zero is included, we will obtain weak compositions, which allow for zeros to be a part of the sequence (E.g. c(0, 0, 5), c(0, 5, 0), c(5, 0, 0)
are weak compositions of 5).
## See Case 1
permuteGeneral(0:5, repetition = TRUE,
constraintFun = "sum",
comparisonFun = "==", limitConstraints = 5)
[,1] [,2] [,3] [,4] [,5]
[1,] 0 0 0 0 5
[2,] 0 0 0 1 4
[3,] 0 0 0 4 1
[4,] 0 0 0 2 3
[5,] 0 0 0 3 2
[6,] 0 0 1 1 3
[7,] 0 0 1 3 1
[8,] 0 0 3 1 1
[9,] 0 0 1 2 2
[10,] 0 0 2 1 2
[11,] 0 0 2 2 1
[12,] 0 1 1 1 2
[13,] 0 1 1 2 1
[14,] 0 1 2 1 1
[15,] 0 2 1 1 1
[16,] 1 1 1 1 1
## When m is NOT NULL, we get weak compositions as well. Using ht
## function defined in "Combination and Permutation Basics" vignette
ht(permuteGeneral(0:5, 5, repetition = TRUE,
constraintFun = "sum",
comparisonFun = "==", limitConstraints = 5))
head -->
[,1] [,2] [,3] [,4] [,5]
[1,] 0 0 0 0 5
[2,] 0 0 0 5 0
[3,] 0 0 5 0 0
[4,] 0 5 0 0 0
[5,] 5 0 0 0 0
--------
tail -->
[,1] [,2] [,3] [,4] [,5]
[122,] 2 0 1 1 1
[123,] 2 1 0 1 1
[124,] 2 1 1 0 1
[125,] 2 1 1 1 0
[126,] 1 1 1 1 1
## See Case 3
permuteGeneral(6, 3,
constraintFun = "sum",
comparisonFun = "==", limitConstraints = 6)
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 1 3 2
[3,] 2 1 3
[4,] 2 3 1
[5,] 3 1 2
[6,] 3 2 1
## Using freqs along with including zero to obtain
## all compositions into distinct parts
permuteGeneral(0:6, freqs = c(6, rep(1, 6)),
constraintFun = "sum",
comparisonFun = "==", limitConstraints = 6)
[,1] [,2] [,3]
[1,] 0 0 6
[2,] 0 1 5
[3,] 0 5 1
[4,] 0 2 4
[5,] 0 4 2
[6,] 1 2 3
[7,] 1 3 2
[8,] 2 1 3
[9,] 2 3 1
[10,] 3 1 2
[11,] 3 2 1
## partitions of 60
system.time(parts60 <- comboGeneral(0:60, repetition = TRUE,
constraintFun = "sum",
comparisonFun = "==", limitConstraints = 60))
user system elapsed
0.079 0.046 0.126
dim(parts60)
[1] 966467 60
## partitions of 120 into distinct parts
system.time(partsDist120 <- comboGeneral(0:120, freqs = c(120, rep(1, 120)),
constraintFun = "sum",
comparisonFun = "==",
limitConstraints = 120))
user system elapsed
0.126 0.000 0.126
dim(partsDist120)
[1] 2194432 15
## compositions of 25
system.time(comp25 <- permuteGeneral(0:25, repetition = TRUE,
constraintFun = "sum",
comparisonFun = "==", limitConstraints = 25))
user system elapsed
1.266 0.332 1.601
dim(comp25)
[1] 16777216 25
## weak compositions of 12
system.time(weakComp12 <- permuteGeneral(0:12, 12, repetition = TRUE,
constraintFun = "sum",
comparisonFun = "==", limitConstraints = 12))
user system elapsed
0.029 0.000 0.030
dim(weakComp12)
[1] 1352078 12
## compositions of 50 into distinct parts
system.time(compDist50 <- permuteGeneral(0:50, freqs = c(50, rep(1, 50)),
constraintFun = "sum",
comparisonFun = "==", limitConstraints = 50))
user system elapsed
0.231 0.074 0.305
dim(compDist50)
[1] 10759321 9
## Using ht function defined in "Combination and Permutation Basics" vignette
ht(compDist50)
head -->
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 0 0 0 0 0 0 0 0 50
[2,] 0 0 0 0 0 0 0 1 49
[3,] 0 0 0 0 0 0 0 49 1
[4,] 0 0 0 0 0 0 0 2 48
[5,] 0 0 0 0 0 0 0 48 2
--------
tail -->
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[10759317,] 10 9 8 7 6 4 1 3 2
[10759318,] 10 9 8 7 6 4 2 1 3
[10759319,] 10 9 8 7 6 4 2 3 1
[10759320,] 10 9 8 7 6 4 3 1 2
[10759321,] 10 9 8 7 6 4 3 2 1