---
title: "Combinatorial Iterators in RcppAlgos"
author: "Joseph Wood"
date: "11/30/2023"
output:
rmarkdown::html_vignette:
toc: true
number_sections: false
vignette: >
%\VignetteIndexEntry{Combinatorial Iterators in RcppAlgos}
%\VignetteEngine{knitr::rmarkdown}
\usepackage[utf8]{inputenc}
---
This document covers working with combinatorial iterators in `RcppAlgos`. Combinatorial iterators in `RcppAlgos` are memory efficient like traditional iterator objects. They allow traversal of combinations/permutations/partitions/compositions/comboGroups one by one without the necessity for storing all results in memory.
Unlike traditional combinatorial iterators, the iterators in `RcppAlgos` offers random access via the `[[` operator. This means, we can access the *nth* [lexicographical order]() result on demand without having to first iterate over the previous *n - 1* results.
------------------------------------------------------------------------
## Iterating over Combinations and Permutations
In order to iterate, we must initialize an iterator via `comboIter` or `permuteIter`. The interface is very similar to `comboGeneral` and `permuteGeneral`.
``` r
library(RcppAlgos)
options(width = 90)
## Initialize the iterator
a = comboIter(5, 3)
## Get the first combination
a$nextIter()
#> [1] 1 2 3
## And the next
a$nextIter()
#> [1] 1 2 4
## Set the current iterator to a variable
iter = a$currIter()
i = 1
## Iterate until there are no more
while (!is.null(iter)) {
cat(i, " ", iter, "\n")
iter = a$nextIter()
i = i + 1
}
#> 1 1 2 4
#> 2 1 2 5
#> 3 1 3 4
#> 4 1 3 5
#> 5 1 4 5
#> 6 2 3 4
#> 7 2 3 5
#> 8 2 4 5
#> 9 3 4 5
#> No more results. To see the last result, use the prevIter method(s)
## See the output of comboGeneral for comparison
comboGeneral(5, 3, lower = 2)
#> [,1] [,2] [,3]
#> [1,] 1 2 4
#> [2,] 1 2 5
#> [3,] 1 3 4
#> [4,] 1 3 5
#> [5,] 1 4 5
#> [6,] 2 3 4
#> [7,] 2 3 5
#> [8,] 2 4 5
#> [9,] 3 4 5
## Call the summary method to see information about our iterator
a$summary()
#> $description
#> [1] "Combinations of 5 choose 3"
#>
#> $currentIndex
#> [1] 11
#>
#> $totalResults
#> [1] 10
#>
#> $totalRemaining
#> [1] -1
```
### Bidirectional Iterators
Some of the combinatorial iterators in `RcppAlgos` are bidirectional iterators. This means that not only can we iterate in a forward manner (i.e. lexicographically), but we can also iterate backwards (i.e. [Reverse Lexicographical Order]()) via the `prevIter` method(s).
``` r
## Using the same iterable from the previous section
a$currIter()
#> No more results. To see the last result, use the prevIter method(s)
#> NULL
## As the comment says, we call the prevIter method to see the last result
a$prevIter()
#> [1] 3 4 5
## Get the previous result
a$prevIter()
#> [1] 2 4 5
## As in the previous example, we set the current iterator to a variable
iter = a$currIter()
## Defined above
print(i)
#> [1] 10
## Iterate until we are at the very beginning. Note that the
## output is exactly the same as above, but in reverse order
while (!is.null(iter)) {
i = i - 1
cat(i, " ", iter, "\n")
iter = a$prevIter()
}
#> 9 2 4 5
#> 8 2 3 5
#> 7 2 3 4
#> 6 1 4 5
#> 5 1 3 5
#> 4 1 3 4
#> 3 1 2 5
#> 2 1 2 4
#> 1 1 2 3
#> Iterator Initialized. To see the first result, use the nextIter method(s)
## Call the summary method to see information about our iterator
a$summary()
#> $description
#> [1] "Combinations of 5 choose 3"
#>
#> $currentIndex
#> [1] 0
#>
#> $totalResults
#> [1] 10
#>
#> $totalRemaining
#> [1] 10
```
### Retrieving More than One Result at a Time
There are four methods which allow for obtaining more than one result at a time: `nextNIter`, `prevNIter`, `nextRemaining`, and `prevRemaining`.
``` r
## Reset the iterator
a$startOver()
## Get the next 4 combinations
a$nextNIter(4)
#> [,1] [,2] [,3]
#> [1,] 1 2 3
#> [2,] 1 2 4
#> [3,] 1 2 5
#> [4,] 1 3 4
## Get the summary. Note that the index has been updated
a$summary()
#> $description
#> [1] "Combinations of 5 choose 3"
#>
#> $currentIndex
#> [1] 4
#>
#> $totalResults
#> [1] 10
#>
#> $totalRemaining
#> [1] 6
## View the current combination
a$currIter()
#> [1] 1 3 4
## Get the remaining combinations with nextRemaining
a$nextRemaining()
#> [,1] [,2] [,3]
#> [1,] 1 3 5
#> [2,] 1 4 5
#> [3,] 2 3 4
#> [4,] 2 3 5
#> [5,] 2 4 5
#> [6,] 3 4 5
a$summary()
#> $description
#> [1] "Combinations of 5 choose 3"
#>
#> $currentIndex
#> [1] 11
#>
#> $totalResults
#> [1] 10
#>
#> $totalRemaining
#> [1] -1
```
Now, we look at the opposite direction.
``` r
## Get the previous 4 combinations
a$prevNIter(4)
#> [,1] [,2] [,3]
#> [1,] 3 4 5
#> [2,] 2 4 5
#> [3,] 2 3 5
#> [4,] 2 3 4
## Get the summary. Note that the index has been updated
a$summary()
#> $description
#> [1] "Combinations of 5 choose 3"
#>
#> $currentIndex
#> [1] 7
#>
#> $totalResults
#> [1] 10
#>
#> $totalRemaining
#> [1] 3
## View the current combination
a$currIter()
#> [1] 2 3 4
## Get the remaining previous combinations with prevRemaining
a$prevRemaining()
#> [,1] [,2] [,3]
#> [1,] 1 4 5
#> [2,] 1 3 5
#> [3,] 1 3 4
#> [4,] 1 2 5
#> [5,] 1 2 4
#> [6,] 1 2 3
a$summary()
#> $description
#> [1] "Combinations of 5 choose 3"
#>
#> $currentIndex
#> [1] 0
#>
#> $totalResults
#> [1] 10
#>
#> $totalRemaining
#> [1] 10
```
### Random Access Iterator
As with the bidirectional iterators, with some of the combinatorial iterators in `RcppAlgos`, we can jump to the *nth* result without the need for iterating over the first *n - 1* results.
``` r
## Reset the iterator
a$startOver()
## How many total combinations do we have?
a$summary()$totalResults
#> [1] 10
## Let's get the 3rd combination
a[[3]]
#> [1] 1 2 5
## See the summary. Note that the index has been updated
a$summary()
#> $description
#> [1] "Combinations of 5 choose 3"
#>
#> $currentIndex
#> [1] 3
#>
#> $totalResults
#> [1] 10
#>
#> $totalRemaining
#> [1] 7
## Let's see the 9th combination
a[[9]]
#> [1] 2 4 5
## What about the first and last combination?
a$front()
#> [1] 1 2 3
a$back()
#> [1] 3 4 5
## Again the index has been updated
a$summary()
#> $description
#> [1] "Combinations of 5 choose 3"
#>
#> $currentIndex
#> [1] 10
#>
#> $totalResults
#> [1] 10
#>
#> $totalRemaining
#> [1] 0
a$currIter()
#> [1] 3 4 5
```
We can also easily return a random sample of combinations with the `[[` operator by passing a vector of indices. In these cases, it should be noted that the current index will not be updated.
``` r
## Set the current index to the second combination
a[[2]]
#> [1] 1 2 4
set.seed(121)
samp = sample(a$summary()$totalResults, 4)
samp
#> [1] 4 7 10 1
a[[samp]]
#> [,1] [,2] [,3]
#> [1,] 1 3 4
#> [2,] 2 3 4
#> [3,] 3 4 5
#> [4,] 1 2 3
## Note that the current index remains unchanged
a$summary()
#> $description
#> [1] "Combinations of 5 choose 3"
#>
#> $currentIndex
#> [1] 2
#>
#> $totalResults
#> [1] 10
#>
#> $totalRemaining
#> [1] 8
```
### User Defined Functions
Just as with `comboGeneral` and `permuteGeneral`, we can pass a user defined function to `comboIter` and `permuteIter`.
``` r
## Initialize the iterator
b = permuteIter(LETTERS[1:4], 3, FUN = function(p) paste(p, collapse = ""),
FUN.VALUE = "a")
b$nextIter()
#> [1] "ABC"
b$nextNIter(5)
#> [1] "ABD" "ACB" "ACD" "ADB" "ADC"
b$back()
#> [1] "DCB"
b$prevIter()
#> [1] "DCA"
b$prevNIter(5)
#> [1] "DBC" "DBA" "DAC" "DAB" "CDB"
b$nextRemaining()
#> [1] "DAB" "DAC" "DBA" "DBC" "DCA" "DCB"
## Random access
b[[5]]
#> [1] "ADB"
b$prevRemaining()
#> [1] "ACD" "ACB" "ABD" "ABC"
## View the source vector
b$sourceVector()
#> [1] "A" "B" "C" "D"
```
## New in Verison `2.5.0`
As of version `2.5.0`, we no longer rely on `Rcpp` as a dependency, which means that we do not utilize `Rcpp` modules for exposing C++ classes. This is now carried out using external pointers (See [External pointers and weak references]()) along with [S4 Classes](). We use the slots of `S4` classes for exposing each method so access is carried out with the “at sign”, `@`. We have also added the ability to access each method with the “dollar sign”, `$`, for backwards compatibility.
### Access Efficiency in `2.5.0+`
Our tests show that accessing methods is much more efficient in `2.5.0+` compared to prior versions. In the below tests, we measure excecution time of calling `nextIter` multiple times in different versions. We will use the function `test_nextIter` for our testing. If one needs to reproduce, simply download the `2.4.3` tar here: , change `RcppAlgos` to `RcppAlgos243` in a few place (e.g. `DESCRIPTION`, `NAMESPACE`, etc.), and rebuild.
``` r
test_nextIter <- function(n, m, get_val = FALSE, v = 243) {
a <- if (v == 243) {
RcppAlgos243::comboIter(n, m)
} else {
comboIter(n, m)
}
total <- comboCount(n, m)
if (get_val) {
mat <- matrix(0L, nrow = total, ncol = m)
for (i in 1:total) mat[i, ] <- a$nextIter()
return(mat)
} else {
if (v == 243) {
for (i in 1:total) a$nextIter()
} else {
for (i in 1:total) a@nextIter()
}
invisible(NULL)
}
}
```
#### Version `2.4.3` Using `Rcpp`
``` r
library(microbenchmark)
## Using R version 4.1.3
comboCount(15, 8)
#> [1] 6435
microbenchmark(v243 = test_nextIter(15, 8))
#> Warning in microbenchmark(v243 = test_nextIter(15, 8)): less accurate nanosecond times to
#> avoid potential integer overflows
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> v243 20.63817 21.75934 23.91622 21.94562 22.30697 200.7697 100
identical(test_nextIter(15, 8, get_val = TRUE),
comboGeneral(15, 8))
#> [1] TRUE
comboCount(25, 10)
#> [1] 3268760
system.time(test_nextIter(25, 10))
#> user system elapsed
#> 11.158 0.053 11.213
Rprof("Version243.out", memory.profiling = TRUE)
test_nextIter(25, 10)
Rprof(NULL)
lapply(summaryRprof("Version243.out", memory = "both"), head)
#> $by.self
#> self.time self.pct total.time total.pct mem.total
#> "as.environment" 3.96 38.37 3.96 38.37 4176.8
#> "$" 3.16 30.62 8.22 79.65 8664.3
#> "test_nextIter" 0.98 9.50 10.32 100.00 10594.2
#> ".External" 0.68 6.59 0.68 6.59 554.4
#> "get" 0.54 5.23 0.54 5.23 483.2
#> "exists" 0.50 4.84 0.50 4.84 530.8
#>
#> $by.total
#> total.time total.pct mem.total self.time self.pct
#> "test_nextIter" 10.32 100 10594.2 0.98 9.5
#> "" 10.32 100 10594.2 0.00 0.0
#> "block_exec" 10.32 100 10594.2 0.00 0.0
#> "call_block" 10.32 100 10594.2 0.00 0.0
#> "do.call" 10.32 100 10594.2 0.00 0.0
#> "doTryCatch" 10.32 100 10594.2 0.00 0.0
#>
#> $sample.interval
#> [1] 0.02
#>
#> $sampling.time
#> [1] 10.32
```
#### Version `2.8.3` (No `Rcpp`)
``` r
curr_version <- as.integer(gsub("\\.", "", packageVersion("RcppAlgos")))
curr_version
#> [1] 283
microbenchmark(curr_v = test_nextIter(15, 8, v = curr_version))
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> curr_v 2.626214 2.694602 2.806657 2.781789 2.873341 4.065765 100
system.time(test_nextIter(25, 10, v = curr_version))
#> user system elapsed
#> 1.354 0.012 1.366
identical(test_nextIter(15, 8, get_val = TRUE, v = curr_version),
comboGeneral(15, 8))
#> [1] TRUE
Rprof("Version250.out", memory.profiling = TRUE)
test_nextIter(25, 10, v = curr_version)
Rprof(NULL)
lapply(summaryRprof("Version250.out", memory = "both"), head)
#> $by.self
#> self.time self.pct total.time total.pct mem.total
#> ".Call" 0.56 45.90 0.56 45.9 430.3
#> "" 0.40 32.79 1.22 100.0 909.9
#> "test_nextIter" 0.26 21.31 1.22 100.0 909.9
#>
#> $by.total
#> total.time total.pct mem.total self.time self.pct
#> "" 1.22 100 909.9 0.40 32.79
#> "test_nextIter" 1.22 100 909.9 0.26 21.31
#> "block_exec" 1.22 100 909.9 0.00 0.00
#> "call_block" 1.22 100 909.9 0.00 0.00
#> "do.call" 1.22 100 909.9 0.00 0.00
#> "doTryCatch" 1.22 100 909.9 0.00 0.00
#>
#> $sample.interval
#> [1] 0.02
#>
#> $sampling.time
#> [1] 1.22
```
#### Conclusions
It appears that memory is the issue in previous versions. Indeed, if we look at [Memory statistics from Rprof](), and view both files with `memory = "stats"` we see that the C funciton, `duplicate`, appears to be the main culprit.
``` r
## We set index = 1 to ensure we get the very bottom of the stack
### Verison 2.4.3
v243 <- summaryRprof("Version243.out", memory = "stats", index = 1)
v243
#> index: "tryCatch"
#> vsize.small max.vsize.small vsize.large max.vsize.large nodes
#> 1076410 18864400 29382 15160928 20905038
#> max.nodes duplications tot.duplications samples
#> 214708592 18988 9797740 516
## Version 2.5.0
v250 <- summaryRprof("Version250.out", memory = "stats", index = 1)
v250
#> index: "tryCatch"
#> vsize.small max.vsize.small vsize.large max.vsize.large nodes
#> 3320297 25027376 257230 15691032 15623937
#> max.nodes duplications tot.duplications samples
#> 176450792 2 126 61
```
With verison `2.5.0+` there are only 126 `tot.duplications` whereas with version `2.4.3` there are millions of `tot.duplications`. In fact, there are a total of 9797740 duplications with version `2.4.3`. This together with `comboCount(25, 10) = 3,268,760` implies that the C funciton, `duplicate`, is called about 3 times per iteration with older versions (i.e. 9797740 `/ 3268760 ~=` 2.9974).
### Iterating over Partitions and Compositions of a Number
For most partition cases, we have all of the capabilities of the standard `comboIter` and `permuteIter` except for bidirectionality (i.e. the `prevIter` methods). For cases involving standard multisets we also don’t have random access methods.
``` r
## Similar illustration of comboIter(5, 3) at the top
p = partitionsIter(16, 4)
p@nextIter()
#> [1] 1 2 3 10
p@nextIter()
#> [1] 1 2 4 9
iter = p@currIter()
i = 1
while (!is.null(iter)) {
cat(i, " ", iter, "\n")
iter = p@nextIter()
i = i + 1
}
#> 1 1 2 4 9
#> 2 1 2 5 8
#> 3 1 2 6 7
#> 4 1 3 4 8
#> 5 1 3 5 7
#> 6 1 4 5 6
#> 7 2 3 4 7
#> 8 2 3 5 6
#> No more results.
partitionsGeneral(16, 4, lower = 2)
#> [,1] [,2] [,3] [,4]
#> [1,] 1 2 4 9
#> [2,] 1 2 5 8
#> [3,] 1 2 6 7
#> [4,] 1 3 4 8
#> [5,] 1 3 5 7
#> [6,] 1 4 5 6
#> [7,] 2 3 4 7
#> [8,] 2 3 5 6
p@summary()
#> $description
#> [1] "Partitions of 16 into 4 parts"
#>
#> $currentIndex
#> [1] 10
#>
#> $totalResults
#> [1] 9
#>
#> $totalRemaining
#> [1] -1
## Using random access
p[[7]]
#> [1] 1 4 5 6
## No previous iterators
p@prevIter()
#> Error in eval(expr, envir, enclos): no slot of name "prevIter" for this object of class "Partitions"
```
For compositions, the options are limited to a subset of compositions with repetition.
``` r
## Similar illustration of comboIter(5, 3) at the top
p = compositionsIter(6, 3, TRUE)
p@nextIter()
#> [1] 1 1 4
p@nextIter()
#> [1] 1 2 3
iter = p@currIter()
i = 1
while (!is.null(iter)) {
cat(i, " ", iter, "\n")
iter = p@nextIter()
i = i + 1
}
#> 1 1 2 3
#> 2 1 3 2
#> 3 1 4 1
#> 4 2 1 3
#> 5 2 2 2
#> 6 2 3 1
#> 7 3 1 2
#> 8 3 2 1
#> 9 4 1 1
#> No more results.
compositionsGeneral(6, 3, TRUE, lower = 2)
#> [,1] [,2] [,3]
#> [1,] 1 2 3
#> [2,] 1 3 2
#> [3,] 1 4 1
#> [4,] 2 1 3
#> [5,] 2 2 2
#> [6,] 2 3 1
#> [7,] 3 1 2
#> [8,] 3 2 1
#> [9,] 4 1 1
p@summary()
#> $description
#> [1] "Compositions with repetition of 6 into 3 parts"
#>
#> $currentIndex
#> [1] 11
#>
#> $totalResults
#> [1] 10
#>
#> $totalRemaining
#> [1] -1
## Using random access
p[[7]]
#> [1] 2 3 1
## No previous iterators
p@prevIter()
#> Error in eval(expr, envir, enclos): no slot of name "prevIter" for this object of class "Partitions"
```
### Iterating over Constrained Combinations/Permutations
Now, the combinatorial iterators have all of the features of their “general” analogs (I.e. `{combo|permute|partitions|compositions}General`), which includes constrained results.
For general constrained cases, these iterators offer huge advantages over their “general” counterparts. Previously, one had to guess how many results there would be using the `upper` parameter as executing the function with no constraints meant the user could be waiting for a while or consume a large amount of resources.
Another drawback is that it difficult to start generating from a particular point. With the “general” functions, if the `lower` parameter is used, we have to make a decision in order to disambiguate the use. Without constraints, using `lower` is easy to understand. It simply means to start generating results starting at a particular lexicographical result, which we can do efficiently (i.e. no need to generate the first `lower - 1` results). With constraints, it could mean one of two things:
1. Start checking from a particular lexicographical result without considering the constraint (as we do normally).
2. Start generating results from a particular result with regards to the final constrained output.
In `RcppAlgos` we have always used the first interpretation. A big downside for the second point is that we don’t have any fast algorithms for enumerating the total number of results, which reduces determining the *nth* result to a brute force approach.
With iterators, we can generate *n* results with `nextNIter(n)` or calling `nextIter()` *n* times (or some combination of the two). Then, if we want to continue iterating, we pick up where we left off fetching the *(n + 1)th* result and beyond (if there are any results left). This allows us to keep memory low without sacrificing our current state.
``` r
set.seed(55)
s = runif(10, -5, 5)
print(s)
#> [1] 0.478135161 -2.818403214 -4.650360052 2.915492940 0.602420762 -4.257748260
#> [7] -3.684770642 -2.058761222 0.007612633 -4.116755421
## Using comboGeneral to retrieve all results
comboGeneral(s, 5, constraintFun = "mean",
comparisonFun = "<", limitConstraints = -3)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] -4.650360 -4.257748 -4.116755 -3.684771 -2.818403214
#> [2,] -4.650360 -4.257748 -4.116755 -3.684771 -2.058761222
#> [3,] -4.650360 -4.257748 -4.116755 -3.684771 0.007612633
#> [4,] -4.650360 -4.257748 -4.116755 -3.684771 0.478135161
#> [5,] -4.650360 -4.257748 -4.116755 -3.684771 0.602420762
#> [6,] -4.650360 -4.257748 -4.116755 -2.818403 -2.058761222
#> [7,] -4.650360 -4.257748 -4.116755 -2.818403 0.007612633
#> [8,] -4.650360 -4.257748 -4.116755 -2.818403 0.478135161
#> [9,] -4.650360 -4.257748 -4.116755 -2.818403 0.602420762
#> [10,] -4.650360 -4.257748 -4.116755 -2.058761 0.007612633
#> [11,] -4.650360 -4.257748 -3.684771 -2.818403 -2.058761222
#> [12,] -4.650360 -4.257748 -3.684771 -2.818403 0.007612633
#> [13,] -4.650360 -4.116755 -3.684771 -2.818403 -2.058761222
#> [14,] -4.650360 -4.116755 -3.684771 -2.818403 0.007612633
#> [15,] -4.257748 -4.116755 -3.684771 -2.818403 -2.058761222
## Using comboIter
a = comboIter(s, 5, constraintFun = "mean",
comparisonFun = "<", limitConstraints = -3)
## See the first result
a@nextIter()
#> [1] -4.650360 -4.257748 -4.116755 -3.684771 -2.818403
## Get the next three
a@nextNIter(3)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] -4.65036 -4.257748 -4.116755 -3.684771 -2.058761222
#> [2,] -4.65036 -4.257748 -4.116755 -3.684771 0.007612633
#> [3,] -4.65036 -4.257748 -4.116755 -3.684771 0.478135161
## See the summary... Note the totalResults and totalRemaining
## fields are NA as we are not able to calculate this upfront.
a@summary()
#> $description
#> [1] "Combinations of 10 choose 5 where the mean is < -3"
#>
#> $currentIndex
#> [1] 4
#>
#> $totalResults
#> [1] NA
#>
#> $totalRemaining
#> [1] NA
a@nextNIter(3)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] -4.65036 -4.257748 -4.116755 -3.684771 0.602420762
#> [2,] -4.65036 -4.257748 -4.116755 -2.818403 -2.058761222
#> [3,] -4.65036 -4.257748 -4.116755 -2.818403 0.007612633
## Get the rest
a@nextRemaining()
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] -4.650360 -4.257748 -4.116755 -2.818403 0.478135161
#> [2,] -4.650360 -4.257748 -4.116755 -2.818403 0.602420762
#> [3,] -4.650360 -4.257748 -4.116755 -2.058761 0.007612633
#> [4,] -4.650360 -4.257748 -3.684771 -2.818403 -2.058761222
#> [5,] -4.650360 -4.257748 -3.684771 -2.818403 0.007612633
#> [6,] -4.650360 -4.116755 -3.684771 -2.818403 -2.058761222
#> [7,] -4.650360 -4.116755 -3.684771 -2.818403 0.007612633
#> [8,] -4.257748 -4.116755 -3.684771 -2.818403 -2.058761222
```
They are very efficient as well. Consider the example below where we use `comboGeneral` to generate all results without capping the output. Again, we are in a situation where we don’t know *a priori* how many results we will obtain.
``` r
set.seed(77)
s = runif(50, 20, 100)
## Over one trillion results to sift through
comboCount(s, 15)
#> [1] 2.25083e+12
time_all <- system.time({
print(
nrow(
comboGeneral(s, 15,
constraintFun = "mean",
comparisonFun = ">",
limitConstraints = 83)
)
)
})
#> [1] 38935252
time_all
#> user system elapsed
#> 2.014 1.383 3.872
## Over 4 GBs of results
(38935252 * 15 * 8) / 2^30
#> [1] 4.351353
```
Just over 3 seconds isn’t bad, however 4 GBs could put a strain on your computer.
Let’s use iterators instead and only generate ten thousand at a time to keep memory low. We should mention here that the iterators are “smart” in that there is no fear in requesting more results than what is actually left. For example, if in the problem above, we had iterated to the 38th million result and requested 10 million more, we would only obtain 935,252 results.
``` r
invisible(gc())
time_iter <- system.time({
a = comboIter(s, 15,
constraintFun = "mean",
comparisonFun = ">",
limitConstraints = 83)
while (!is.null(a@nextNIter(1e4))) {}
print(a@summary())
})
#> No more results.
#>
#> $description
#> [1] "Combinations of 50 choose 15 where the mean is > 83"
#>
#> $currentIndex
#> [1] 38935252
#>
#> $totalResults
#> [1] NA
#>
#> $totalRemaining
#> [1] NA
time_iter
#> user system elapsed
#> 1.637 0.339 1.976
## Only 1 MBs per iteration
(1e4 * 15 * 8) / 2^20
#> [1] 1.144409
```
Wow! Using the iterator approach is not only easier on your RAM, but faster as well (3.872 `/` 1.976 `~=` 1.9595)! Our gains came strictly from memory efficiency (From over 4 GBs to just over 1 MB) as the underlying algorithm is exactly the same.
Lastly, using iterators make some problems possible that would otherwise be intractable because of hardware. For instance, using the example above, if we changed the `limitConstraints` from 83 to 81 and tried the first approach your computer will most certainly become unusable (at least mine did). My memory usage shot up to over 30 GB and R became unresponsive. After a restart, I tried the second approach and obtained my result in just over 10 seconds barely noticing any jumps in memory:
``` r
## Don't run... consumes a huge chunk of memory
# time_all <- system.time({
# print(
# nrow(
# comboGeneral(s, 15,
# constraintFun = "mean",
# comparisonFun = ">",
# limitConstraints = 81) ## 83 -->> 81
# )
# )
# })
## No problem with iterators
invisible(gc())
system.time({
a = comboIter(s, 15,
constraintFun = "mean",
comparisonFun = ">",
limitConstraints = 81) ## 83 -->> 81
while (!is.null(a@nextNIter(1e4))) {}
print(a@summary())
})
#> No more results.
#>
#> $description
#> [1] "Combinations of 50 choose 15 where the mean is > 81"
#>
#> $currentIndex
#> [1] 271309888
#>
#> $totalResults
#> [1] NA
#>
#> $totalRemaining
#> [1] NA
#> user system elapsed
#> 11.449 2.382 13.833
```
### Iterating over Partitions of Groups
As of version `2.8.2`, we can iterate over partitions of groups with `comboGroupsIter`.
Just as with `partitionsIter`, we have all of the capabilities of the standard `comboIter` and `permuteIter` except for bidirectionality (i.e. the `prevIter` methods).
``` r
## Similar illustration of comboIter(5, 3) at the top
cg = comboGroupsIter(6, 2, retType = "3Darray")
cg@nextIter()
#> Grp1 Grp2
#> [1,] 1 4
#> [2,] 2 5
#> [3,] 3 6
cg@nextIter()
#> Grp1 Grp2
#> [1,] 1 3
#> [2,] 2 5
#> [3,] 4 6
iter = cg@currIter()
i = 1
while (!is.null(iter)) {
cat("\n ", i, "-------------\n")
print(iter)
iter = cg@nextIter()
i = i + 1
}
#>
#> 1 -------------
#> Grp1 Grp2
#> [1,] 1 3
#> [2,] 2 5
#> [3,] 4 6
#>
#> 2 -------------
#> Grp1 Grp2
#> [1,] 1 3
#> [2,] 2 4
#> [3,] 5 6
#>
#> 3 -------------
#> Grp1 Grp2
#> [1,] 1 3
#> [2,] 2 4
#> [3,] 6 5
#>
#> 4 -------------
#> Grp1 Grp2
#> [1,] 1 2
#> [2,] 3 5
#> [3,] 4 6
#>
#> 5 -------------
#> Grp1 Grp2
#> [1,] 1 2
#> [2,] 3 4
#> [3,] 5 6
#>
#> 6 -------------
#> Grp1 Grp2
#> [1,] 1 2
#> [2,] 3 4
#> [3,] 6 5
#>
#> 7 -------------
#> Grp1 Grp2
#> [1,] 1 2
#> [2,] 4 3
#> [3,] 5 6
#>
#> 8 -------------
#> Grp1 Grp2
#> [1,] 1 2
#> [2,] 4 3
#> [3,] 6 5
#>
#> 9 -------------
#> Grp1 Grp2
#> [1,] 1 2
#> [2,] 5 3
#> [3,] 6 4
#> No more results.
comboGroups(6, 2, retType = "3Darray", lower = 2)
#> , , Grp1
#>
#> [,1] [,2] [,3]
#> [1,] 1 2 4
#> [2,] 1 2 5
#> [3,] 1 2 6
#> [4,] 1 3 4
#> [5,] 1 3 5
#> [6,] 1 3 6
#> [7,] 1 4 5
#> [8,] 1 4 6
#> [9,] 1 5 6
#>
#> , , Grp2
#>
#> [,1] [,2] [,3]
#> [1,] 3 5 6
#> [2,] 3 4 6
#> [3,] 3 4 5
#> [4,] 2 5 6
#> [5,] 2 4 6
#> [6,] 2 4 5
#> [7,] 2 3 6
#> [8,] 2 3 5
#> [9,] 2 3 4
cg@summary()
#> $description
#> [1] "Partition of v of length 6 into 2 uniform groups"
#>
#> $currentIndex
#> [1] 11
#>
#> $totalResults
#> [1] 10
#>
#> $totalRemaining
#> [1] -1
## Using random access
cg[[7]]
#> Grp1 Grp2
#> [1,] 1 2
#> [2,] 3 4
#> [3,] 6 5
## No previous iterators
cg@prevIter()
#> Error in eval(expr, envir, enclos): no slot of name "prevIter" for this object of class "ComboGroups"
```