Find all possible 3-number combinations where the sum is less than a given number using R
I have the following set of numbers: 10, 17, 5, 7, 15. From these numbers, I need to find all possible 3-number combinations where the sum is less than or equal to 35. Within one such combination, a given number should not appear more than once. Example: 10, 10, 5 is an incorrect combination since 10 is repeated twice.
I tried this code, but it's not giving what I need.
library(data.table)
df=expand.grid(x1=c(10,17,5,7,15),
x2=c(10,17,5,7,15),
x3=c(10,17,5,7,15)
)
setDT(df)
df[(x1+x2+x3) <= 35]
Part of the output of the above code is as follows:
x1 x2 x3
1: 10 10 10
2: 5 10 10
3: 7 10 10
4: 15 10 10
5: 5 17 10
6: 7 17 10
7: 10 5 10
Based on the above output, it can be observed that one number appears more than once. Can anyone suggest a hint to get the desired results?
Thank you
Try the following to see if it's what the question asks for.
x <- c(10,17,5,7,15)
i <- combn(x, 3, sum) <= 35
combn(x, 3)[, i]
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#[1,] 10 10 10 10 10 17 5
#[2,] 17 17 5 5 7 5 7
#[3,] 5 7 7 15 15 7 15
The above is the general idea. A more efficient implementation, both memory- and speed-wise, is f2 below.
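# f1 calls combn() twice: once to compute the sums, once to build the combinations
f1 <- function(x, n = 3, thres = 35){
  i <- combn(x, n, sum) <= thres
  combn(x, n)[, i]
}
# f2 builds the combinations once and filters them with the vectorised colSums()
f2 <- function(x, n = 3, thres = 35){
  cmb <- combn(x, n)
  cmb[, colSums(cmb) <= thres]
}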
Check that every result consists of different numbers.
res <- f2(x)
apply(res, 2, function(y){
  anyDuplicated(y) == 0L   # TRUE if no value repeats within the column
})
#[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE
identical(f1(x), f2(x))
#[1] TRUE
Now time the functions.
microbenchmark::microbenchmark(f1 = f1(x),
f2 = f2(x))
#Unit: microseconds
# expr min lq mean median uq max neval cld
# f1 105.150 107.383 110.66616 108.6535 109.896 238.899 100 b
# f2 62.779 65.568 67.65754 66.4290 67.145 122.119 100 a
The function comboGeneral from the package RcppAlgos (I am the author) was designed specifically for this task.
library(RcppAlgos)
x <- c(10,17,5,7,15)
comboGeneral(x, 3,
constraintFun = "sum",
comparisonFun = "<=",
limitConstraints = 35)
[,1] [,2] [,3]
[1,] 5 7 10
[2,] 5 7 15
[3,] 5 7 17
[4,] 5 10 15
[5,] 5 10 17
[6,] 7 10 15
[7,] 7 10 17
It is very efficient as well. Observe:
set.seed(42)
s <- sample(100, 25)
s
[1] 92 93 29 81 62 50 70 13 61 65 42 91 83 23 40 80 88 10 39 46 73 11 78 85 7
system.time(a <- comboGeneral(s, 10,
constraintFun = "sum",
comparisonFun = "<=",
limitConstraints = 600))
user system elapsed
0.232 0.046 0.278
dim(a)
[1] 2252362 10
Compared to the more efficient function f2 posted by @RuiBarradas and dt_checker by @Cole:
system.time(b <- f2(s, 10, 600))
user system elapsed
3.283 0.093 3.418
system.time(a2 <- dt_checker(s, 10, 600))
user system elapsed
1.803 0.319 0.646
It should also be noted that the algorithm behind comboGeneral terminates as soon as a solution can no longer be obtained. Consequently, the timings will differ with different constraints. Observe:
system.time(a <- comboGeneral(s, 10,
constraintFun = "sum",
comparisonFun = "<=",
limitConstraints = 400))
user system elapsed
0.003 0.001 0.003
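The early-termination idea can be illustrated with a small recursive sketch in base R. This is a hypothetical helper: `combos_leq` is not part of RcppAlgos, and RcppAlgos' actual C++ algorithm may differ. The sketch sorts the input and, at each position, computes the smallest sum any completion could still reach. Once that lower bound exceeds the limit, every later candidate at that position is larger too, so the loop stops instead of enumerating the rest.

```r
# Hypothetical base R sketch of "stop as soon as no solution is possible";
# not RcppAlgos' implementation. Assumes length(x) >= n.
combos_leq <- function(x, n, limit) {
  x <- sort(x)
  L <- length(x)
  out <- list()
  rec <- function(start, chosen, total) {
    k <- length(chosen)
    if (k == n) {
      out[[length(out) + 1L]] <<- chosen  # a complete combination
      return(invisible(NULL))
    }
    need <- n - k
    for (i in seq.int(start, L - need + 1L)) {
      # smallest sum reachable if x[i] is picked next: x[i] plus the
      # need - 1 values right after it (x is sorted)
      lower <- total + sum(x[i:(i + need - 1L)])
      # this bound only grows with i, so no later i can succeed either
      if (lower > limit) break
      rec(i + 1L, c(chosen, x[i]), total + x[i])
    }
  }
  rec(1L, x[0], 0)
  if (length(out) == 0L) return(matrix(x[0], nrow = 0, ncol = n))
  do.call(rbind, out)
}

res <- combos_leq(c(10, 17, 5, 7, 15), 3, 35)
res
```

For the question's data, this gives the same seven rows, in the same order, as the comboGeneral call at the start of this answer. Because pruning cuts off whole branches, a tighter limit leaves less to explore, which is consistent with the much shorter timing for limitConstraints = 400.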
However, with the other solutions, all combinations must be created and then filtered (the filtering itself doesn't take as long), so the timings are similar to before.
system.time(b <- f2(s, 10, 400))
user system elapsed
2.933 0.039 2.973
system.time(a2 <- dt_checker(s, 10, 400))
user system elapsed
1.786 0.276 0.627
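Here is the "create everything, then filter" cost in numbers. With 25 values and combinations of size 10, combn() must produce every combination before colSums() can discard any, and that count does not depend on the limit. A quick check in base R (the variable names are just for illustration):

```r
# Combinations of 25 values taken 10 at a time: what combn() materialises
# for f2(s, 10, 600) and f2(s, 10, 400) alike, before any filtering
n_total <- choose(25, 10)
n_total
# [1] 3268760

# share of those kept with limitConstraints = 600, using dim(a) reported above
kept_share <- 2252362 / n_total
```

So with a limit of 600, roughly 69% of the generated combinations are kept. With a limit of 400, the same 3,268,760 combinations are still built first.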
As a final benchmark, we time finding all results for multiple constraints:
system.time(a <- lapply(seq(200, 600, 25), function(x) {
t <- comboGeneral(s, 10,
constraintFun = "sum",
comparisonFun = "<=",
limitConstraints = x)
dim(t)
}))
user system elapsed
0.498 0.125 0.623
system.time(a2 <- lapply(seq(200, 600, 25), function(x) {
t <- dt_checker(s, 10, x)
dim(t)
}))
user system elapsed
34.448 4.633 10.693
identical(a, a2)
[1] TRUE
We can remove rows with any duplicated value and then select rows with sum <= 35:
# df is the expand.grid() result from the question (printed here as a plain data.frame)
df1 <- df[!apply(df, 1, function(x) any(duplicated(x))), ]
df1[rowSums(df1) <= 35, ]
# x1 x2 x3
#8 5 17 10
#9 7 17 10
#12 17 5 10
#14 7 5 10
#15 15 5 10
# ...
The original df in the OP's code has all possible combinations of c(10,17,5,7,15), with a lot of repetitions. Using the apply loop, we remove any rows with repeated values, so a row containing 10, 10 gets deleted, and the same goes for 17, 17 and other repetitions. df1 is the data frame without repeated numbers. Now we subset only those rows whose sum is less than or equal to 35.
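expand.grid() enumerates ordered triples, so each valid combination survives this filter once for every ordering of its three values, that is 3! = 6 times. The base R sketch below (variable names are just for illustration) does three things:

- rebuilds the grid from the question as a plain data.frame;
- counts the rows before and after filtering;
- collapses the orderings by sorting each row.

```r
x <- c(10, 17, 5, 7, 15)
# the grid from the question, kept as a plain data.frame
df <- expand.grid(x1 = x, x2 = x, x3 = x)

# drop rows with a repeated value: 5 * 4 * 3 = 60 rows remain
df1 <- df[!apply(df, 1, function(r) any(duplicated(r))), ]
res <- df1[rowSums(df1) <= 35, ]

nrow(df1)  # 60
nrow(res)  # 42, i.e. 7 combinations x 6 orderings

# sort the values within each row, then drop duplicate rows
uniq <- unique(t(apply(res, 1, sort)))
nrow(uniq) # 7
```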
You may not want to do this with more columns, but this works simply:
df[(x1+x2+x3) <= 35 & x1 != x2 & x2 != x3 & x3 != x1]
And if you consider 10, 17, 5 to be the same as 5, 10, 17, so that each combination needs to be kept only once, then:
df[(x1+x2+x3) <= 35 & x1 < x2 & x2 < x3]
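The x1 < x2 < x3 condition generalises to any number of columns if it is written as "every row is strictly increasing". Below is a minimal base R sketch. It assumes the input values are distinct, and n and limit are illustrative parameters. It still builds all length(x)^n rows first, so it only suits small inputs.

```r
x <- c(10, 17, 5, 7, 15)
n <- 3        # combination size
limit <- 35   # maximum allowed sum

# one column per position, like the question's x1, x2, x3
grid <- expand.grid(rep(list(x), n))

# strictly increasing rows have no repeated value and keep each
# combination exactly once (assumes the values in x are distinct)
increasing <- apply(grid, 1, function(r) all(diff(r) > 0))
res <- grid[increasing & rowSums(grid) <= limit, ]
nrow(res)  # 7
```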
Here's an answer that relies on data.table's non-equi joins. Most of the time is spent manipulating character vectors in order to build what is evaluated in the dt call.
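library(data.table)
dt_checker <- function(y, n, criteria) {
  # sorted one-column table of the input values
  x_dt <- data.table(x1 = y)
  setkey(x_dt, x1)
  # the first element of a sorted n-combination can only be one of the
  # length(y) - (n - 1) smallest values
  x_res <- copy(x_dt)[seq_len(length(y) - (n - 1))]
  for (i in seq_len(n)[-1]) {
    # rename the lookup column to x<i> and extend every partial combination
    # with a strictly larger value via the non-equi join x<i-1> < x<i>
    setnames(x_dt, paste0('x', i))
    cols <- paste0('x', seq_len(i))
    cols2 <- cols
    # x.x<i-1> keeps x_res's own value rather than the joined value
    cols2[i-1] <- paste0('x.', cols2[i-1])
    x_res <- x_res[x_dt, on = paste(cols[c(i-1, i)], collapse = '<'), ..cols2,
                   allow.cartesian = TRUE, nomatch = 0L]
    setnames(x_res, cols)
  }
  # keep only the combinations whose row sum satisfies the limit
  x_res[x_res[, rowSums(.SD) <= criteria], ]
}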
dt_checker(x, 3, 35)
x1 x2 x3
1: 5 7 10
2: 5 7 15
3: 5 10 15
4: 7 10 15
5: 5 7 17
6: 5 10 17
7: 7 10 17
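Each iteration of the loop in dt_checker is one non-equi join, and a single step can be tried on its own. Joining the sorted values to themselves on x1 < x2 returns every pair whose second element is strictly larger, i.e. all choose(5, 2) = 10 two-number combinations. The table names a and b are just for illustration:

```r
library(data.table)

x <- c(10, 17, 5, 7, 15)
a <- data.table(x1 = sort(x))
b <- data.table(x2 = sort(x))

# for every row of b, find the rows of a with x1 < x2;
# x.x1 is a's own value, i.x2 is b's value
pairs <- a[b, on = "x1<x2", .(x1 = x.x1, x2 = i.x2),
           allow.cartesian = TRUE, nomatch = 0L]
nrow(pairs)  # 10
```

The loop in dt_checker automates this step: it repeats the join n - 1 times, extending each partial combination by one larger value every time.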
I mainly did it to see if I could get data.table faster than the RcppAlgos solution. I couldn't, and in the time I spent figuring out the logic to automate all of these joins, I probably could have figured it out in Rcpp :).
system.time(a <- comboGeneral(s, 10,
                              constraintFun = "sum",
                              comparisonFun = "<=",
                              limitConstraints = 600))
user system elapsed
0.10 0.13 0.23
system.time(a2 <- dt_checker(s, 10, 600))
user system elapsed
0.54 0.09 0.57
system.time(a3 <- f2(s, 10, 600))
user system elapsed
3.98 0.00 4.01
Also, for smaller datasets, the following CJ-based version works as well. But for smaller datasets, @Rui's solution is almost as fast as RcppAlgos, and it's a base R solution.
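dt_CJ <- function(y, n, criteria) {
  x <- sort(y)
  # in a sorted n-combination, position i can only hold x[i], ..., x[length(x) - n + i]
  dt_res <- do.call(CJ, lapply(seq_len(n), function(i) x[i:(i + length(x) - n)]))
  # build the string "V1<V2&V2<V3&..." so that only strictly increasing rows are kept
  eval_crit <- paste0(lapply(1:(n-1), function(i) paste0('V', i:(i+1), collapse = '<')), collapse = '&')
  dt_res[eval(parse(text = eval_crit)), .SD[rowSums(.SD) <= criteria]][]
}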
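The CJ approach can skip most of the full grid because of how sorted combinations are built. Take n values drawn from L sorted values: position k can only hold one of x[k], ..., x[L - n + k]. Each position therefore has L - n + 1 candidates instead of L. Below is a base R sketch of the same idea using expand.grid(). The names are illustrative, and it assumes distinct values.

```r
x <- sort(c(10, 17, 5, 7, 15))
L <- length(x)
n <- 3
limit <- 35

# candidate values for each position of a sorted n-combination
windows <- lapply(seq_len(n), function(k) x[k:(L - n + k)])
cand <- expand.grid(windows)

# keep strictly increasing rows whose sum is within the limit
keep <- apply(cand, 1, function(r) all(diff(r) > 0)) & rowSums(cand) <= limit
res <- cand[keep, ]

nrow(cand)  # 27, i.e. (L - n + 1)^n instead of L^n = 125
nrow(res)   # 7
```

The candidate count is still (L - n + 1)^n. For the 25-value, size-10 benchmark above that would be 16^10 (about 1.1e12), so this only pays off for small inputs.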
Unit: microseconds
expr min lq mean median uq max neval
dt_checker(x, 3, 600) 5074.100 5075.601 5333.7608 5139.000 5322.201 6057.902 5
dt_CJ(x, 3, 600) 2593.001 2662.801 2703.7010 2670.901 2770.101 2821.701 5
f2(x, 3, 600) 72.601 76.001 90.7412 79.101 81.702 144.301 5
comboGen 45.000 47.501 69.4604 58.701 69.000 127.100 5
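Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.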