Find all possible 3-number combinations whose sum is less than a given number, using R

I have the following set of numbers: 10, 17, 5, 7, 15. From these numbers, I need to find all possible 3-number combinations whose sum is less than or equal to 35. Within any one combination, a number must not appear more than once. For example, 10, 10, 5 is an invalid combination because 10 appears twice.

I tried this code, but it's not giving what I need.

library(data.table)
df=expand.grid(x1=c(10,17,5,7,15),
               x2=c(10,17,5,7,15),
               x3=c(10,17,5,7,15)
               )
setDT(df)
df[(x1+x2+x3) <= 35]

Part of the output of the above code is as follows:

  x1 x2 x3
 1: 10 10 10
 2:  5 10 10
 3:  7 10 10
 4: 15 10 10
 5:  5 17 10
 6:  7 17 10
 7: 10  5 10

From the above output, it can be observed that a number can appear more than once in a row. Can anyone suggest a hint to get the desired results?

Thank you.

Try the following to see if it's what the question asks for.

x <- c(10,17,5,7,15)
i <- combn(x, 3, sum) <= 35

combn(x, 3)[, i]
#     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#[1,]   10   10   10   10   10   17    5
#[2,]   17   17    5    5    7    5    7
#[3,]    5    7    7   15   15    7   15

The above is the general idea. A more efficient implementation, in both memory and speed, is f2 below: f1 calls combn twice, while f2 calls it only once.

f1 <- function(x, n = 3, thres = 35){
  i <- combn(x, n, sum) <= thres
  combn(x, n)[, i]
}
f2 <- function(x, n = 3, thres = 35){
  cmb <- combn(x, n)
  cmb[, colSums(cmb) <= thres]
}

Check that each result contains only distinct numbers.

res <- f2(x)
apply(res, 2, function(y){
  # anyDuplicated() returns 0 when no value in the column is repeated
  anyDuplicated(y) == 0
})
#[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE

identical(f1(x), f2(x))
#[1] TRUE

Now time the functions.

microbenchmark::microbenchmark(f1 = f1(x), 
                               f2 = f2(x))
#Unit: microseconds
# expr     min      lq      mean   median      uq     max neval cld
#   f1 105.150 107.383 110.66616 108.6535 109.896 238.899   100   b
#   f2  62.779  65.568  67.65754  66.4290  67.145 122.119   100  a 

The function comboGeneral from the package RcppAlgos (I am the author) was designed specifically for this task.

library(RcppAlgos)
x <- c(10,17,5,7,15)

comboGeneral(x, 3, 
             constraintFun = "sum",
             comparisonFun = "<=",
             limitConstraints = 35)
     [,1] [,2] [,3]
[1,]    5    7   10
[2,]    5    7   15
[3,]    5    7   17
[4,]    5   10   15
[5,]    5   10   17
[6,]    7   10   15
[7,]    7   10   17

It is very efficient as well. Observe:

set.seed(42)
s <- sample(100, 25)
s
[1] 92 93 29 81 62 50 70 13 61 65 42 91 83 23 40 80 88 10 39 46 73 11 78 85  7

system.time(a <- comboGeneral(s, 10, 
                              constraintFun = "sum",
                              comparisonFun = "<=",
                              limitConstraints = 600))
 user  system elapsed 
0.232   0.046   0.278

dim(a)
[1] 2252362      10

Compare this to the more efficient function f2 posted by @RuiBarradas and to dt_checker by @Cole:

system.time(b <- f2(s, 10, 600))
 user  system elapsed 
3.283   0.093   3.418

system.time(a2 <- dt_checker(s, 10, 600))
 user  system elapsed 
1.803   0.319   0.646
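
For a sense of scale, the number of combinations that a generate-then-filter approach has to build can be computed directly. This is only a rough illustration using the counts reported above; the variable names are made up for this sketch.

```r
# Total number of 10-element combinations from 25 values
n_all <- choose(25, 10)
n_all                      # 3268760

# Rows kept under the limit of 600, as reported by dim(a) above
n_kept <- 2252362

# Fraction of all combinations that satisfy the constraint (about 0.69)
n_kept / n_all
```

So in this case roughly two thirds of all combinations pass the filter, and an approach based on combn has to materialise all of them before filtering.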

It should also be noted that the algorithm behind comboGeneral terminates as soon as a solution can no longer be obtained. Consequently, the timings will differ with different constraints. Observe:

system.time(a <- comboGeneral(s, 10, 
                              constraintFun = "sum",
                              comparisonFun = "<=",
                              limitConstraints = 400))
 user  system elapsed 
0.003   0.001   0.003

With the other solutions, however, all combinations must be created first and then filtered (the filtering itself doesn't take as long), so the timings are similar to before.

system.time(b <- f2(s, 10, 400))
 user  system elapsed 
2.933   0.039   2.973

system.time(a2 <- dt_checker(s, 10, 400))
 user  system elapsed 
1.786   0.276   0.627
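
The early-termination idea mentioned above can be illustrated in base R. The following is a minimal sketch, not the actual RcppAlgos implementation: combos_pruned is a hypothetical helper written for this example. It sorts the input and abandons a branch as soon as even the smallest possible completion would exceed the threshold.

```r
# Hypothetical helper (not part of RcppAlgos): enumerate the n-combinations
# of x whose sum is <= thres, pruning branches that cannot succeed.
combos_pruned <- function(x, n, thres) {
  x <- sort(x)                  # ascending order is what makes pruning valid
  len <- length(x)
  out <- list()                 # collected combinations
  cur <- numeric(n)             # combination currently being built

  recurse <- function(start, depth, partial) {
    if (depth > n) {            # all n positions filled: record the result
      out[[length(out) + 1L]] <<- cur
      return(invisible(NULL))
    }
    remaining <- n - depth      # picks still needed after this one
    last <- len - remaining     # leave room for the remaining picks
    if (start > last) return(invisible(NULL))
    for (i in start:last) {
      # Smallest total reachable when x[i] is chosen at this depth:
      # the partial sum plus x[i] and the `remaining` values right after it.
      lower <- partial + sum(x[i:(i + remaining)])
      if (lower > thres) break  # larger i can only increase this bound
      cur[depth] <<- x[i]
      recurse(i + 1L, depth + 1L, partial + x[i])
    }
  }

  recurse(1L, 1L, 0)
  if (length(out) == 0L) return(matrix(numeric(0), ncol = n))
  do.call(rbind, out)           # one combination per row
}

res <- combos_pruned(c(10, 17, 5, 7, 15), 3, 35)
nrow(res)   # 7
```

With a tight threshold most branches are cut off near the root, which is why a pruning approach can finish almost instantly when few or no combinations qualify.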

As a final benchmark, we time finding all results under multiple constraints:

system.time(a <- lapply(seq(200, 600, 25), function(x) {
    t <- comboGeneral(s, 10, 
                      constraintFun = "sum",
                      comparisonFun = "<=",
                      limitConstraints = x)
    dim(t)
}))
 user  system elapsed 
0.498   0.125   0.623

system.time(a2 <- lapply(seq(200, 600, 25), function(x) {
    t <- dt_checker(s, 10, x)
    dim(t)
}))
  user  system elapsed 
34.448   4.633  10.693

identical(a, a2)
[1] TRUE

We can remove rows with any duplicated value and then select rows with sum <= 35.

df1 <- df[!apply(df, 1, function(x) any(duplicated(x))), ]
df1[rowSums(df1) <= 35, ]

# First rows of the result (df as a plain data.frame built from the question's vector):
#    x1 x2 x3
#8    5 17 10
#9    7 17 10
#12  17  5 10
#14   7  5 10
#15  15  5 10

The original df in OP's code contains all possible ordered triples drawn from c(10,17,5,7,15), with a lot of repetition. Using the apply loop, we remove any rows with repeated values, so a row containing 10, 10 gets deleted, as does one with 17, 17, and so on. df1 is the data frame without repeated numbers. We then keep only those rows whose sum is less than or equal to 35.

You may not want to do this with more columns, but this simply works:
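
As a small illustration of the row test used inside apply, the check can be tried on two vectors. The helper name has_repeat is made up for this example.

```r
# TRUE when any value occurs more than once in the vector
has_repeat <- function(v) any(duplicated(v))

has_repeat(c(10, 10, 5))   # TRUE:  10 appears twice, so the row is dropped
has_repeat(c(10, 17, 5))   # FALSE: all values distinct, so the row is kept
```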

df[(x1+x2+x3) <= 35 & x1 != x2 & x2 != x3 & x3 != x1] 

And if you consider 10,17,5 to be the same as 5,10,17, so that each combination needs to be kept only once, then:

df[(x1+x2+x3) <= 35 & x1 < x2 & x2 < x3] 
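
The difference between the two filters can be checked by counting rows. The sketch below uses a plain base R data frame rather than data.table so that it runs on its own; the names grid, ordered and unordered are chosen for this example.

```r
x <- c(10, 17, 5, 7, 15)
grid <- expand.grid(x1 = x, x2 = x, x3 = x)

# Distinct values, any order: every valid combination appears 3! = 6 times
ordered <- grid[with(grid, x1 + x2 + x3 <= 35 &
                            x1 != x2 & x2 != x3 & x3 != x1), ]

# Strictly increasing values: every valid combination appears exactly once
unordered <- grid[with(grid, x1 + x2 + x3 <= 35 & x1 < x2 & x2 < x3), ]

nrow(ordered)     # 42
nrow(unordered)   # 7
```

The strict inequalities both rule out repeated values and pick one canonical ordering per combination, which is why the second filter needs no separate duplicate check.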

Here's an answer that relies on data.table's non-equi joins. Most of the time is spent manipulating character vectors to build the expressions evaluated in the data.table call.

library(data.table)

dt_checker <- function(y, n, criteria) {
  x_dt <- data.table(x1 = y)
  setkey(x_dt, x1)

  x_res <- copy(x_dt)[seq_len(length(y)-(n-1))]


  for (i in seq_len(n)[-1]) {
    setnames(x_dt, paste0('x', i))

    cols <- paste0('x', seq_len(i))
    cols2 <- cols
    cols2[i-1] <- paste0('x.', cols2[i-1])

    x_res <- x_res[x_dt, on = paste(cols[c(i-1, i)], collapse = '<'), ..cols2, allow.cartesian = TRUE, nomatch = 0L]
    setnames(x_res, cols)
  }

  x_res[x_res[, rowSums(.SD)<= criteria] ,]
}

dt_checker(x, 3, 35)
   x1 x2 x3
1:  5  7 10
2:  5  7 15
3:  5 10 15
4:  7 10 15
5:  5  7 17
6:  5 10 17
7:  7 10 17

I mainly did it to see if I could get data.table to be faster than the RcppAlgos solution. I couldn't, and in the time I spent figuring out the logic to automate all of these joins, I probably could have figured it out in Rcpp :).

system.time(a <- comboGeneral(s, 10,
                              constraintFun = "sum",
                              comparisonFun = "<=",
                              limitConstraints = 600))
   user  system elapsed 
   0.10    0.13    0.23 
system.time(a2 <- dt_checker(s, 10, 600))
   user  system elapsed 
   0.54    0.09    0.57 
system.time(a3 <- f2(s, 10, 600))
   user  system elapsed 
   3.98    0.00    4.01 

Also, for smaller datasets, this would work as well. But for smaller datasets, @Rui's solution is almost as fast as RcppAlgos, and it's a base R solution.

dt_CJ <- function(y, n, criteria) {
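  x <- sort(y)
  # One column per position: column i holds the candidates for the i-th
  # smallest element, x[i], ..., x[length(x) - n + i]
  dt_res <- do.call(CJ, lapply(seq_len(n), function(i) x[i:(length(x) - n + i)]))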

  eval_crit <- paste0(lapply(1:(n-1), function(i) paste0('V', i:(i+1), collapse = '<')), collapse = '&')
  dt_res[eval(parse(text = eval_crit)), .SD[rowSums(.SD) <= criteria]][]
}

Unit: microseconds
                  expr      min       lq      mean   median       uq      max neval
 dt_checker(x, 3, 600) 5074.100 5075.601 5333.7608 5139.000 5322.201 6057.902     5
      dt_CJ(x, 3, 600) 2593.001 2662.801 2703.7010 2670.901 2770.101 2821.701     5
         f2(x, 3, 600)   72.601   76.001   90.7412   79.101   81.702  144.301     5
              comboGen   45.000   47.501   69.4604   58.701   69.000  127.100     5
