简体   繁体   English

R递归函数或循环内循环

[英]R recursive function or loop in loop

simple problem. 简单的问题。 I want to check if the difference of two points (i, j) is greater than a threshold (diff). 我想检查两个点(i,j)的差是否大于阈值(diff)。 If the difference between the points exceeds the threshold the index should be returned and the next distance is measured but from the new datapoint. 如果点之间的差值超过阈值,则应返回索引,并从新数据点开始测量下一个距离。 It is a simple cutofffilter where all datapoints under a predefined threshold are filtered. 这是一个简单的截止过滤器,其中对预定义阈值以下的所有数据点进行过滤。 The only trick is, that the measurement is performed from always the "last" point (that was "far enough away" from the point before). 唯一的技巧是,始终从“最后一个”点(距离之前的点“足够远”)进行测量。

I first wrote it as two nested loops like: 我首先将其编写为两个嵌套循环,例如:

x <- sample(1:100)
for(i in 1:(length(x)-1)){
      for(j in (i+1):length(x)){
        if(abs(x[i] - x[j]) >= cutoff) { 
          print(j)
          i <- j  # set the index to the current datapoint
          break }
      }}

This solution is kind of intuitive. 这种解决方案很直观。 But does not work proper. 但是不能正常工作。 I think the assignment of i and j is not valid. 我认为i和j的分配无效。 The first loop just ignores to jump and loops through all datapoints. 第一个循环只是忽略跳转,并循环遍历所有数据点。

Well, I did not want to waste time with debugging and just thought I can do the same with a recursive function. 好吧,我不想浪费时间进行调试,只是想通过递归函数可以做到这一点。 So I wrote it like: 所以我这样写:

checkCutOff.f <- function(x,cutoff,i = 1) {
  options(expressions=500000)
  # Loops through the data and comperes the temporally fixed point 'i with the looping points 'j
  for(j in (i+1):length(x)){
    if( abs(x[i] - x[j]) >= cutoff ){
      break
    }
  }

  # Recursive function to update the new 'i - stops at the end of the dataset
  if( j<length(x) ) return(c(j,checkCutOff.f(x,cutoff,j))) 
  else return(j)
}
 x<-sample(1:100000)
 checkCutOff.f(x,1)

This code works. 此代码有效。 But I get a stack overflow with big datasets. 但是我得到了大数据集的堆栈溢出。 That's why I ask myself if this code is efficient. 这就是为什么我问自己这段代码是否有效。 For me is increasing limits etc. always a hint for inefficient code... 对我来说,限制越来越大,等等。总是提示低效的代码...

So my question is: What kind of solution is really efficient? 所以我的问题是: 哪种解决方案真正有效? Thanks! 谢谢!

You should avoid growing your return value with c . 您应该避免使用c增加返回值。 That's inefficient. 效率低下。 Allocate to the maximum size and subset to the needed size in the end. 最后分配最大大小,最后分配子集到所需大小。

Note that your function always includes length(x) in your result, which is wrong: 请注意,您的函数始终在结果中包含length(x) ,这是错误的:

set.seed(42)
x<-sample(1:10)
checkCutOff.f(x, 100)
#[1] 10

Here is an R solution with a loop: 这是带有循环的R解决方案:

checkCutOff.f1 <- function(x,cutoff) {
  i <- 1
  j <- 1
  k <- 1

  result <- integer(length(x))

  while(j < length(x)) {
    j <- j + 1
    if (abs(x[i] - x[j]) >= cutoff) {
      result[k] <- j
      k <- k + 1
      i <- j
    }
  }
  result[seq_len(k - 1)]
}

all.equal(checkCutOff.f(x, 4), checkCutOff.f1(x, 4))
#[1] TRUE
#the correct solution includes length(x)  here (by chance)

It's easy to translate to Rcpp: 转换为Rcpp很容易:

#include <Rcpp.h>
using namespace Rcpp;


// [[Rcpp::export]]
IntegerVector checkCutOff_f1cpp(NumericVector x, double cutoff) {
  int i = 0; 
  int j = 1; 
  int k = 0;
  IntegerVector result(x.size());  
  while(j < x.size()) {
    if (std::abs(x[i] - x[j]) >= cutoff) {
      result[k] = j + 1;
      k++;
      i = j;
    }
    j++;
  }
  result = result[seq_len(k)-1];
  return result;
}

Then in R: 然后在R中:

all.equal(checkCutOff.f(x, 4), checkCutOff_f1cpp(x, 4))
#[1] TRUE

Benchmarks: 基准:

library(microbenchmark)
y <- sample(1:1000)

microbenchmark(
  checkCutOff.f(y, 4),
  checkCutOff.f1(y, 4),
  checkCutOff_f1cpp(y, 4)
  )

#Unit: microseconds
#                    expr      min        lq       mean   median        uq       max neval cld
#     checkCutOff.f(y, 4) 3665.105 4681.6005 7798.41776 5323.068 6635.9205 41028.930   100   c
#    checkCutOff.f1(y, 4) 1384.524 1507.2635 1831.43236 1769.031 2070.7225  3012.279   100  b 
# checkCutOff_f1cpp(y, 4)    8.765   10.7035   26.40709   14.240   18.0005   587.958   100 a

I'm sure this can be improved further and more testing should be done. 我相信这可以进一步改善,应该做更多的测试。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM