R递归函数或循环内循环

Question

简单的问题。 我想检查两个点（i，j）的差是否大于阈值（diff）。 如果点之间的差值超过阈值，则应返回索引，并从新数据点开始测量下一个距离。 这是一个简单的截止过滤器，其中对预定义阈值以下的所有数据点进行过滤。 唯一的技巧是，始终从“最后一个”点（距离之前的点“足够远”）进行测量。

我首先将其编写为两个嵌套循环，例如：

x <- sample(1:100)
for(i in 1:(length(x)-1)){
      for(j in (i+1):length(x)){
        if(abs(x[i] - x[j]) >= cutoff) { 
          print(j)
          i <- j  # set the index to the current datapoint
          break }
      }}

这种解决方案很直观。 但是不能正常工作。 我认为i和j的分配无效。 第一个循环只是忽略跳转，并循环遍历所有数据点。

好吧，我不想浪费时间进行调试，只是想通过递归函数可以做到这一点。 所以我这样写：

checkCutOff.f <- function(x,cutoff,i = 1) {
  options(expressions=500000)
  # Loops through the data and comperes the temporally fixed point 'i with the looping points 'j
  for(j in (i+1):length(x)){
    if( abs(x[i] - x[j]) >= cutoff ){
      break
    }
  }

  # Recursive function to update the new 'i - stops at the end of the dataset
  if( j<length(x) ) return(c(j,checkCutOff.f(x,cutoff,j))) 
  else return(j)
}
 x<-sample(1:100000)
 checkCutOff.f(x,1)

此代码有效。 但是我得到了大数据集的堆栈溢出。 这就是为什么我问自己这段代码是否有效。 对我来说，限制越来越大，等等。总是提示低效的代码...

所以我的问题是： 哪种解决方案真正有效？ 谢谢！

Answer 1

您应该避免使用c增加返回值。 效率低下。 最后分配最大大小，最后分配子集到所需大小。

请注意，您的函数始终在结果中包含length(x) ，这是错误的：

set.seed(42)
x<-sample(1:10)
checkCutOff.f(x, 100)
#[1] 10

这是带有循环的R解决方案：

checkCutOff.f1 <- function(x,cutoff) {
  i <- 1
  j <- 1
  k <- 1

  result <- integer(length(x))

  while(j < length(x)) {
    j <- j + 1
    if (abs(x[i] - x[j]) >= cutoff) {
      result[k] <- j
      k <- k + 1
      i <- j
    }
  }
  result[seq_len(k - 1)]
}

all.equal(checkCutOff.f(x, 4), checkCutOff.f1(x, 4))
#[1] TRUE
#the correct solution includes length(x)  here (by chance)

转换为Rcpp很容易：

#include <Rcpp.h>
using namespace Rcpp;


// [[Rcpp::export]]
IntegerVector checkCutOff_f1cpp(NumericVector x, double cutoff) {
  int i = 0; 
  int j = 1; 
  int k = 0;
  IntegerVector result(x.size());  
  while(j < x.size()) {
    if (std::abs(x[i] - x[j]) >= cutoff) {
      result[k] = j + 1;
      k++;
      i = j;
    }
    j++;
  }
  result = result[seq_len(k)-1];
  return result;
}

然后在R中：

all.equal(checkCutOff.f(x, 4), checkCutOff_f1cpp(x, 4))
#[1] TRUE

基准：

library(microbenchmark)
y <- sample(1:1000)

microbenchmark(
  checkCutOff.f(y, 4),
  checkCutOff.f1(y, 4),
  checkCutOff_f1cpp(y, 4)
  )

#Unit: microseconds
#                    expr      min        lq       mean   median        uq       max neval cld
#     checkCutOff.f(y, 4) 3665.105 4681.6005 7798.41776 5323.068 6635.9205 41028.930   100   c
#    checkCutOff.f1(y, 4) 1384.524 1507.2635 1831.43236 1769.031 2070.7225  3012.279   100  b 
# checkCutOff_f1cpp(y, 4)    8.765   10.7035   26.40709   14.240   18.0005   587.958   100 a

我相信这可以进一步改善，应该做更多的测试。

R递归函数或循环内循环

问题描述

1 个解决方案

解决方案1
2 已采纳 2015-05-16 15:27:17

R递归函数或循环内循环

问题描述

1 个解决方案

解决方案1 2 已采纳 2015-05-16 15:27:17

解决方案1
2 已采纳 2015-05-16 15:27:17