Selecting only the 0s and the first 1 from a sequence of many 0s and few 1s in R?

I have a sequence of 0s and 1s like this:

xx <- c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 
                    0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1)

I want to keep all the 0s and only the first 1 of each run of 1s.

The result should be:

ans <- c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1)

What is the fastest way to do this in R?

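Use rle() to extract the run lengths and values, do a little surgery on the lengths (shorten every run of 1s to length 1), and then put the run-length-encoded vector "back together" with inverse.rle():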

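rr <- rle(xx)                      # run lengths and run values
rr$lengths[rr$values==1] <- 1      # shorten every run of 1s to length 1
inverse.rle(rr)                    # expand back into a plain vector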
#  [1] 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1
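The same run-length surgery works for any value whose runs should be collapsed to their first element. A minimal sketch; the helper name first_of_runs() and its target argument are hypothetical, not part of the original answer:

```r
# Hypothetical helper: collapse every run of `target` to a single element,
# leaving runs of all other values untouched.
first_of_runs <- function(x, target = 1) {
  rr <- rle(x)                              # run lengths and run values
  rr$lengths[rr$values == target] <- 1L     # shorten each run of `target` to length 1
  inverse.rle(rr)                           # expand back into a vector
}

xx <- c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0,
        0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1)
first_of_runs(xx)
#  [1] 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1
first_of_runs(c(2, 2, 0, 2, 2, 2), target = 2)
# [1] 2 0 2
```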

Here is one way, which finds every 1 that directly follows another 1 and drops it:

idx <- which(xx == 1)                   # positions of every 1
pos <- which(diff(c(xx[1], idx)) == 1)  # 1s whose position is one past the previous 1
xx[-idx[pos]] # following Frank's suggestion
# [1] 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1
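One caveat with the negative index: when no 1 directly follows another 1, idx[pos] is integer(0), and x[-integer(0)] selects nothing rather than everything. The setdiff() form used in the benchmark avoids this; a guarded sketch of the negative-index version (drop_repeat_ones() is a hypothetical name) simply returns the input unchanged in that case:

```r
drop_repeat_ones <- function(xx) {
  idx <- which(xx == 1)
  pos <- which(diff(c(xx[1], idx)) == 1)
  if (length(pos) == 0) return(xx)   # nothing to drop; avoids xx[-integer(0)]
  xx[-idx[pos]]
}

x <- c(0, 1, 0, 1)
x[-integer(0)]
# numeric(0)
drop_repeat_ones(x)
# [1] 0 1 0 1
```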

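Without rle(), using a logical mask: keep an element if it is not a 1, or if the element before it is not a 1 (the leading TRUE covers the first position):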

xx[head(c(TRUE, (xx != 1)), -1) | (xx != 1)]
#[1] 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1
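The same mask can also be written with diff(). For a vector holding only 0s and 1s, an element should be kept when it is 0 or when it differs from the element before it, because a 1 that differs from its predecessor starts a run. A minimal sketch under that 0/1 assumption; keep_first_ones() is a hypothetical name:

```r
keep_first_ones <- function(xx) {
  # TRUE at position 1 and wherever the value changes, i.e. at the start of every run
  run_start <- c(TRUE, diff(xx) != 0)
  # keep all 0s, plus the 1 that opens each run of 1s (assumes xx is 0/1 only)
  xx[xx == 0 | run_start]
}

xx <- c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0,
        0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1)
keep_first_ones(xx)
#  [1] 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1
```

Unlike the head()/!= 1 mask, this version would also collapse runs of any other non-zero value, which is why the 0/1 assumption matters.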

Since the OP mentioned speed, here is a benchmark:

josh = function(xx) {
  rr <- rle(xx)
  rr$lengths[rr$values==1] <- 1
  inverse.rle(rr)
}

arun = function(xx) {
  idx <- which(xx == 1)
  pos <- which(diff(c(xx[1], idx)) == 1)
  xx[setdiff(seq_along(xx), idx[pos])]
}

eddi = function(xx) {
  xx[head(c(TRUE, (xx != 1)), -1) | (xx != 1)]
}

simon = function(xx) {
    #  The body of the function is supplied in @SimonO101's answer
    first1(xx)
}

set.seed(1)
N = 1e6
xx = sample(c(0, 1), N, replace = TRUE)

library(microbenchmark)
bm <- microbenchmark(josh(xx), arun(xx), eddi(xx), simon(xx) , times = 25)
print( bm , digits = 2 , order = "median" )
#Unit: milliseconds
#      expr min  lq median  uq max neval
# simon(xx)  20  21     23  26  72    25
#  eddi(xx)  97 102    104 118 149    25
#  arun(xx) 205 245    253 258 332    25
#  josh(xx) 228 268    275 287 365    25
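Timings only matter if the candidates agree. A base-R sanity check on a few edge cases (a leading run of 1s, no 1s at all, no repeated 1s) is sketched below; josh(), arun() and eddi() are copied from the benchmark so the snippet runs on its own, and simon() is left out because it needs the compiled Rcpp function:

```r
# Candidate functions copied from the benchmark above (simon() omitted: it needs Rcpp).
josh <- function(xx) {
  rr <- rle(xx)
  rr$lengths[rr$values == 1] <- 1
  inverse.rle(rr)
}
arun <- function(xx) {
  idx <- which(xx == 1)
  pos <- which(diff(c(xx[1], idx)) == 1)
  xx[setdiff(seq_along(xx), idx[pos])]
}
eddi <- function(xx) xx[head(c(TRUE, (xx != 1)), -1) | (xx != 1)]

# Each case pairs an input with the expected output.
cases <- list(
  list(input = c(0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0,
                 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1),
       expected = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1)),
  list(input = c(1, 1, 0, 1), expected = c(1, 0, 1)),       # leading run of 1s
  list(input = c(0, 0, 0),    expected = c(0, 0, 0)),       # no 1s at all
  list(input = c(0, 1, 0, 1), expected = c(0, 1, 0, 1))     # no repeated 1s
)

fns <- list(josh = josh, arun = arun, eddi = eddi)
for (cs in cases) {
  for (nm in names(fns)) {
    got <- fns[[nm]](cs$input)
    if (!identical(got, cs$expected)) {
      stop(nm, " failed on input: ", paste(cs$input, collapse = " "))
    }
  }
}
message("all candidates agree on every case")
```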

Here is a fast Rcpp solution. It should be quick, but I don't know how it stacks up against the others...

Rcpp::cppFunction('std::vector<int> first1(IntegerVector x) {
    std::vector<int> out;
    for (R_xlen_t i = 0; i < x.size(); ++i) {
        // keep every 0, and a 1 only when it is the first element
        // or the previous element is not a 1 (never reads before x[0])
        if (x[i] == 0 || (x[i] == 1 && (i == 0 || x[i - 1] != 1)))
            out.push_back(x[i]);
    }
    return out;
}')

first1(xx)
# [1] 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1
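To check the compiled function without a C++ toolchain, the loop's rule can be mirrored in plain R: keep every 0, and keep a 1 only when it is the first element or the element before it is not a 1. This is a slow, illustrative reference (first1_r() is a hypothetical name), not a replacement for the Rcpp version:

```r
first1_r <- function(x) {
  keep <- logical(length(x))
  for (i in seq_along(x)) {
    # same condition as the C++ loop; i == 1 guards the look-back
    keep[i] <- x[i] == 0 || (x[i] == 1 && (i == 1 || x[i - 1] != 1))
  }
  x[keep]
}

first1_r(c(1, 1, 0, 1))
# [1] 1 0 1
```

It returns values in the input's type, whereas the Rcpp function returns an integer vector, so compare the two with all.equal() rather than identical().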

Even though I'm a firm believer in rle, it's Friday, so here's another approach. I did this for fun, so YMMV.

yy <- paste(xx, collapse = '')
zz <- gsub('[1]{1,}', '1', yy)  # replace each run of one or more 1s with a single 1
aa <- as.numeric(strsplit(zz, '')[[1]])
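Wrapped as a function, the string approach looks like the sketch below (first1_regex() is a hypothetical name; '1+' is an equivalent, shorter spelling of '[1]{1,}'). It relies on every element printing as exactly one character, so it only suits single-digit vectors such as 0/1; for example, c(10, 1) would paste to "101" and come back as three elements.

```r
first1_regex <- function(x) {
  s <- paste(x, collapse = "")        # one character per element, e.g. "0000110..."
  s <- gsub("1+", "1", s)             # collapse each run of 1s to a single "1"
  as.numeric(strsplit(s, "")[[1]])    # split back into one digit per element
}

first1_regex(c(1, 1, 0, 1))
# [1] 1 0 1
```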
