Finding `n1` TRUEs wrapped in between two `n2` FALSEs, the whole thing wrapped in between `n3` TRUEs, etc

From a sequence of TRUEs and FALSEs, I wanted to write a function that returns TRUE if there is a run of at least n1 consecutive TRUEs somewhere in the sequence. Here is that function:

fun_1 = function(TFvec, n1){
    nbT = 0    # length of the current run of TRUEs
    for (i in seq_along(TFvec)){
        if (TFvec[i]){
            nbT = nbT + 1
            if (nbT == n1){
                return(TRUE)
            }
        } else {
            nbT = 0
        }
    }
    return(FALSE)
}

Test:

x = c(T,F,T,T,F,F,T,T,T,F,F,T,F,F)
fun_1(x,3) # TRUE
fun_1(x,4) # FALSE

Then, I needed a function that returns TRUE if a given boolean vector contains a run of at least n1 TRUEs wrapped between two runs (one on each side) of at least n2 FALSEs. Here is the function:

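fun_2 = function(TFvec, n1, n2){
    if (n2 == 0){
        # no FALSE padding required: reduces to the fun_1 problem
        return(fun_1(TFvec, n1))
    }
    nbFB = 0        # FALSEs in the run before the current TRUE run
    nbFA = 0        # FALSEs in the run after the candidate TRUE run
    nbT = 0         # length of the current run of TRUEs
    solution = -1   # start index of a candidate TRUE run, -1 if none
    last = FALSE
    for (i in seq_along(TFvec)){
        if (TFvec[i]){
            if (!last){
                # a new TRUE run starts: any earlier candidate was not
                # followed by enough FALSEs, so discard it
                solution = -1
            }
            nbT = nbT + 1
            if (nbT == n1 && nbFB >= n2){
                solution = i-n1+1
            }
            last = TRUE
        } else {
            if (last){
                nbFB = 0
                nbFA = 0
            }
            nbFB = nbFB + 1
            nbFA = nbFA + 1
            nbT = 0
            if (nbFA == n2 && solution != -1){
                return(TRUE)
            }
            last = FALSE
        }
    }
    return(FALSE)
}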

It may not be a very efficient function, though! And I haven't tested it extensively, but it looks like it works fine.

Test:

x = c(T,F,T,T,F,F,T,T,T,F,F,T,F,F)
fun_2(x, 3, 2) # TRUE
fun_2(x, 3, 3) # FALSE

Now, believe it or not, I'd like to write a function (fun_3) that returns TRUE if the boolean vector contains a run of at least n1 TRUEs wrapped between two runs (one on each side) of at least n2 FALSEs, where the whole thing (the three runs) is in turn wrapped between two runs (one on each side) of at least n3 TRUEs. And since I'm afraid I will have to take this problem even further, I am asking here for help creating a function fun_n that takes two arguments, TFvec and list_n, where list_n is a list of thresholds n of any length.

Can you help me create the function fun_n?

For convenience, record the number of thresholds:

n = length(list_n)

Represent the vector of TRUE and FALSE as a run-length encoding, keeping the length of each run in l for convenience:

r = rle(TFvec); l = r$lengths
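To make the encoding concrete, here is a small illustration on the test vector from the question (written with as.logical() only to keep the line short). The expected values in the comments were worked out by hand from the runs of x.

```r
# Test vector from the question: T F T T F F T T T F F T F F
x <- as.logical(c(1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0))

r <- rle(x)
l <- r$lengths

print(l)         # 1 1 2 2 3 2 1 2
print(r$values)  # TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE
```

Each entry of l is one run, and runs necessarily alternate between TRUE and FALSE, so "wrapped between" becomes a statement about neighbouring entries of l.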

Find possible starting locations, i.e. runs of TRUE that are long enough:

idx = which(l >= list_n[1] & r$values)

Make sure the starting locations are embedded deeply enough to satisfy all tests, i.e. that there are at least n - 1 runs on each side:

idx = idx[idx > n - 1 & idx + n - 1 <= length(l)]

Then check that the lengths of successively more remote runs are consistent with the condition, keeping only those starting points for which they are:

for (i in seq_len(n - 1)) {
    if (length(idx) == 0)
        break     # no solution
    thresh = list_n[i + 1]
    test = (l[idx + i] >= thresh) & (l[idx - i] >= thresh)
    idx = idx[test]
}

If there are any values left in idx, then these are the indexes into the rle satisfying the condition; the starting point(s) of the innermost run in the initial vector are cumsum(l)[idx - 1] + 1.
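As a worked illustration, the steps can be traced on the question's test vector with thresholds c(3, 2). This is only a sketch: the variable starts and the c(0, cumsum(l))[idx] + 1 form are my additions, the latter so that a match in the very first run (possible only when n is 1) still yields a position instead of an empty result.

```r
x <- as.logical(c(1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0))
list_n <- c(3, 2)

n <- length(list_n)
r <- rle(x); l <- r$lengths                          # l is 1 1 2 2 3 2 1 2

idx <- which(l >= list_n[1] & r$values)              # run 5 (the three TRUEs)
idx <- idx[idx > n - 1 & idx + n - 1 <= length(l)]   # still 5: enough neighbours

for (i in seq_len(n - 1)) {
    thresh <- list_n[i + 1]
    idx <- idx[l[idx + i] >= thresh & l[idx - i] >= thresh]
}

# Position in x where each matching innermost run begins
# (hypothetical helper variable, padded with 0 so idx == 1 also works)
starts <- c(0, cumsum(l))[idx] + 1
print(idx)     # 5
print(starts)  # 7
```

The innermost run of three TRUEs occupies positions 7 to 9 of x, with two FALSEs on either side.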

Combined:

runfun = function(TFvec, list_n) {
    ## setup
    n = length(list_n)
    r = rle(TFvec); l = r$lengths

    ## initial condition
    idx = which(l >= list_n[1] & r$values)
    idx = idx[idx > n - 1 & idx + n - 1 <= length(l)]

    ## adjacent conditions
    for (i in seq_len(n - 1)) {
        if (length(idx) == 0)
            break     # no solution
        thresh = list_n[i + 1]
        test = (l[idx + i] >= thresh) & (l[idx - i] >= thresh)
        idx = idx[test]
    }

    ## starts = cumsum(l)[idx - 1] + 1
    ## any luck?
    length(idx) != 0
}

This is fast and allows for runs >= the threshold, as stipulated in the question; for example

x = sample(c(TRUE, FALSE), 1000000, TRUE)
system.time(runfun(x, rep(2, 5)))

completes in less than 1/5th of a second.

A fun generalization allows for a flexible condition, e.g., runs of exactly list_n (by passing cond = `==`), as in the rollapply solution:

runfun = function(TFvec, list_n, cond=`>=`) {
    ## setup
    n = length(list_n)
    r = rle(TFvec); l = r$lengths

    ## initial condition
    idx = which(cond(l, list_n[1]) & r$values)
    idx = idx[idx > n - 1 & idx + n - 1 <= length(l)]

    ## adjacent conditions
    for (i in seq_len(n - 1)) {
        if (length(idx) == 0)
            break     # no solution
        thresh = list_n[i + 1]
        test = cond(l[idx + i], thresh) & cond(l[idx - i], thresh)
        idx = idx[test]
    }

    ## starts = cumsum(l)[idx - 1] + 1
    ## any luck?
    length(idx) != 0
}
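A short usage sketch on the question's test vector shows how the choice of cond changes the answer. The function is repeated in compact form only so that the snippet runs on its own; the names in res are purely illustrative.

```r
runfun <- function(TFvec, list_n, cond = `>=`) {
    n <- length(list_n)
    r <- rle(TFvec); l <- r$lengths
    idx <- which(cond(l, list_n[1]) & r$values)
    idx <- idx[idx > n - 1 & idx + n - 1 <= length(l)]
    for (i in seq_len(n - 1)) {
        if (length(idx) == 0) break
        thresh <- list_n[i + 1]
        idx <- idx[cond(l[idx + i], thresh) & cond(l[idx - i], thresh)]
    }
    length(idx) != 0
}

# Runs of x have lengths 1 1 2 2 3 2 1 2, starting with TRUE
x <- as.logical(c(1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0))

res <- c(
    at_least_3_2 = runfun(x, c(3, 2)),              # TRUE
    at_least_3_3 = runfun(x, c(3, 3)),              # FALSE
    at_least_2_2 = runfun(x, c(2, 2)),              # TRUE: the run of 3 counts
    exactly_2_2  = runfun(x, c(2, 2), cond = `==`)  # FALSE: the run of exactly 2
)                                                   # has only one FALSE before it
print(res)
```

The first two results agree with fun_2(x, 3, 2) and fun_2(x, 3, 3) from the question.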

Create a template, tpl, of zeros and ones, and convert it to a regex pattern, pat. Convert x to a single string of zeros and ones and use grepl to match pat against it. No packages are used.

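fun_n <- function(x, lens) {
  n <- length(lens)
  # thresholds read left to right: outermost ... innermost ... outermost
  reps <- c(rev(lens), lens[-1])
  # the innermost run (position n) must be TRUE; pick the starting digit accordingly
  TF <- if (n == 1) 1 else if (n %% 2) 1:0 else 0:1
  # alternate the digits over all 2n - 1 runs (length = n mis-recycles for odd n >= 3)
  tpl <- paste0(rep(TF, length.out = length(reps)), "{", reps, ",}")
  pat <- paste(tpl, collapse = "")
  grepl(pat, paste(x + 0, collapse = ""))
}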

# test
x <- c(F, T, T, F, F, T, T, T, F, F, T, T, T, F)
fun_n(x, 3:1)
## TRUE
fun_n(x, 1:3)
## FALSE
fun_n(x, 100)
## FALSE
fun_n(x, 3)
## TRUE
fun_n(c(F, T, F), c(1, 1))
## [1] TRUE
fun_n(c(F, T, T, F), c(1, 1)) 
## [1] TRUE
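To see what the regex approach actually matches, the pattern can be built step by step for lens = 3:1 on the test vector above. This is a sketch of the same construction outside the function; digits and s are names introduced here for illustration, and the digits are recycled over the full length of reps so that they alternate strictly.

```r
lens <- 3:1
n <- length(lens)

reps <- c(rev(lens), lens[-1])                       # 1 2 3 2 1
TF <- if (n == 1) 1 else if (n %% 2) 1:0 else 0:1    # n is odd: start with 1
digits <- rep(TF, length.out = length(reps))         # 1 0 1 0 1

pat <- paste(paste0(digits, "{", reps, ",}"), collapse = "")
print(pat)   # "1{1,}0{2,}1{3,}0{2,}1{1,}"

x <- as.logical(c(0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0))
s <- paste(x + 0, collapse = "")
print(s)     # "01100111001110"

print(grepl(pat, s))   # TRUE
```

Each "d{k,}" piece means "at least k copies of digit d", so the pattern reads outward-in and then inward-out: at least one 1, at least two 0s, at least three 1s, at least two 0s, at least one 1.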

Run time is not as fast as runfun on the example below, but it is still quite fast: 10,000 runs of the example shown take slightly over 2 seconds on my laptop. Also, the code is relatively short and loop-free.

> library(rbenchmark)
> benchmark(runfun(x, 1:3), fun_n(x, 1:3), replications = 10000)[1:4]

            test replications elapsed relative
2  fun_n(x, 1:3)        10000    2.29    1.205
1 runfun(x, 1:3)        10000    1.90    1.000
