How to match a list of simple alternating patterns in R

In R, I have a vector of integers.

run <- sample.int(9, 1000, replace=TRUE)
run[sample.int(1000, 100)] <- NA

If at least one of the following patterns, c(1, x, 1, y) or c(x, 1, y, 1), where x and y are either whole numbers or NA, is present, I would like to print the start index of each pattern and update a count variable for each instance of a pattern. What is the most efficient way of doing this?

I was thinking of using the rle function, testing every 4 consecutive values for a run length of 1, and then testing whether they conform to one of the patterns. However, I am having problems with NAs with this approach, since each NA is treated separately. Perhaps there is a better way to do this.

Taking your usage of sample.int as implying your vector only contains values from 1:9 and NA, here's a regular expressions approach:

run <- c(1, NA, 1, 3, 1, 1, NA, NA, NA, 1)
run[is.na(run)] <- 0
pat1 <- "(?=1[0-9]1[0-9])" # using a lookahead assertion around the pattern is a way to allow overlapping matches
pat1.idxs <- unlist(gregexpr(pat1, paste(run, collapse=''), perl=TRUE))
pat1.idxs
# match indexes
# [1] 1 3
length(pat1.idxs)
# counts
# [1] 2

You would then handle the second pattern similarly.
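As a rough sketch of what "similarly" could look like, the block below applies the same lookahead idea to c(x, 1, y, 1). The helper name match_starts is hypothetical (it is not part of the answer above); it only wraps gregexpr and drops the -1 that gregexpr returns when nothing matches, so that length() gives a correct count of zero.

```r
# Same example vector as above, with NA recoded as 0
run <- c(1, NA, 1, 3, 1, 1, NA, NA, NA, 1)
run[is.na(run)] <- 0
s <- paste(run, collapse = "")

# Hypothetical helper: start positions of all (possibly overlapping) matches
match_starts <- function(pattern, text) {
  idx <- unlist(gregexpr(pattern, text, perl = TRUE))
  idx[idx > 0]  # gregexpr() reports "no match" as -1; drop it
}

# Second pattern, c(x, 1, y, 1), again wrapped in a lookahead
pat2 <- "(?=[0-9]1[0-9]1)"
pat2.idxs <- match_starts(pat2, s)
pat2.idxs
# match indexes
# [1] 2
length(pat2.idxs)
# counts
# [1] 1
```

Like the original, this relies on every value being a single digit, so that a position in the string equals an index in the vector.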

This kind of task could be done with the rollapply function from the zoo package.

set.seed(42)
run <- sample.int(9, 1000, replace=TRUE)
run[sample.int(1000, 100)] <- NA

# a list of the patterns
pattern <- list(c(1, NA, 1, NA), c(NA, 1, NA, 1))

library(zoo)

colSums(rollapply(run, length(pattern[[1]]),
                  function(x) sapply(pattern, identical, x)))

The result is a vector containing the count of each pattern in the pattern list:

[1] 0 0

Note: if the patterns had different lengths, rollapply would have to be executed multiple times, once per length.
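One caveat worth keeping in mind: identical() compares types as well as values, so an integer window (as produced by sample.int) never equals a double pattern such as c(1, NA, 1, NA), and an NA in a pattern only matches a literal NA rather than acting as a wildcard. That may be why the counts above come out as 0 0. If x and y are meant to stand for any value, a minimal base R sketch using plain logical indexing (a different technique from rollapply, with a hypothetical helper name find_alternating) could look like this:

```r
# Hypothetical helper, not from the answers above: treats x and y as wildcards
# and returns the start indexes of every (possibly overlapping) 4-element match.
find_alternating <- function(run) {
  starts <- seq_len(max(length(run) - 3L, 0L))  # every possible window start
  is_one <- !is.na(run) & run == 1              # NA counts as "not 1"
  list(
    starts1 = starts[is_one[starts] & is_one[starts + 2L]],      # c(1, x, 1, y)
    starts2 = starts[is_one[starts + 1L] & is_one[starts + 3L]]  # c(x, 1, y, 1)
  )
}

res <- find_alternating(c(1, NA, 1, 3, 1, 1, NA, NA, NA, 1))
res$starts1
# [1] 1 3
res$starts2
# [1] 2
lengths(res)  # one count per pattern: starts1 = 2, starts2 = 1
```

Because it only indexes logical vectors, this handles NA without recoding and does not depend on the values being single digits.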
