How to match a list of simple alternating patterns in R
In R, I have a vector of integers:
run <- sample.int(9, 1000, replace = TRUE)
run[sample.int(1000, 100)] <- NA
If at least one of the patterns c(1, x, 1, y) or c(x, 1, y, 1) is present, where x and y are either whole numbers or NA, I would like to print the start index of each match and update a count variable for each instance of a pattern. What is the most efficient way of doing this?
I was thinking of using the rle function to test every 4 consecutive values for a run length of 1, and then testing whether they conform to one of the patterns. However, this approach has problems with NAs, since each NA is treated as a separate run. Perhaps there is a better way to do this.
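To illustrate the NA problem described above, here is a small sketch with a made-up vector (not the data from the question). It shows that rle starts a new run at every NA, so adjacent NAs are never merged into a single run.

```r
# Made-up example vector; rle() does not merge adjacent NAs into one run
v <- c(2, 2, NA, NA, 2)
r <- rle(v)
r$lengths  # run lengths
r$values   # run values
```

Here the two NAs come back as two runs of length 1 (lengths 2 1 1 1), which is why a test based on run lengths alone cannot tell a repeated NA from alternating values.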
Taking your usage of sample.int as implying that your vector only contains values from 1:9 and NA, here's a regular expressions approach:
run <- c(1, NA, 1, 3, 1, 1, NA, NA, NA, 1)
run[is.na(run)] <- 0  # recode NA as 0, a digit that does not otherwise occur
# wrapping the pattern in a lookahead assertion allows overlapping matches
pat1 <- "(?=1[0-9]1[0-9])"
pat1.idxs <- unlist(gregexpr(pat1, paste(run, collapse=''), perl=TRUE))
pat1.idxs          # match indexes
# [1] 1 3
length(pat1.idxs)  # count (note: gregexpr returns -1 when nothing matches)
# [1] 2
You would then handle the second pattern similarly.
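As a rough sketch of what "similarly" could look like, the second pattern c(x, 1, y, 1) can be written as the lookahead "(?=[0-9]1[0-9]1)". The helper name find_starts is hypothetical (it is not part of the original answer); it wraps the same steps and also drops the -1 that gregexpr returns when nothing matches, so the count is 0 in that case.

```r
# Hypothetical helper: start positions of (possibly overlapping) regex matches.
# Assumes x only contains single-digit values 1:9 and NA.
find_starts <- function(x, pattern) {
  x[is.na(x)] <- 0                         # recode NA as the unused digit 0
  s <- paste(x, collapse = "")             # one character per element
  idx <- gregexpr(pattern, s, perl = TRUE)[[1]]
  as.integer(idx[idx > 0])                 # drop the -1 "no match" marker
}

run <- c(1, NA, 1, 3, 1, 1, NA, NA, NA, 1)
pat2 <- "(?=[0-9]1[0-9]1)"                 # second pattern: x, 1, y, 1
pat2.idxs <- find_starts(run, pat2)
pat2.idxs          # match indexes
length(pat2.idxs)  # count
```

For the example vector this finds a single match starting at index 2. Because each element must map to exactly one character, this sketch would not work as written for values of 10 or more.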
This kind of task can be done with the rollapply function from the zoo package.
set.seed(42)
run <- sample.int(9, 1000, replace = TRUE)
run[sample.int(1000, 100)] <- NA
# a list of the patterns
pattern <- list(c(1, NA, 1, NA), c(NA, 1, NA, 1))
library(zoo)
colSums(rollapply(run, length(pattern[[1]]),
                  function(x) sapply(pattern, identical, x)))
The result is a vector containing the count of each pattern in the pattern list:
[1] 0 0
Note: If the patterns had different lengths, rollapply would have to be executed multiple times.
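One caveat about the rollapply code: identical() tests exact equality, so an NA in the pattern list only matches a literal NA in the data, not an arbitrary x or y, and a double pattern such as c(1, NA, 1, NA) does not compare identical to an integer window. If x and y are meant as wildcards, as in the question, a base R alternative (a different technique from rollapply, and only a sketch) is to check the positions of the 1s directly. The helper name find_alternating is hypothetical.

```r
# Hypothetical helper: start indices of windows matching c(1, x, 1, y) and
# c(x, 1, y, 1), where x and y may be any value, including NA.
find_alternating <- function(v) {
  n <- length(v)
  if (n < 4) return(list(pat1 = integer(0), pat2 = integer(0)))
  is1 <- !is.na(v) & v == 1             # TRUE only where the value is exactly 1
  i <- seq_len(n - 3)                   # candidate start positions
  list(
    pat1 = i[is1[i] & is1[i + 2]],      # 1 at offsets 0 and 2
    pat2 = i[is1[i + 1] & is1[i + 3]]   # 1 at offsets 1 and 3
  )
}

run <- c(1, NA, 1, 3, 1, 1, NA, NA, NA, 1)
res <- find_alternating(run)
res$pat1      # start indices of c(1, x, 1, y)
res$pat2      # start indices of c(x, 1, y, 1)
lengths(res)  # count per pattern
```

On this small example the sketch reports starts 1 and 3 for the first pattern and 2 for the second, which agrees with the regular expression approach. It is vectorized and does not depend on the values being single digits.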