Pattern Matching in R

I have a list like

> list(c("a","b","c","d"),c("b","c","e"))
[[1]]
[1] "a" "b" "c" "d"

[[2]]
[1] "b" "c" "e"

I have a sequence "bc". I want to match this pattern against my list and find out how often it occurs. Required output: 2

First, I think I need to convert my list into the form c("abcd"), c("bce") so that I can do the matching. How do I convert it and then match? Second, how do I calculate and store the frequency?

I tried the grepl function, but it returns a logical value, not a count.

Using @Tyler's sample data, you can use gregexpr:

lst <- list(c('a', 'b', 'c', 'd', 'b', 'c'),
            c('b', 'c', 'e'))
lst2 <- lapply(lst, paste, collapse="")
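# gregexpr() returns -1 for an element with no match, so count the
# positive match positions instead of taking length()
sapply(gregexpr("bc", lst2, fixed = TRUE), function(m) sum(m > 0))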
# [1] 2 1
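
To get the single number the question asks for, the per-element counts still have to be combined. The following is a minimal sketch on the question's original list, assuming "frequency" means either the total number of occurrences or the number of list elements containing the pattern (the question does not say which; both happen to be 2 here). The variable names are illustrative only.

```r
# The list from the question
lst <- list(c("a", "b", "c", "d"), c("b", "c", "e"))

# Collapse each vector into one string: "abcd", "bce"
strs <- vapply(lst, paste, character(1), collapse = "")

# Occurrences per element; regmatches() gives character(0) when nothing
# matches, so lengths() reports 0 for that element
per_element <- lengths(regmatches(strs, gregexpr("bc", strs, fixed = TRUE)))

# Interpretation 1: total occurrences across the whole list
total_occurrences <- sum(per_element)

# Interpretation 2: number of elements that contain the pattern at all
n_elements_with_match <- sum(grepl("bc", strs, fixed = TRUE))

total_occurrences
n_elements_with_match
```

Note that gregexpr finds non-overlapping matches, so a pattern like "aa" in "aaa" would be counted once.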

Here's one approach using term.count (a non-exported function) from the qdap package:

lst <- list(c('a', 'b', 'c', 'd', 'b', 'c'), c('b', 'c', 'e'))
lst2 <- lapply(lst, paste, collapse="") # collapse each vector into one string

## install.packages("qdap")
sapply(lst2, qdap:::term.count, "bc") # count occurrences of "bc" in each string

## > sapply(lst2, qdap:::term.count, "bc")
## bc bc 
##  2  1 

If you don't want to depend on qdap, look at the source for term.count and take what you need.
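
As a dependency-free alternative, one option is to skip the pasting step and compare the vector elements directly. This is only a sketch and is not how term.count works; count_seq is a hypothetical helper name. It treats the pattern as a vector of elements, so it also behaves sensibly when the list holds multi-character strings, where pasting could create accidental matches across element boundaries.

```r
# Hypothetical helper: count how often `pat` occurs as a contiguous run in `x`.
# Overlapping occurrences are counted separately.
count_seq <- function(x, pat) {
  n <- length(x)
  k <- length(pat)
  if (k == 0 || n < k) return(0L)
  starts <- seq_len(n - k + 1)
  # TRUE for every start position whose window of length k equals `pat`
  hits <- vapply(starts,
                 function(i) all(x[i:(i + k - 1)] == pat),
                 logical(1))
  sum(hits)
}

lst <- list(c("a", "b", "c", "d", "b", "c"), c("b", "c", "e"))
counts <- vapply(lst, count_seq, integer(1), pat = c("b", "c"))
counts
# [1] 2 1
```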
