
R: regexpr, gregexpr, keep track of matches

Let's say I have this data.frame and would like to match column A against the pattern below. This can be done with regexpr or gregexpr, but I would also like to keep track of which rows matched, along with the match itself.

df <- data.frame(A=c("where is the pencil? ","the white cat in the kitchen","green hat is over the blue ocean"))

> df
##                                  A
## 1            where is the pencil? 
## 2     the white cat in the kitchen
## 3 green hat is over the blue ocean

pattern <- ("(blue|white|green) \\w*")

regmatches(df[,1],regexpr(pattern,df[,1],perl=TRUE))

> regmatches(df[,1],regexpr(pattern,df[,1],perl=TRUE))
## [1] "white cat" "green hat"

Desired output:

##                                  A     match
## 1            where is the pencil?       <NA>
## 2     the white cat in the kitchen white cat
## 3 green hat is over the blue ocean green hat

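Change pattern to:

pattern <- paste0(pattern, "|$")

and then replace the empty strings with NA. The added "|$" alternative lets the pattern match the empty string at the end of any row that has no other match, so regmatches returns one element per row instead of dropping the non-matching rows. perl=TRUE is not needed.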
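Put together, the suggestion might look like the sketch below. The names pattern_or_end and m are illustrative and not from the original post, and stringsAsFactors = FALSE is added on the assumption that column A should be a character vector regardless of R version.

```r
# Example data from the question; stringsAsFactors = FALSE is an added
# assumption so that A is a character vector on any R version.
df <- data.frame(A = c("where is the pencil? ",
                       "the white cat in the kitchen",
                       "green hat is over the blue ocean"),
                 stringsAsFactors = FALSE)

pattern <- "(blue|white|green) \\w*"

# Append "|$" so rows without a colour match still produce an (empty) match.
pattern_or_end <- paste0(pattern, "|$")   # illustrative name

# regexpr now finds a match in every row, so regmatches keeps one element per row.
m <- regmatches(df$A, regexpr(pattern_or_end, df$A))

# Turn the empty matches into NA and attach the result as a new column.
df$match <- ifelse(nzchar(m), m, NA_character_)

df
```

With the match column in place, which(!is.na(df$match)) gives the row numbers that matched.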

