R using regexpr, with multiple patterns

I would like to find the string just after some patterns. My code seems to work, but I cannot finish the job.

Here is an illustration:

pattern <- c("Iligan", "Cabeseria 25|Sta. Lucia", "Capitol", "Osmeña",
             "Nowhere", "Aglayan")

# I want to match the string just after each pattern. For example, I'm going
# to match "City" just after "Iligan".

target <- c("Iligan City", "Sta. Lucia, Ozamiz City", " Oroquieta City",
            "Osmeña St. Dipolog City", "Lucia St., Zamboanga City",
            "Aglayan str, Oroquieta City", "Gingoog City", "Capitol br., Ozamiz City",
            "Dumaguete City", "Poblacion, Misamis")

# The matching seems to work fine
(matches <- sapply(pattern, FUN = function(x) {
  regexpr(paste0("(?<=\\b", x, "\\b )[\\w.*-]*"), target, perl = TRUE)
}))
print(matches)

# But I cannot get the results. I would need to use one column of the
# matrix at a time
villain <- lapply(matches, FUN = function(x) regmatches(target, x))

Do you have a solution to this problem?

Update 1

For the sake of being precise, here is the desired output:

results <- c("City", "St.", "br.")

#[1] "City" "St."  "br." 
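A likely reason the last step fails: `regexpr()` returns an integer vector with a `match.length` attribute that `regmatches()` relies on. `sapply()` combines the six results into a plain integer matrix, which drops that attribute, and `lapply()` over a matrix visits one cell at a time, so `regmatches()` receives a single number for ten strings. The minimal base-R sketch below calls `regmatches()` on each intact `regexpr()` result before anything is simplified. The helper name `extract_after` is hypothetical, and the character class is written as `[\w.*-]` with the hyphen last, since recent PCRE2 versions may reject a hyphen placed right after `\w` inside a class. Whether `\b` counts `ñ` as a word character (which decides the Osmeña row) may depend on how PCRE is set up in your R build.

```r
pattern <- c("Iligan", "Cabeseria 25|Sta. Lucia", "Capitol", "Osmeña",
             "Nowhere", "Aglayan")
target <- c("Iligan City", "Sta. Lucia, Ozamiz City", " Oroquieta City",
            "Osmeña St. Dipolog City", "Lucia St., Zamboanga City",
            "Aglayan str, Oroquieta City", "Gingoog City",
            "Capitol br., Ozamiz City", "Dumaguete City", "Poblacion, Misamis")

# Hypothetical helper: for each target, the text right after pattern `x`
# (as a whole word followed by a space), or NA when `x` does not occur.
extract_after <- function(x) {
  rx <- paste0("(?<=\\b", x, "\\b )[\\w.*-]*")
  m <- regexpr(rx, target, perl = TRUE)   # still carries match.length
  out <- rep(NA_character_, length(target))
  out[m != -1L] <- regmatches(target, m)  # regmatches() returns hits only
  out
}

# Each regexpr() result is consumed inside extract_after(), so sapply()
# only has to combine plain character vectors into a 10 x 6 matrix
villain <- sapply(pattern, extract_after)
villain
```

With the sample data, this puts `"City"`, `"br."` and `"str"` in the Iligan, Capitol and Aglayan columns (rows 1, 8 and 6) and leaves `NA` for targets where a pattern does not occur.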

There are some helpers in the stringr package that can simplify the process:

pattern <- c("Iligan", "Cabeseria 25|Sta. Lucia", "Capitol", "Osmeña", 
             "Nowhere", "Aglayan")

target <-c("Iligan City", "Sta. Lucia, Ozamiz City", " Oroquieta City", 
           "Osmeña St. Dipolog City", "Lucia St., Zamboanga City", 
           "Aglayan str, Oroquieta City", "Gingoog City", "Capitol br., Ozamiz City", 
           "Dumaguete City", "Poblacion, Misamis")


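library(stringr)

matchPat <- function(x) {
  # Lookbehind: pattern x as a whole word followed by a space; then take the
  # run of word characters, ".", "*" or "-" right after it. regex() replaces
  # stringr's deprecated perl() modifier.
  str_extract(target, regex(paste0("(?<=\\b", x, "\\b )[\\w.*-]*")))
}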

matches <- sapply(pattern, matchPat)

print(matches)

##       Iligan Cabeseria 25|Sta. Lucia Capitol Osmeña Nowhere Aglayan
##  [1,] "City" NA                      NA      NA     NA      NA     
##  [2,] NA     NA                      NA      NA     NA      NA     
##  [3,] NA     NA                      NA      NA     NA      NA     
##  [4,] NA     NA                      NA      "St."  NA      NA     
##  [5,] NA     NA                      NA      NA     NA      NA     
##  [6,] NA     NA                      NA      NA     NA      "str"  
##  [7,] NA     NA                      NA      NA     NA      NA     
##  [8,] NA     NA                      "br."   NA     NA      NA     
##  [9,] NA     NA                      NA      NA     NA      NA     
## [10,] NA     NA                      NA      NA     NA      NA     

This can be simplified further if you don't need indicators for non-matches, but no sample or expected output was provided.
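If one value per target is enough, the per-pattern matrix can be skipped. The sketch below uses a different technique from the answer's: it folds all patterns into a single alternation and reads the following word from a capture group with base R's `regexec()` called with `perl = TRUE`, so stringr is not required. The helper name `word_after_any` is hypothetical. Because the patterns are grouped, the surrounding `\b ... \b ` now applies to both halves of `"Cabeseria 25|Sta. Lucia"`, which differs slightly from the lookbehind version.

```r
pattern <- c("Iligan", "Cabeseria 25|Sta. Lucia", "Capitol", "Osmeña",
             "Nowhere", "Aglayan")
target <- c("Iligan City", "Sta. Lucia, Ozamiz City", " Oroquieta City",
            "Osmeña St. Dipolog City", "Lucia St., Zamboanga City",
            "Aglayan str, Oroquieta City", "Gingoog City",
            "Capitol br., Ozamiz City", "Dumaguete City", "Poblacion, Misamis")

# Hypothetical helper: for each element of `text`, the word that follows
# any one of `patterns`, or NA when none of them occurs in it.
word_after_any <- function(patterns, text) {
  # All patterns as one alternation, then the following word in group 1
  rx <- paste0("\\b(?:", paste(patterns, collapse = "|"), ")\\b ([\\w.*-]+)")
  hits <- regmatches(text, regexec(rx, text, perl = TRUE))
  # Each element is character(0) for no match, else c(full match, group 1)
  vapply(hits, function(h) if (length(h) > 0) h[2] else NA_character_,
         character(1))
}

words <- word_after_any(pattern, target)  # one entry per target, NA if none
found <- words[!is.na(words)]             # drop the non-match indicators
found
```

With the sample data, `words` holds `"City"`, `"str"` and `"br."` for the Iligan, Aglayan and Capitol entries, plus `"St."` for the Osmeña entry if `\b` treats `ñ` as a word character in your R build.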
