簡體   English   中英

使用向量中的元素的R grep正則表達式(FOLLOW UP)

[英]R grep regular expression using elements in a vector (FOLLOW UP)

此問題之后 ,我還有另一個示例,其中我無法使用已接受的答案。

同樣,我想在lab向量中找到每個確切的group元素...

labs <- c("Beijing -- T0 -- BC-89 + CN",
"Beijing -- T24 -- BC-89 + CN",
"Beijing -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC",
"Beijing -- T24 -- BC-89 + CN with 2% DD + 1.6% ZC",
"Beijing -- T0 -- BC-89 with 2% Puricare + 5% Merquat + CN",
"Beijing -- T24 -- BC-89 with 2% Puricare + 5% Merquat + CN",
"Beijing -- T0 -- BC-89 + CN",
"Zhangjiakou -- T0 -- BC-89 + CN",
"Beijing -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC",
"Zhangjiakou -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC",
"Beijing -- T0 -- BC-89 with 2% Puricare + 5% Merquat + CN",
"Zhangjiakou -- T0 -- BC-89 with 2% Puricare + 5% Merquat + CN",
"Beijing -- T0 -- BC-89 + CN",
"Beijing -- T0 -- BC-89 + CN",
"Beijing -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC",
"Beijing -- T24 -- BC-89 + CN",
"Beijing -- T24 -- BC-89 + CN",
"Beijing -- T24 -- BC-89 + CN with 2% DD + 1.6% ZC",
"Zhangjiakou -- T0 -- BC-89 + CN",
"Zhangjiakou -- T0 -- BC-89 + CN",
"Zhangjiakou -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC",
"Zhangjiakou -- T24 -- BC-89 + CN",
"Zhangjiakou -- T24 -- BC-89 + CN",
"Zhangjiakou -- T24 -- BC-89 + CN with 2% DD + 1.6% ZC")
labs
groups <- c("BC-89 + CN", "BC-89 + CN with 2% DD + 1.6% ZC", "BC-89 with 2% Puricare + 5% Merquat + CN")
groups

我嘗試以下操作,但不起作用...

grep(paste0(groups[1], "$"), labs, value=TRUE)
grep(paste0(groups[2], "$"), labs, value=TRUE)
grep(paste0(groups[3], "$"), labs, value=TRUE)

有什么幫助嗎?

嘗試

lapply(groups, function(g)
  grep(gsub("\\+", "\\\\+", paste0(g, "$")), labs, value = TRUE))
# [[1]]
# [1] "Beijing -- T0 -- BC-89 + CN"     
# [2] "Beijing -- T24 -- BC-89 + CN"    
# [3] "Beijing -- T0 -- BC-89 + CN"     
# [4] "Zhangjiakou -- T0 -- BC-89 + CN" 
# [5] "Beijing -- T0 -- BC-89 + CN"     
# [6] "Beijing -- T0 -- BC-89 + CN"     
# [7] "Beijing -- T24 -- BC-89 + CN"    
# [8] "Beijing -- T24 -- BC-89 + CN"    
# [9] "Zhangjiakou -- T0 -- BC-89 + CN" 
# [10] "Zhangjiakou -- T0 -- BC-89 + CN" 
# [11] "Zhangjiakou -- T24 -- BC-89 + CN"
# [12] "Zhangjiakou -- T24 -- BC-89 + CN"
# 
# [[2]]
# [1] "Beijing -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC"     
# [2] "Beijing -- T24 -- BC-89 + CN with 2% DD + 1.6% ZC"    
# [3] "Beijing -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC"     
# [4] "Zhangjiakou -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC" 
# [5] "Beijing -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC"     
# [6] "Beijing -- T24 -- BC-89 + CN with 2% DD + 1.6% ZC"    
# [7] "Zhangjiakou -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC" 
# [8] "Zhangjiakou -- T24 -- BC-89 + CN with 2% DD + 1.6% ZC"
# 
# [[3]]
# [1] "Beijing -- T0 -- BC-89 with 2% Puricare + 5% Merquat + CN"    
# [2] "Beijing -- T24 -- BC-89 with 2% Puricare + 5% Merquat + CN"   
# [3] "Beijing -- T0 -- BC-89 with 2% Puricare + 5% Merquat + CN"    
# [4] "Zhangjiakou -- T0 -- BC-89 with 2% Puricare + 5% Merquat + CN"

您的方法的問題在於,例如groups[1]"BC-89 + CN" ,其中包含+ ,在正則表達式中具有特殊含義。 僅鑒於此,在grep添加fixed = TRUE將解決此問題,但是$將失去作用。 所以我要做的是首先在組名中轉義+

另外,關於您的鏈接答案,您可以

lapply(groups, function(g)
  grep(paste0(g, "$"), paste0(labs, "$"), value = TRUE, fixed = TRUE))

從stringr包中嘗試。 “ coll”選項實現“人類可讀的排序規則”,它可以幫助您匹配看起來相同的內容,但是由於某種原因,R首先拒絕匹配它們:

> library(stringr)
> str_detect(labs,coll(groups))
 [1]  TRUE FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE  TRUE  
TRUE FALSE FALSE
[16]  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE

+是正則表達式中的特殊字符。 您將需要“ \\ +”轉義特殊字符。

new_group <- gsub("\\+",replacement = "\\\\+",x =groups)

另外,“ |” 在grep中的作用類似於“或”。

new_group1 <- paste0(new_group,collapse = "|")

grep(pattern = new_group1,x = labs,value = T)

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM