R grep regular expression using elements in a vector (FOLLOW UP)
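Following up on this question, I have another example where I cannot use the accepted answer.
Again, I want to find each exact element of groups in the labs vector...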
labs <- c("Beijing -- T0 -- BC-89 + CN",
"Beijing -- T24 -- BC-89 + CN",
"Beijing -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC",
"Beijing -- T24 -- BC-89 + CN with 2% DD + 1.6% ZC",
"Beijing -- T0 -- BC-89 with 2% Puricare + 5% Merquat + CN",
"Beijing -- T24 -- BC-89 with 2% Puricare + 5% Merquat + CN",
"Beijing -- T0 -- BC-89 + CN",
"Zhangjiakou -- T0 -- BC-89 + CN",
"Beijing -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC",
"Zhangjiakou -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC",
"Beijing -- T0 -- BC-89 with 2% Puricare + 5% Merquat + CN",
"Zhangjiakou -- T0 -- BC-89 with 2% Puricare + 5% Merquat + CN",
"Beijing -- T0 -- BC-89 + CN",
"Beijing -- T0 -- BC-89 + CN",
"Beijing -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC",
"Beijing -- T24 -- BC-89 + CN",
"Beijing -- T24 -- BC-89 + CN",
"Beijing -- T24 -- BC-89 + CN with 2% DD + 1.6% ZC",
"Zhangjiakou -- T0 -- BC-89 + CN",
"Zhangjiakou -- T0 -- BC-89 + CN",
"Zhangjiakou -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC",
"Zhangjiakou -- T24 -- BC-89 + CN",
"Zhangjiakou -- T24 -- BC-89 + CN",
"Zhangjiakou -- T24 -- BC-89 + CN with 2% DD + 1.6% ZC")
labs
groups <- c("BC-89 + CN", "BC-89 + CN with 2% DD + 1.6% ZC", "BC-89 with 2% Puricare + 5% Merquat + CN")
groups
I tried the following, but it does not work...
grep(paste0(groups[1], "$"), labs, value=TRUE)
grep(paste0(groups[2], "$"), labs, value=TRUE)
grep(paste0(groups[3], "$"), labs, value=TRUE)
Any help?
Try:
lapply(groups, function(g)
grep(gsub("\\+", "\\\\+", paste0(g, "$")), labs, value = TRUE))
# [[1]]
# [1] "Beijing -- T0 -- BC-89 + CN"
# [2] "Beijing -- T24 -- BC-89 + CN"
# [3] "Beijing -- T0 -- BC-89 + CN"
# [4] "Zhangjiakou -- T0 -- BC-89 + CN"
# [5] "Beijing -- T0 -- BC-89 + CN"
# [6] "Beijing -- T0 -- BC-89 + CN"
# [7] "Beijing -- T24 -- BC-89 + CN"
# [8] "Beijing -- T24 -- BC-89 + CN"
# [9] "Zhangjiakou -- T0 -- BC-89 + CN"
# [10] "Zhangjiakou -- T0 -- BC-89 + CN"
# [11] "Zhangjiakou -- T24 -- BC-89 + CN"
# [12] "Zhangjiakou -- T24 -- BC-89 + CN"
#
# [[2]]
# [1] "Beijing -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC"
# [2] "Beijing -- T24 -- BC-89 + CN with 2% DD + 1.6% ZC"
# [3] "Beijing -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC"
# [4] "Zhangjiakou -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC"
# [5] "Beijing -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC"
# [6] "Beijing -- T24 -- BC-89 + CN with 2% DD + 1.6% ZC"
# [7] "Zhangjiakou -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC"
# [8] "Zhangjiakou -- T24 -- BC-89 + CN with 2% DD + 1.6% ZC"
#
# [[3]]
# [1] "Beijing -- T0 -- BC-89 with 2% Puricare + 5% Merquat + CN"
# [2] "Beijing -- T24 -- BC-89 with 2% Puricare + 5% Merquat + CN"
# [3] "Beijing -- T0 -- BC-89 with 2% Puricare + 5% Merquat + CN"
# [4] "Zhangjiakou -- T0 -- BC-89 with 2% Puricare + 5% Merquat + CN"
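The problem with your approach is that, for example, groups[1] is "BC-89 + CN", which contains +, a character with a special meaning in regular expressions (it is a quantifier: "one or more of the preceding item"). If that were the only issue, adding fixed = TRUE to grep would solve it, but then $ would lose its meaning as an end-of-string anchor. So what I do is escape the + in the group names first.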
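To make the failure concrete, here is a small sketch on a single label taken from the question. The variable names are only for illustration.

```r
x <- "Beijing -- T0 -- BC-89 + CN"

# Unescaped: " +" is read as "one or more spaces", so the literal "+" in x
# is never matched and the whole pattern fails.
unescaped <- grepl("BC-89 + CN$", x)

# Escaped: "\\+" matches a literal plus sign, and "$" still anchors the end.
escaped <- grepl("BC-89 \\+ CN$", x)

# fixed = TRUE treats "+" literally, but "$" becomes a literal dollar sign,
# which x does not contain.
fixed_with_dollar <- grepl("BC-89 + CN$", x, fixed = TRUE)

# Dropping "$" makes fixed matching work, but it no longer distinguishes
# the short group name from the longer one that starts the same way.
fixed_no_anchor <- grepl("BC-89 + CN",
                         "Beijing -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC",
                         fixed = TRUE)

print(c(unescaped, escaped, fixed_with_dollar, fixed_no_anchor))
```

The last call is the reason the end anchor matters here: the first group name is a prefix of the second.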
Alternatively, following the answer you linked, you could use:
lapply(groups, function(g)
grep(paste0(g, "$"), paste0(labs, "$"), value = TRUE, fixed = TRUE))
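One thing to watch with the call above: because value = TRUE returns elements of the vector that was searched, the results carry the appended "$". If that is unwanted, a possible alternative (my suggestion, not part of the linked answer) is base R's endsWith(), which compares suffixes literally and needs no escaping. A minimal sketch on a shortened labs vector:

```r
# Shortened version of the vectors from the question.
labs <- c("Beijing -- T0 -- BC-89 + CN",
          "Beijing -- T24 -- BC-89 + CN with 2% DD + 1.6% ZC",
          "Zhangjiakou -- T0 -- BC-89 with 2% Puricare + 5% Merquat + CN",
          "Zhangjiakou -- T24 -- BC-89 + CN")
groups <- c("BC-89 + CN",
            "BC-89 + CN with 2% DD + 1.6% ZC",
            "BC-89 with 2% Puricare + 5% Merquat + CN")

# endsWith() is a literal suffix test, so "+" and "." need no special handling.
matches <- lapply(groups, function(g) labs[endsWith(labs, g)])
print(matches)
```

endsWith() requires R 3.3.0 or later.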
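Try str_detect from the stringr package. The coll option implements "human-readable collation", which can help you match things that look identical but that R, for some reason, refuses to match in the first place:
> library(stringr)
> str_detect(labs, coll(groups))
 [1]  TRUE FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE
[16]  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE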
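A caveat on reading that output: str_detect is vectorised over both the string and the pattern, so the three group names are recycled along labs and each label is compared with only one group. The sketch below reproduces the first six values in base R, assuming that coll() behaves like literal (fixed) matching for these plain ASCII strings, and then shows a per-group alternative.

```r
# First six labels from the question.
labs6 <- c("Beijing -- T0 -- BC-89 + CN",
           "Beijing -- T24 -- BC-89 + CN",
           "Beijing -- T0 -- BC-89 + CN with 2% DD + 1.6% ZC",
           "Beijing -- T24 -- BC-89 + CN with 2% DD + 1.6% ZC",
           "Beijing -- T0 -- BC-89 with 2% Puricare + 5% Merquat + CN",
           "Beijing -- T24 -- BC-89 with 2% Puricare + 5% Merquat + CN")
groups <- c("BC-89 + CN",
            "BC-89 + CN with 2% DD + 1.6% ZC",
            "BC-89 with 2% Puricare + 5% Merquat + CN")

# Pairwise comparison with recycling: label i is tested against
# groups[1], groups[2], groups[3], groups[1], ...
recycled <- rep_len(groups, length(labs6))
pairwise <- unname(mapply(function(l, g) grepl(g, l, fixed = TRUE),
                          labs6, recycled))

# Per-group comparison: one logical column per group, one row per label.
# This is substring matching, so the first group also hits the labels
# that belong to the second group.
per_group <- sapply(groups, function(g) grepl(g, labs6, fixed = TRUE))

print(pairwise)
print(unname(per_group))
```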
+ is a special character in regular expressions. You need "\\+" to escape it.
new_group <- gsub("\\+", replacement = "\\\\+", x = groups)
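If the group names might contain other metacharacters as well (the "." in "1.6%" is one, although it also happens to match a literal dot), a possible variation is to quote the whole name with \Q...\E, which the PCRE engine supports via perl = TRUE. This is a sketch that assumes no group name contains the sequence \E; it uses a shortened labs vector.

```r
labs <- c("Beijing -- T0 -- BC-89 + CN",
          "Beijing -- T24 -- BC-89 + CN with 2% DD + 1.6% ZC",
          "Zhangjiakou -- T0 -- BC-89 with 2% Puricare + 5% Merquat + CN",
          "Zhangjiakou -- T24 -- BC-89 + CN")
groups <- c("BC-89 + CN",
            "BC-89 + CN with 2% DD + 1.6% ZC",
            "BC-89 with 2% Puricare + 5% Merquat + CN")

# Everything between \Q and \E is taken literally; "$" after \E is still
# an end-of-string anchor.
quoted <- paste0("\\Q", groups, "\\E$")

matches <- lapply(quoted, function(p) grep(p, labs, value = TRUE, perl = TRUE))
print(matches)
```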
In addition, "|" acts as "or" in a grep pattern.
new_group1 <- paste0(new_group, collapse = "|")
grep(pattern = new_group1, x = labs, value = TRUE)
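Note that the combined pattern above is not anchored, so it matches a group name anywhere in a label. If the match has to be at the end of the label, as in the question, one option is to wrap the alternatives in parentheses and append "$". A minimal sketch; the third label is hypothetical and is there only to show the difference.

```r
groups <- c("BC-89 + CN",
            "BC-89 + CN with 2% DD + 1.6% ZC",
            "BC-89 with 2% Puricare + 5% Merquat + CN")
labs <- c("Beijing -- T0 -- BC-89 + CN",
          "Beijing -- T24 -- BC-89 + CN with 2% DD + 1.6% ZC",
          "Beijing -- T0 -- BC-89 + CN (repeat)")  # hypothetical label

# Escape "+" as in the answer, then join the alternatives with "|".
escaped <- gsub("\\+", "\\\\+", groups)
unanchored <- paste(escaped, collapse = "|")

# Parentheses make "$" apply to every alternative, not only the last one.
anchored <- paste0("(", unanchored, ")$")

hits_unanchored <- grepl(unanchored, labs)
hits_anchored <- grepl(anchored, labs)
print(rbind(hits_unanchored, hits_anchored))
```

Without the parentheses, "$" would bind only to the last alternative, because "|" has the lowest precedence in a regular expression.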