Multiple strings with str_detect in R
I want to find multiple strings and put the matching rows into one variable, but I keep getting errors.
queries <- httpdf %>% filter(str_detect(payload, "create" || "drop" || "select"))
Error: invalid 'x' type in 'x || y'
queries <- httpdf %>% filter(str_detect(payload, "create" | "drop" | "select"))
Error: operations are possible only for numeric, logical or complex types
queries1 <- httpdf %>% filter(str_detect(payload, "create", "drop", "select"))
Error: unused arguments ("drop", "select")
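None of these work. Is there another way to do this with str_detect,
or should I try something else? I would also like the results to appear in the same column.
In my opinion, for a very short list of strings to look for, a simpler approach could be: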
queries <- httpdf %>% filter(str_detect(payload, "create|drop|select"))
This works because that pattern is, in effect, exactly what
[...]
paste(c("create", "drop", "select"),collapse = '|'))
[...]
produces, as @penguin recommended earlier.
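A quick way to check that equivalence, as a minimal sketch (the sample vector x is made up for illustration and is not from the question): paste() with collapse = "|" builds the same regular expression as the hand-written pattern, so str_detect gives identical results for both.

```r
library(stringr)

# Hypothetical sample data, not from the original question
x <- c("create table t", "drop it", "nothing here", NA)

# Build the alternation pattern from a vector of search terms
pattern <- paste(c("create", "drop", "select"), collapse = "|")
pattern
# "create|drop|select"

# Both calls use the same regex, so the results are identical
identical(str_detect(x, pattern), str_detect(x, "create|drop|select"))
```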
For a longer list of strings to detect, I would first store the individual strings in a vector and then use @penguin's approach, e.g.:
strings <- c("string1", "string2", "string3", "string4", "string5", "string6")
queries <- httpdf %>%
filter(str_detect(payload, paste(strings, collapse = "|")))
The advantage is that you can easily reuse the vector strings later, if you want or need to.
Here is one way to solve this:
queries1 <- httpdf %>%
filter(str_detect(payload, paste(c("create", "drop", "select"),collapse = '|')))
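I suggest using a loop for this kind of operation. IMHO, it is far more versatile.
An example httpdf table (this also answers RxT's comment):
library(tidyverse)  # attaches tibble, dplyr (filter, %>%) and stringr (str_detect)
httpdf <- tibble(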
payload = c(
"the createor is nice",
"try to create something to select",
"never catch a dropping knife",
"drop it like it's hot",
NA,
"totaly unrelated" ),
other_optional_columns = 1:6 )
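I use sapply to loop over the search queries and apply each string as a separate pattern to str_detect. This returns a matrix with one column per search query string and one row per subject string, which can then be collapsed into the logical vector you want.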
queries1 <-
httpdf[
sapply(
c("create", "drop", "select"),
str_detect,
string = httpdf$payload ) %>%
rowSums( na.rm = TRUE ) != 0, ]
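To make the intermediate matrix visible, here is a small sketch of the same steps run on a plain character vector (the names payload, hits and keep are only for illustration; the values are copied from the example table above):

```r
library(stringr)

# Same payload values as the example table
payload <- c(
  "the createor is nice",
  "try to create something to select",
  "never catch a dropping knife",
  "drop it like it's hot",
  NA,
  "totaly unrelated")

# One column per search term, one row per payload string
hits <- sapply(c("create", "drop", "select"), str_detect, string = payload)
dim(hits)  # 6 rows, 3 columns

# na.rm = TRUE ignores the NA entries, so the NA payload counts as "no match"
keep <- rowSums(hits, na.rm = TRUE) != 0
keep
```

Rows 1 to 4 come out TRUE here because "createor" and "dropping" contain the search terms as substrings; the NA row and the unrelated row come out FALSE.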
Of course, it can be wrapped in a function so that it can be used inside a tidyverse filter:
## function
str_detect_mult <-
function( subject, query ) {
sapply(
query,
str_detect,
string = subject ) %>%
rowSums( na.rm = TRUE ) != 0
}
## tidy code
queries1 <- httpdf %>% filter( str_detect_mult( payload, c("create", "drop", "select") ) )
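If you want exact word matches, word boundaries are easy to handle ("\\b" matches a word boundary and is concatenated to the beginning and end of each search string):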
str_detect_mult_exact <-
function( subject, query ) {
sapply(
query,
function(.x)
str_detect(
subject,
str_c("\\b",.x,"\\b") ) ) %>%
rowSums( na.rm = TRUE ) != 0
}
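To see what the word boundaries change, a minimal sketch comparing the plain and the boundary-wrapped patterns on four of the example strings (the variable names are illustrative only):

```r
library(stringr)

payload <- c(
  "the createor is nice",
  "try to create something to select",
  "never catch a dropping knife",
  "drop it like it's hot")
terms <- c("create", "drop", "select")

# Substring matching: "createor" and "dropping" also count as hits
loose <- rowSums(sapply(terms, str_detect, string = payload)) != 0

# Whole-word matching: each term is wrapped in \\b ... \\b first
exact <- rowSums(sapply(
  terms,
  function(.x) str_detect(payload, str_c("\\b", .x, "\\b")))) != 0

loose
exact
```

With the boundaries in place, only the second and fourth strings still match, because "createor" and "dropping" are no longer accepted as whole words.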
Multiple matches are just as easy to handle (e.g. if you only want the rows that match exactly one of the strings, i.e. XOR):
str_detect_mult_xor <-
function( subject, query ) {
sapply(
query,
str_detect,
string = subject ) %>%
rowSums( na.rm = TRUE ) == 1
}
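As a sketch of what the == 1 condition selects, the per-row match counts can be inspected directly (payload values are copied from the example table; n_matched is an illustrative name). Note that with three or more terms this is an "exactly one" test rather than a chained logical XOR.

```r
library(stringr)

payload <- c(
  "the createor is nice",
  "try to create something to select",
  "never catch a dropping knife",
  "drop it like it's hot",
  NA,
  "totaly unrelated")

hits <- sapply(c("create", "drop", "select"), str_detect, string = payload)

# Number of distinct search terms found in each payload string
n_matched <- rowSums(hits, na.rm = TRUE)
n_matched

# TRUE only where exactly one term matched
only_one <- n_matched == 1
only_one
```

The second string contains both "create" and "select", so it is dropped by this variant even though the plain version keeps it.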
The same approach also works in base R:
## function
str_detect_mult <-
function( subject, query ) {
rowSums(sapply(
query,
grepl,
x = subject ), na.rm = TRUE ) != 0
}
## base R subsetting
queries1 <- httpdf[ str_detect_mult( httpdf$payload, c("create", "drop", "select") ), ]