Multiple strings with str_detect in R

Question

I want to find multiple strings and put the matching rows into one variable, but I keep getting errors.

queries <- httpdf %>% filter(str_detect(payload, "create" || "drop" || "select"))
Error: invalid 'x' type in 'x || y'

queries <- httpdf %>% filter(str_detect(payload, "create" | "drop" | "select"))
Error: operations are possible only for numeric, logical or complex types

queries1 <- httpdf %>% filter(str_detect(payload, "create", "drop", "select"))
Error: unused arguments ("drop", "select")

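None of these work. Is there another way to do this with str_detect, or should I try a different approach? I would also like the matches to show up in the same column.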
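A quick way to see why the first two attempts fail: || and | are logical operators, so R tries to evaluate "create" | "drop" as a logical OR of two character strings before str_detect() ever receives a pattern, and character operands are rejected. A minimal sketch reproducing both error messages (assuming an English-locale R session, since messages are translated otherwise):

```r
# Both logical operators reject character operands, so neither can be used
# to combine regex patterns.
err_or   <- tryCatch("create" | "drop",  error = function(e) conditionMessage(e))
err_oror <- tryCatch("create" || "drop", error = function(e) conditionMessage(e))

print(err_or)
print(err_oror)
```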

Answer 1

In my view, for the very short list of strings you are looking for, a simpler approach could be:

queries <- httpdf %>% filter(str_detect(payload, "create|drop|select"))

because that is effectively what

[...] paste(c("create", "drop", "select"),collapse = '|')) [...]

produces, as @penguin suggested earlier.
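To make the equivalence concrete, here is a small sketch on a plain character vector (the sample strings are made up for illustration): paste() with collapse = "|" builds the same regex alternation written out by hand above, and str_detect() returns TRUE wherever any alternative occurs. An NA input yields NA, which filter() later drops.

```r
library(stringr)

# Illustrative stand-in for the payload column.
payload <- c("create table t", "drop table t", "select * from t", "insert into t", NA)

# One regex whose alternatives are separated by "|".
pattern <- paste(c("create", "drop", "select"), collapse = "|")
print(pattern)

# TRUE where any alternative occurs; NA input gives NA.
hits <- str_detect(payload, pattern)
print(hits)
```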

For a longer list of strings to detect, I would first store the individual strings in a vector and then apply @penguin's approach, for example:

strings <- c("string1", "string2", "string3", "string4", "string5", "string6")
queries <- httpdf %>% 
  filter(str_detect(payload, paste(strings, collapse = "|")))

The advantage is that you can easily reuse the vector strings later if you want or need to.
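A runnable sketch of the vector-based approach, assuming a stand-in httpdf with a single payload column (the real table's other columns are unknown). Matching is case-sensitive by default; if payloads may contain upper-case SQL keywords such as DROP, wrapping the pattern in stringr::regex(..., ignore_case = TRUE) is one option.

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical stand-in for httpdf.
httpdf <- tibble(payload = c("create table t", "DROP TABLE t", "select 1", "ping", NA))

# Store the search terms once; the vector can be reused elsewhere.
strings <- c("create", "drop", "select")
pattern <- paste(strings, collapse = "|")

# Case-sensitive: "DROP TABLE t" is not matched; the NA row is dropped by filter().
queries <- httpdf %>% filter(str_detect(payload, pattern))

# Optional case-insensitive variant.
queries_ci <- httpdf %>%
  filter(str_detect(payload, regex(pattern, ignore_case = TRUE)))

print(queries)
print(queries_ci)
```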

Answer 2

Here is one way to solve this:

queries1 <- httpdf %>% 
  filter(str_detect(payload, paste(c("create", "drop", "select"),collapse = '|')))
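If only whole words should match, the pasted alternatives need to be grouped before word boundaries are added: the regex \bcreate|drop|select\b parses as "\bcreate" OR "drop" OR "select\b", so drop would still match inside dropping. A minimal sketch using the same paste() idea (the sample strings are taken from the example table in Answer 3):

```r
library(stringr)

words <- c("create", "drop", "select")
x <- c("the createor is nice", "never catch a dropping knife", "drop it like it's hot")

# Substring matching: any occurrence counts, even inside longer words.
loose <- str_detect(x, paste(words, collapse = "|"))

# Whole-word matching: the group (...) makes \\b apply to every alternative.
pattern_exact <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")
exact <- str_detect(x, pattern_exact)

print(pattern_exact)
print(loose)   # TRUE TRUE TRUE
print(exact)   # FALSE FALSE TRUE
```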

Answer 3

I suggest using a loop for this kind of operation. IMHO it is much more versatile.

An example httpdf table (this also answers RxT's comment):

library(tidyverse)  # provides tibble(), %>%, filter(), str_detect(), str_c()

httpdf <- tibble(
  payload = c(
    "the createor is nice",
    "try to create something to select",
    "never catch a dropping knife",
    "drop it like it's hot",
    NA,
    "totally unrelated" ),
  other_optional_columns = 1:6 )

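I use sapply to loop over the search queries and pass each string as a separate pattern to str_detect. This returns a matrix with one column per search query and one row per subject string, which can be collapsed into the logical vector you want.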

queries1 <-
  httpdf[ 
    sapply(
      c("create", "drop", "select"),
      str_detect,
      string = httpdf$payload ) %>%
    rowSums( na.rm = TRUE ) != 0, ]
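To see the intermediate step, a sketch using the example table above (reproduced so the block runs on its own): sapply() returns a 6 x 3 logical matrix, and rowSums(na.rm = TRUE) counts how many queries matched each row, with the NA payload counting as zero.

```r
library(stringr)
library(tibble)

httpdf <- tibble(
  payload = c("the createor is nice", "try to create something to select",
              "never catch a dropping knife", "drop it like it's hot",
              NA, "totally unrelated"),
  other_optional_columns = 1:6)

# One column per query, one row per payload string.
hits <- sapply(c("create", "drop", "select"), str_detect, string = httpdf$payload)
print(dim(hits))

# Number of matching queries per row; NA entries are ignored.
n_hits <- rowSums(hits, na.rm = TRUE)
print(n_hits)   # 1 2 1 1 0 0
```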

Of course, it can be wrapped in a function for use in a tidyverse filter:

## function
str_detect_mult <-
  function( subject, query ) {
    sapply(
      query,
      str_detect,
      string = subject ) %>%
    rowSums( na.rm = TRUE ) != 0
}
## tidy code
queries1 <- httpdf %>% filter( str_detect_mult( payload, c("create", "drop", "select") ) )
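One caveat with the sapply() version, as far as I can tell: when subject has length 1 (a one-row table, or a one-row group in a grouped filter()), sapply() simplifies to a plain vector and rowSums() fails because it needs a matrix. A hedged sketch of a variant that forces the matrix shape; str_detect_mult_safe is a hypothetical name, not part of any package.

```r
library(stringr)

# Hypothetical helper: same logic as str_detect_mult, but robust to length-1 input.
str_detect_mult_safe <- function(subject, query) {
  # vapply() returns one logical per subject string for each query ...
  hits <- vapply(query, function(q) str_detect(subject, q),
                 FUN.VALUE = logical(length(subject)))
  # ... and matrix() restores the rows x queries shape, which vapply()
  # collapses to a plain vector when length(subject) == 1.
  hits <- matrix(hits, nrow = length(subject))
  rowSums(hits, na.rm = TRUE) != 0
}

one_row   <- str_detect_mult_safe("drop it like it's hot", c("create", "drop", "select"))
three_row <- str_detect_mult_safe(c("create x", NA, "nothing"), c("create", "drop"))
print(one_row)     # TRUE
print(three_row)   # TRUE FALSE FALSE
```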

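If you want exact word matches ("\\b" matches a word boundary and is added to the start and end of each query string), word boundaries are easy to handle: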

str_detect_mult_exact <-
  function( subject, query ) {
    sapply(
      query,
      function(.x)
        str_detect(
          subject,
          str_c("\\b",.x,"\\b") ) ) %>%
    rowSums( na.rm = TRUE ) != 0
}
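Applied to the example strings, the difference shows up on "createor" and "dropping". A sketch with both functions restated without %>% so the block is self-contained (same logic as the versions in this answer):

```r
library(stringr)

str_detect_mult <- function(subject, query) {
  rowSums(sapply(query, str_detect, string = subject), na.rm = TRUE) != 0
}
str_detect_mult_exact <- function(subject, query) {
  rowSums(sapply(query, function(.x) str_detect(subject, str_c("\\b", .x, "\\b"))),
          na.rm = TRUE) != 0
}

payload <- c("the createor is nice", "try to create something to select",
             "never catch a dropping knife", "drop it like it's hot",
             NA, "totally unrelated")
q <- c("create", "drop", "select")

loose <- str_detect_mult(payload, q)
exact <- str_detect_mult_exact(payload, q)
print(loose)   # TRUE TRUE TRUE TRUE FALSE FALSE
print(exact)   # FALSE TRUE FALSE TRUE FALSE FALSE
```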

Multiple matches are easy to handle too (for example, if you only want rows that match exactly one of the strings, i.e. XOR):

str_detect_mult_xor <-
  function( subject, query ) {
    sapply(
      query,
      str_detect,
      string = subject ) %>%
    rowSums( na.rm = TRUE ) == 1
}
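Note that == 1 means "exactly one query matched", which coincides with XOR for two queries but not for three or more: a chained xor() is TRUE for any odd number of matches. A sketch on the example strings plus one made-up row containing all three keywords:

```r
library(stringr)

str_detect_mult_xor <- function(subject, query) {
  rowSums(sapply(query, str_detect, string = subject), na.rm = TRUE) == 1
}

payload <- c("the createor is nice", "try to create something to select",
             "never catch a dropping knife", "drop it like it's hot",
             NA, "totally unrelated", "create, drop, then select")

res <- str_detect_mult_xor(payload, c("create", "drop", "select"))
print(res)   # TRUE FALSE TRUE TRUE FALSE FALSE FALSE

# Chained logical XOR would be TRUE for the last row (three matches, an odd count).
chained <- xor(xor(TRUE, TRUE), TRUE)
print(chained)   # TRUE
```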

It also works in base R:

## function
str_detect_mult <-
  function( subject, query ) {
    rowSums(sapply(
      query,
      grepl,
      x = subject ), na.rm = TRUE ) != 0
}
## usage with base R indexing
queries1 <- httpdf[ str_detect_mult( httpdf$payload, c("create", "drop", "select") ), ]
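The base R version handles NA without special care because grepl() returns FALSE for NA input, whereas str_detect() returns NA (which is why na.rm = TRUE matters in the stringr versions and is redundant here). A short sketch checking this and running the base function on the example strings:

```r
library(stringr)

# grepl() treats NA as "no match"; str_detect() propagates NA.
base_na    <- grepl("drop", NA_character_)
stringr_na <- str_detect(NA_character_, "drop")
print(base_na)      # FALSE
print(stringr_na)   # NA

# Base R version from this answer.
str_detect_mult <- function(subject, query) {
  rowSums(sapply(query, grepl, x = subject), na.rm = TRUE) != 0
}

payload <- c("the createor is nice", "try to create something to select",
             "never catch a dropping knife", "drop it like it's hot",
             NA, "totally unrelated")
res <- str_detect_mult(payload, c("create", "drop", "select"))
print(res)   # TRUE TRUE TRUE TRUE FALSE FALSE
```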


3 answers

Answer 1
64 2018-05-03 20:42:56

Answer 2
38 2017-01-19 10:18:33

Answer 3
0 2022-09-15 12:26:39
