Grepl for 2 words/phrases in proximity in R (dplyr)

Question

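I am trying to create a filter for a large data frame. I am using grepl to search for a series of text patterns in a specific column. I have done this for single words/combinations, but now I want to search for two words in close proximity (i.e. the word tumo(u)r within 3 words of the word colon).

I have checked my regular expression at https://www.regextester.com/109207 and my search works there, but it does not work in R.

The error I get is: Error: '\W' is an unrecognized escape in character string starting ""\btumor|tumour)\W"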

Example below: trying to search for tumo(u)r within 3 words of cancer.

Can anyone help?

library(tibble)
library(dplyr)  # provides %>% and filter()
example.df <- tibble(number = 1:4, AB = c('tumor of the colon is a very hard disease to cure', 'breast cancer is also known as a neoplasia of the breast', 'tumour of the colon is bad', 'colon cancer is also bad'))

filtered.df <- example.df %>% 
    filter(grepl(("\btumor|tumour)\W|\w+(\w+\W+){0,3}colon\b"), AB, ignore.case=T)

Answer 1

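R uses the backslash as an escape character in string literals, and so does the regex engine, so you need to double your backslashes. This is explained in multiple previous StackOverflow questions and on the help page brought up by ?regex. Before attempting complex operations, you should try using the escape operators in a simpler set of tests. You should also pay more attention to the correct placement of parentheses and quotes in the pattern argument.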
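To make the doubled-backslash rule concrete, here is a minimal sketch in base R of the kind of simpler test suggested above. The sample vector `x` and the pattern are made up for illustration, and the raw-string syntax `r"(...)"` assumes R 4.0.0 or later.

```r
# Hypothetical sample text, for illustration only.
x <- c("tumor of the colon", "colon cancer")

# In an R string literal, "\\b" is the two characters \ and b,
# which is what the regex engine needs to see as a word boundary.
pattern <- "\\btumou?r\\b"
print(nchar("\\b"))  # counts the backslash and the letter b

# Test the simple pattern before building anything more complex.
hits <- grepl(pattern, x, ignore.case = TRUE)
print(hits)

# Raw strings (assumes R >= 4.0.0) avoid the doubling altogether.
raw_pattern <- r"(\btumou?r\b)"
print(identical(pattern, raw_pattern))
```

Once a small pattern like this behaves as expected, the longer pattern can be built up piece by piece.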

filtered.df <- example.df %>% 
   #filter(grepl(("\btumor|tumour)\W|\w+(\w+\W+){0,3}colon\b"), AB, 
# errors here ....^.^..............^..^...^..^.............^.^
    filter(grepl("(\\btumor|tumour)\\W|\\w+(\\w+\\W+){0,3}colon\\b", AB,
                 ignore.case = TRUE))

> filtered.df
# A tibble: 2 × 2
  number AB                                               
   <int> <chr>                                            
1      1 tumor of the colon is a very hard disease to cure
2      3 tumour of the colon is bad
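
One thing to watch with the pattern above: its top-level `|` splits it into two alternatives, `(\btumor|tumour)\W` and `\w+(\w+\W+){0,3}colon\b`. The first alternative matches any "tumor" followed by a non-word character, whether or not "colon" is nearby. If the goal is strictly "tumo(u)r, then at most three words, then colon", a sketch along the following lines may be closer. The pattern name `near_colon` and the last two test strings are assumptions added for illustration, and the pattern only covers tumo(u)r appearing before colon.

```r
# Strings 1-4 are the question's example text; strings 5-6 are made-up
# cases added here to probe the word-distance limit.
ab <- c(
  "tumor of the colon is a very hard disease to cure",
  "breast cancer is also known as a neoplasia of the breast",
  "tumour of the colon is bad",
  "colon cancer is also bad",
  "tumor that spread far away from the colon",
  "tumor biopsy showed nothing unusual"
)

# tumo(u)r, then at most three intervening words, then colon.
near_colon <- "\\btumou?r\\W+(?:\\w+\\W+){0,3}colon\\b"

# perl = TRUE is used so the non-capturing group (?:...) is accepted.
hits <- grepl(near_colon, ab, ignore.case = TRUE, perl = TRUE)
print(which(hits))
```

The same call should drop into the dplyr pipeline unchanged, e.g. `filter(grepl(near_colon, AB, ignore.case = TRUE, perl = TRUE))`.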

Solution 1
0 Accepted 2022-12-26 21:42:05
