How to use boundary() with str_detect (stringr package)
Here is some data.
library(stringr)
library(dplyr)
df <- tibble(sentences)
I want to find all the sentences that contain the word "her". But, of course, this also returns sentences containing words like "there" and "here".
df %>% filter(str_detect(sentences, "her"))
# A tibble: 43 x 1
sentences
<chr>
1 The boy was there when the sun rose.
2 Help the woman get back to her feet.
3 What joy there is in living.
4 There are more than two factors here.
5 Cats and dogs each hate the other.
6 The wharf could be seen at the farther shore.
7 The tiny girl took off her hat.
8 Write a fond note to the friend you cherish.
9 There was a sound of dry leaves outside.
10 Add the column and put the sum here.
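The documentation for stringr::str_detect says, "Match character, word, line and sentence boundaries with boundary()." I can't figure out how to do this, and I can't find an example anywhere. All of the documentation examples involve the str_split or str_count functions.
My question is related to this question, but I specifically want to understand how to use the stringr::boundary function.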
We can specify a word boundary (\\b) at the start and end of the pattern to avoid any partial matches:
library(stringr)
library(dplyr)
df %>%
filter(str_detect(sentences, "\\bher\\b"))
# sentences
#1 Help the woman get back to her feet.
#2 The tiny girl took off her hat.
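Alternatively, use boundary(). Note that boundary() takes a boundary type ("character", "line_break", "sentence" or "word"), not a search term, so boundary("her") raises an error. Instead, split each sentence into words with boundary("word") and check whether "her" is one of them:
df %>%
  filter(sapply(str_split(sentences, boundary("word")),
                function(w) "her" %in% w))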
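To check that the two approaches agree, here is a minimal sketch on a small toy vector. The vector `x` and the names `by_regex` and `by_boundary` are made up for illustration; only str_detect(), str_split() and boundary("word") come from stringr.

```r
library(stringr)

# Hypothetical toy sentences, chosen so that only the first contains
# "her" as a whole word
x <- c("Help the woman get back to her feet.",
       "The boy was there when the sun rose.",
       "Add the column and put the sum here.")

# Regex approach: \\b anchors the match at word boundaries
by_regex <- str_detect(x, "\\bher\\b")

# boundary() approach: split each sentence into words, then test for
# an exact, case-sensitive match against "her"
words <- str_split(x, boundary("word"))
by_boundary <- vapply(words, function(w) "her" %in% w, logical(1))

by_regex     # TRUE FALSE FALSE
by_boundary  # TRUE FALSE FALSE
```

Both versions are case-sensitive, so a sentence starting with "Her" would be missed by either one unless the pattern or the comparison is adjusted.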