
Exact matching in R for a word

I have a dataset named tweets_data that captures details of tweets collected with the rtweet package. One of its columns, text, holds the text of each tweet. I am trying to check whether the tweet text contains any of the words listed in the code below. I ran into problems with the word "ad", because words like dad, adverb, and bad were also being matched. I replaced "ad" with "\bad\b", which improved the results, but it still matches some tweets that do not use the word "ad". I want to match the exact word "ad".

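# Patterns to search for; "\\bad\\b" matches "ad" only between word boundaries.
words <- c("endorsement", "advertisement", "sponsored", "\\bad\\b", "sponsored content", "advert", "paid partnership")
# One logical column per pattern: does each tweet match it, ignoring case?
x <- sapply(words, function(w) grepl(w, tweets_data$text, ignore.case = TRUE))
# Per tweet: comma-separated list of matched patterns, and how many matched.
tweets_data$Words <- apply(x, 1, function(i) paste0(names(i)[i], collapse = ","))
tweets_data$Count <- rowSums(x)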
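One possible reason "\\bad\\b" still matches unwanted tweets is that \b treats any non-word character as a boundary, so a hyphen or a slash next to "ad" counts. The sketch below illustrates this on made-up strings (sample_text is hypothetical, not the asker's data) and compares it with a stricter lookaround pattern, which is an assumption about what "exact word" should mean here and needs perl = TRUE.

```r
# Hypothetical sample tweets, made up for illustration only.
sample_text <- c(
  "This post is an ad for shoes",   # "ad" as a standalone word
  "My dad wrote a bad adverb",      # "ad" only inside other words
  "Enjoy an ad-free weekend",       # "ad" followed by a hyphen
  "Link: https://example.com/ad/1"  # "ad" as a URL path segment
)

# \b treats every non-word character (hyphen, slash, ...) as a boundary.
boundary_hits <- grepl("\\bad\\b", sample_text, ignore.case = TRUE)

# Stricter variant (an assumption, not the asker's code): "ad" must be
# preceded by whitespace or the string start, and followed by optional
# sentence punctuation and then whitespace or the string end.
strict_hits <- grepl("(?<!\\S)ad(?=[.,!?;:]*(\\s|$))", sample_text,
                     ignore.case = TRUE, perl = TRUE)

print(data.frame(sample_text, boundary_hits, strict_hits))
```

With these samples, the \b pattern also flags the "ad-free" and URL strings, while the stricter pattern flags only the first one. The stricter pattern would also skip "#ad", so whether it is the right choice depends on which tweets should count.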

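Use ^ to anchor the start of the string and $ to anchor the end. Note that these anchors apply to the whole string being tested, so the patterns below match only when that string is exactly "ad".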

"^ad$", "^AD$"
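Because ^ and $ anchor the entire string, one way to apply them to tweets is to split the text into words first and test each word. The following is a minimal sketch under that assumption; the tweet string and the variable names are invented for illustration.

```r
# Hypothetical tweet text, made up for illustration only.
tweet <- "Not an AD, just my dad being bad"

# Anchored to the whole tweet: FALSE unless the tweet is exactly "ad".
whole_match <- grepl("^ad$", tweet, ignore.case = TRUE)

# Split on runs of non-letter characters, then anchor each token.
tokens <- strsplit(tolower(tweet), "[^a-z]+")[[1]]
token_match <- any(grepl("^ad$", tokens))

print(tokens)
print(c(whole = whole_match, tokens = token_match))
```

Once the text is tokenized, any(tokens == "ad") gives the same answer without a regular expression.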
