Struggling with removing words based on pattern (text analysis in R)
I'm new to text analysis. I have been struggling with a particular problem in R this past week. I am trying to figure out how to remove or replace all variations of a word in a string. For example, if the string is:
test <- c("development", "develop", "developing", "developer", "apples", "kiwi")
I want the end output to be:
"apples", "kiwi"
So, basically, I'm trying to figure out how to remove or replace all words matching "^develop", that is, all words beginning with "develop". I have tried str_remove_all from the stringr package with this expression:
str_remove_all(test, "^dev")
But the end result was this:
"elopment", "elop", "eloping", "eloper", "apples", "kiwi"
It only removed the part of each word that matched the pattern "^dev", whereas I want to remove the entire word if it begins with "dev".
Thanks!
Filter(function(x) !any(grepl("develop", x)), test)
Use grep with invert:
grep("^develop", test, invert = TRUE, value = TRUE)
## [1] "apples" "kiwi"
or negate grepl:
ok <- !grepl("^develop", test)
test[ok]
or remove develop and then retrieve those elements that have not changed:
test[sub("^develop", "", test) == test]
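The comparison above works because sub() only strips the matched prefix, just as str_remove_all did in the question. A minimal base R sketch of what changes when the pattern is extended with `.*` so that it covers the whole element: this also handles the "replace" half of the question. The names `emptied` and `collapsed` are illustrative only, and collapsing every variant to the stem "develop" is an assumption about what "replace" should mean here.

```r
test <- c("development", "develop", "developing", "developer", "apples", "kiwi")

# "^develop.*" matches from the start of the element to its end,
# so the whole word is replaced rather than just the prefix
emptied <- sub("^develop.*", "", test)
# emptied is now c("", "", "", "", "apples", "kiwi")

# Removing: drop the elements that were blanked out
kept <- emptied[emptied != ""]
# kept is now c("apples", "kiwi")

# Replacing: collapse every variant to a single stem (assumed goal)
collapsed <- sub("^develop.*", "develop", test)
# collapsed is now c("develop", "develop", "develop", "develop", "apples", "kiwi")
```

Note that blanking and then filtering would also drop elements that were empty strings to begin with, which is one reason the grep and grepl versions are the more direct way to remove elements.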
With stringr, you can do the following:
stringr::str_subset(test, "^dev", negate = TRUE)
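All of the approaches above assume one word per vector element, as in the question's example. If the words instead sit inside one longer string (an assumption, since the question only shows a vector), filtering elements no longer helps and the match has to be cut out of the string itself. A rough base R sketch, with a made-up `sentence` for illustration:

```r
# Hypothetical input; the original question only shows a vector of single words
sentence <- "the developer is developing apples and kiwi development"

# \\b anchors the match at a word boundary, \\w* consumes the rest of the word,
# and \\s* swallows the whitespace that follows so no double spaces remain
cleaned <- gsub("\\bdevelop\\w*\\s*", "", sentence, perl = TRUE)

# trimws() drops the space left behind when the removed word came last
cleaned <- trimws(cleaned)
# cleaned is now "the is apples and kiwi"
```

Here the word boundary plays the role that `^` plays in the vector case: it keeps the pattern from matching "develop" in the middle of another word such as "redevelop".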