Removing words from sentences
I have a data frame containing text, and I am trying to remove certain words from that text. The words to remove are stored in a vector. Please help me achieve this!
stopwords <- c("today","hot","outside","so","its")
df <- data.frame(a = c("a1", "a2", "a3"), text = c("today the weather looks hot", "its so rainy outside", "today its sunny"))
Expected output:
   a                        text          new_text
1 a1 today the weather looks hot the weather looks
2 a2        its so rainy outside             rainy
3 a3             today its sunny             sunny
Paste all the stopwords together into a single pattern separated by "|" and use gsub to remove them.
df$new_text <- trimws(gsub(paste0(stopwords, collapse = "|"), "", df$text))
df
#   a                        text          new_text
#1 a1 today the weather looks hot the weather looks
#2 a2        its so rainy outside             rainy
#3 a3             today its sunny             sunny
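One caveat worth noting: trimws only strips leading and trailing whitespace, so a stopword removed from the middle of a sentence leaves a double space behind. The following is a minimal sketch, using a hypothetical sentence that is not part of the question's data, of one way to also collapse interior whitespace.

```r
stopwords <- c("today", "hot", "outside", "so", "its")

# Hypothetical sentence (not from the original data) with a stopword in the middle
x <- "the weather today looks hot"

# Remove the stopwords; the gap left by "today" becomes two spaces
removed <- gsub(paste0(stopwords, collapse = "|"), "", x)

# Collapse runs of whitespace into one space, then trim both ends
squeezed <- trimws(gsub("\\s+", " ", removed))
```

Here removed still contains two consecutive spaces between "weather" and "looks", while squeezed does not.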
Or with str_remove_all from the stringr package:
stringr::str_remove_all(df$text, paste0(stopwords, collapse = "|"))
Just to be extra safe, add word boundaries ("\\b") around each stopword so that the "so" in "something" or "some" is not removed.
df$new_text <- trimws(gsub(paste0("\\b", stopwords, "\\b",
collapse = "|"), "", df$text))
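To make the effect of the word boundaries concrete, here is a small sketch comparing the two patterns on a hypothetical sentence (not from the question's data) that contains "something". The variable names are illustrative only.

```r
stopwords <- c("today", "hot", "outside", "so", "its")

# Hypothetical sentence (not from the original data)
x <- "so something is hot today"

# Pattern without word boundaries: matches "so" anywhere, even inside a word
plain <- paste0(stopwords, collapse = "|")

# Pattern with word boundaries: matches only whole words
bounded <- paste0("\\b", stopwords, "\\b", collapse = "|")

no_boundary   <- trimws(gsub(plain, "", x))    # "something" is damaged
with_boundary <- trimws(gsub(bounded, "", x))  # "something" is left intact
```

Without boundaries, "something" is cut down to "mething"; with boundaries, only the standalone word "so" is removed.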