
Removing words from sentences

I have a data frame containing text, and I am trying to remove certain words, which are stored in a vector, from that text. Please help me achieve this!

stopwords <- c("today","hot","outside","so","its")
df <- data.frame(a = c("a1", "a2", "a3"), text = c("today the weather looks hot", "its so rainy outside", "today its sunny"))

Expected output:

   a                        text          new_text
1 a1 today the weather looks hot the weather looks
2 a2        its so rainy outside             rainy
3 a3             today its sunny             sunny

Paste all the stopwords together into one pattern, separated by |, and use gsub to remove them.

df$new_text <- trimws(gsub(paste0(stopwords, collapse = "|"), "", df$text))
df
#   a                        text          new_text
#1 a1 today the weather looks hot the weather looks
#2 a2        its so rainy outside             rainy
#3 a3             today its sunny             sunny

Or with str_remove_all from the stringr package:

# As in the gsub version, trim the spaces left behind at the ends.
df$new_text <- trimws(stringr::str_remove_all(df$text, paste0(stopwords, collapse = "|")))

To be extra safe, add word boundaries (\\b) around each stopword so that the "so" in "something" or "some" is not removed.

df$new_text <- trimws(gsub(paste0("\\b", stopwords, "\\b",
               collapse = "|"), "", df$text))
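
To see what the word boundaries change, here is a minimal sketch in base R. The sample sentences and the names plain, bounded, and cleaned are made up for illustration and are not from the question. It also squeezes the runs of spaces that remain when a stopword is removed from the middle of a sentence, which trimws alone does not handle.

```r
stopwords <- c("today", "hot", "outside", "so", "its")

# Hypothetical sample text; the first sentence contains "something".
text <- c("something so hot happened today", "its so rainy outside")

# Without boundaries, the "so" inside "something" is removed as well.
plain <- paste0(stopwords, collapse = "|")
without_boundaries <- gsub(plain, "", text[1])

# With boundaries, only whole words match.
bounded <- paste0("\\b", stopwords, "\\b", collapse = "|")
cleaned <- gsub(bounded, "", text)

# Removing a word leaves the spaces around it behind, so collapse each run
# of whitespace to a single space before trimming the ends.
cleaned <- trimws(gsub("\\s+", " ", cleaned))

print(without_boundaries)
print(cleaned)
```

If the text may contain capitalised stopwords such as "Today", passing ignore.case = TRUE to gsub is one way to match them too.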
