R: Array-based replacement of string matches in a data frame
I have a data frame column containing sentences. Within these sentences there is a whole host of words that I want to remove.
These words may appear more than once in a single sentence, and wherever one is found I want to remove it entirely.
E.g., sample list of words for removal: ("the", "and", "a") (the real list will have hundreds of words)
String before: "the quick brown fox jumps over the lazy dog and cat"
String after: "quick brown fox jumps over lazy dog cat"
sentences <- as.data.frame(c("it's a new sentence","another sentence i've constructed","and a third sentence"))
colnames(sentences) <- c("sentence")
stop_words <- list( "i" = '', "a" = "", "me" = '' , "my" = "", "myself" = "", "we" = "", "it's" = "", "a" = "", "i've" = "")
stop_pattern <- paste0("\\b", "(", paste0(stop_words, collapse = "|"),")","\\b")
trimws(gsub("\\s{2}", " ", gsub(stop_pattern, "", sentences$sentence)))
The output should have words such as "I've" removed from the sentences above, but the code fails to do so.
The output is as follows:
[1] "it's a new sentence" "another sentence i've constructed" "and a third sentence"
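A quick way to see why nothing is removed is to inspect what the pattern is built from. The sketch below uses a shortened copy of the `stop_words` list (an assumption made for brevity) and shows that `paste0()` collapses the list values, which are all empty strings, rather than the list names that hold the actual words.

```r
# Shortened copy of the list from the question (assumption: same shape, fewer entries)
stop_words <- list("i" = "", "a" = "", "it's" = "", "i've" = "")

# paste0() works on the list *values*, so only the separators survive
from_values <- paste0(stop_words, collapse = "|")
print(from_values)

# The words themselves are stored as the list *names*
from_names <- paste0(names(stop_words), collapse = "|")
print(from_names)
```

With an alternation made only of empty strings, the regular expression has no words to match, so the sentences come back unchanged.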
Try:
# Build the pattern from the list names; the list values are all empty strings
stop_pattern <- paste0("\\b", "(", paste0(names(stop_words), collapse = "|"),")","\\b")
trimws(gsub("\\s{2}", " ", gsub(stop_pattern, "", sentences$sentence)))
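For a list with hundreds of words, a slightly more defensive variant of the same idea may be easier to maintain. The sketch below is not the original answer's code. It assumes the words are kept in a plain character vector (the name `words_to_drop` is hypothetical), sorts them longest first so that "i've" is tried before "i", and uses `perl = TRUE` so that the alternation order is respected. It also assumes none of the words contain regex metacharacters.

```r
# Example data from the question, built directly as a character column
sentences <- data.frame(
  sentence = c("it's a new sentence",
               "another sentence i've constructed",
               "and a third sentence"),
  stringsAsFactors = FALSE
)

# Hypothetical name: a plain character vector instead of a named list
words_to_drop <- c("i", "a", "me", "my", "myself", "we", "it's", "i've")

# Longest words first, so "i've" is tried before "i" under perl = TRUE
words_to_drop <- words_to_drop[order(nchar(words_to_drop), decreasing = TRUE)]

# One whole-word alternation covering every word to remove
stop_pattern <- paste0("\\b(", paste(words_to_drop, collapse = "|"), ")\\b")

# Remove the words, then collapse leftover runs of whitespace and trim the ends
cleaned <- gsub(stop_pattern, "", sentences$sentence, perl = TRUE)
cleaned <- trimws(gsub("\\s+", " ", cleaned))
print(cleaned)
```

Note that "and" and "another" are left intact because `\\b` only allows matches on whole words.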
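Notice: technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.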