R: Replace string matches in a data frame based on an array

Question

I have a data frame column containing sentences. Within these sentences there is a whole host of words that I want to remove.

These words could appear more than once in a single sentence, and wherever they are found I want to remove them entirely.

E.g., a sample list of words to remove: ("the", "and", "a"). The real list will have hundreds of words.

String before: "the quick brown fox jumps over the lazy dog and cat"
String after: "quick brown fox jumps over lazy dog cat"
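The before/after pair above amounts to splitting each sentence on whitespace and dropping every token that appears in the removal list. A minimal base-R sketch of that token-filter idea, offered as an alternative to the regex in the question rather than as the asker's method; `remove_stop_words` is a hypothetical helper name, and it assumes words are separated by whitespace with no punctuation attached (so "dog," would not match "dog"):

```r
# Hypothetical helper: drop whitespace-separated tokens that appear in stop_vec.
# Assumes tokens are separated by whitespace and carry no attached punctuation.
remove_stop_words <- function(x, stop_vec) {
  tokens <- strsplit(x, "\\s+")        # one character vector of tokens per sentence
  stop_lower <- tolower(stop_vec)
  vapply(tokens, function(tok) {
    # keep only tokens that are not in the list (case-insensitive comparison)
    paste(tok[!tolower(tok) %in% stop_lower], collapse = " ")
  }, character(1))
}

before <- "the quick brown fox jumps over the lazy dog and cat"
after <- remove_stop_words(before, c("the", "and", "a"))
print(after)
# [1] "quick brown fox jumps over lazy dog cat"
```

Because `%in%` is a set lookup, a list of hundreds of words needs no giant regex, and words containing characters like `.` or `(` need no escaping.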


sentences <- as.data.frame(c("it's a new sentence","another sentence i've constructed","and a third sentence"))
colnames(sentences) <- c("sentence")

# the words to remove are the names of this list; every value is an empty string
stop_words <- list("i" = "", "a" = "", "me" = "", "my" = "", "myself" = "", "we" = "", "it's" = "", "a" = "", "i've" = "")

# build a \b(...)\b alternation from stop_words, strip matches, then collapse double spaces
stop_pattern <- paste0("\\b", "(", paste0(stop_words, collapse = "|"),")","\\b")
trimws(gsub("\\s{2}", " ", gsub(stop_pattern, "", sentences$sentence)))

The output should have words such as "i've" removed from the sentences above, but nothing is removed.

The actual output is:

[1] "it's a new sentence" "another sentence i've constructed" "and a third sentence"
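The unchanged output is consistent with how the pattern is built: `paste0()` coerces the list with `as.character()`, so it pastes the list's values, and every value in `stop_words` is an empty string. The words themselves live in the list's names. A short check, reusing the question's `stop_words`, prints the pattern that is actually generated next to the one that was intended (`names(stop_words)` instead of the list itself):

```r
stop_words <- list("i" = "", "a" = "", "me" = "", "my" = "", "myself" = "",
                   "we" = "", "it's" = "", "a" = "", "i've" = "")

# paste0() on the list uses the values (all ""), giving only "|" separators
built <- paste0("\\b", "(", paste0(stop_words, collapse = "|"), ")", "\\b")
cat(built, "\n")     # prints \b(||||||||)\b

# names() returns the words themselves
intended <- paste0("\\b", "(", paste0(names(stop_words), collapse = "|"), ")", "\\b")
cat(intended, "\n")  # prints \b(i|a|me|my|myself|we|it's|a|i've)\b
```

With only empty alternatives, the regex can match nothing but zero-width positions, so replacing matches with "" leaves each sentence as it was.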

Answer 1

Try:

stop_pattern <- paste0("\\b", "(", paste0(stop_words, collapse = "|"),")","\\b")
trimws(gsub("\\s{2}", " ", gsub(stop_pattern, "", sentences)))
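The pattern above is still built from the list values (all empty strings), and it passes the data frame `sentences` to `gsub()` instead of its `sentence` column. A minimal corrected sketch, assuming the stop words contain no regex metacharacters (`.`, `+`, `(` and so on would need escaping first): build the alternation from `names(stop_words)`, drop the duplicated "a", and sort longest first. With `perl = TRUE` alternatives are tried left to right, and because the apostrophe counts as a word boundary, an unsorted "i" would match the start of "i've" and leave "'ve" behind. `ignore.case = TRUE` is an added assumption so that "I've" is removed as well.

```r
# Data and stop-word list as in the question
sentences <- data.frame(
  sentence = c("it's a new sentence",
               "another sentence i've constructed",
               "and a third sentence"),
  stringsAsFactors = FALSE
)
stop_words <- list("i" = "", "a" = "", "me" = "", "my" = "", "myself" = "",
                   "we" = "", "it's" = "", "a" = "", "i've" = "")

# The words live in the list's names; unique() drops the duplicated "a"
words <- unique(names(stop_words))
# Longest first: with perl = TRUE, "i've" must be tried before "i"
words <- words[order(nchar(words), decreasing = TRUE)]

# Assumption: no word contains regex metacharacters such as . + ? ( ) [ ]
stop_pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")

# Remove matches case-insensitively, working on the column, not the data frame
cleaned <- gsub(stop_pattern, "", sentences$sentence,
                perl = TRUE, ignore.case = TRUE)
# \s+ collapses whitespace runs of any length (\s{2} only handles exact pairs)
sentences$sentence <- trimws(gsub("\\s+", " ", cleaned))
print(sentences$sentence)
```

For the three sample sentences this leaves "new sentence", "another sentence constructed" and "and third sentence" ("and" is not in this particular list).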

Answer 1: score 0, posted 2019-08-02 12:34:50

R.基于数组的数据帧中字符串匹配的替换

问题描述

1 个解决方案

解决方案1 0 2019-08-02 12:34:50

解决方案1
0 2019-08-02 12:34:50