Removing stop words with tidytext

Question

Using tidytext, I have this code:

data(stop_words)
tidy_documents <- tidy_documents %>%
      anti_join(stop_words)

I want it to use the stop words built into the package to overwrite a data frame called tidy_documents with a data frame of the same name, but with every word removed that appears in stop_words.

I get this error:

Error: No common variables. Please specify `by` param.
Traceback:

1. tidy_documents %>% anti_join(stop_words)
2. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
3. eval(quote(`_fseq`(`_lhs`)), env, env)
4. eval(expr, envir, enclos)
5. `_fseq`(`_lhs`)
6. freduce(value, `_function_list`)
7. withVisible(function_list[[k]](value))
8. function_list[[k]](value)
9. anti_join(., stop_words)
10. anti_join.tbl_df(., stop_words)
11. common_by(by, x, y)
12. stop("No common variables. Please specify `by` param.", call. = FALSE)

Answer 1

You can use the simpler filter() instead of the confusing anti_join() function, like this:

tidy_documents <- tidy_documents %>%
  filter(!word %in% stop_words$word)
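
A minimal sketch of why filter() and anti_join() can be swapped here, assuming the token column is named word. The tibbles docs and toy_stop_words are made up for illustration; toy_stop_words is only a small stand-in for tidytext's stop_words.

```r
library(dplyr)

# Hypothetical toy data: one row per token
docs <- tibble(
  doc  = c(1, 1, 1, 2, 2),
  word = c("the", "quick", "fox", "of", "mice")
)

# Assumed stand-in for tidytext's stop_words (columns word and lexicon)
toy_stop_words <- tibble(
  word    = c("the", "of"),
  lexicon = c("toy", "toy")
)

# filter() keeps rows whose word is not in the stop list
via_filter <- docs %>%
  filter(!word %in% toy_stop_words$word)

# anti_join() drops rows that have a match in the stop list
via_anti_join <- docs %>%
  anti_join(toy_stop_words, by = "word")
```

Both results should contain the same remaining tokens; filter() only needs the vector of stop words, while anti_join() needs a column to match on.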

Answer 2

Both tidy_documents and stop_words have a list of words under a column named word; however, the columns are in a different order: in stop_words it is the first column, while in your dataset it is the second. That's why the command is unable to "match" the two columns and compare the words. Try this:

tidy_documents <- tidy_documents %>%
      anti_join(stop_words, by = c("word" = "word"))

The by argument forces the join to compare the columns named word, regardless of their position.
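
As a hedged sketch of how by names the columns to compare: the named vector form is c("left column" = "right column"), so it also covers the case where the token column has a different name. The column name term and the tibbles docs and toy_stop_words are hypothetical, and toy_stop_words again stands in for tidytext's stop_words.

```r
library(dplyr)

# Hypothetical token table: the text column is second and named `term`
docs <- tibble(
  doc  = c(1, 1, 2),
  term = c("the", "fox", "of")
)

# Assumed stand-in for tidytext's stop_words (word is the first column)
toy_stop_words <- tibble(
  word    = c("the", "of"),
  lexicon = c("toy", "toy")
)

# Name the column on each side explicitly: left name = right name
cleaned <- docs %>%
  anti_join(toy_stop_words, by = c("term" = "word"))
```

The result keeps the columns of docs only, with the rows whose term matched a stop word removed.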

2 answers

Answer 1
6 2017-10-19 00:09:59

Answer 2
4 Accepted 2017-05-14 22:24:58
