Counting the number of stopwords in a text

Question

I was wondering if anyone could help me with the following problem: I am trying to determine the number (count) of stop words in customer review texts. I am using the stopword list from the "quanteda" package in R. I have tokenised the text and selected the stop words with the following code:

library(quanteda)
# keep only the tokens that are on the English stopword list
stop.words <- tokens_select(corpus2.tokens, pattern = stopwords("en"))

However, I am now having trouble saving these results in a way that lets me count the actual number of stopwords in each review.

Any tips would be greatly appreciated. Thanks in advance!

Answer 1

You can use str_detect from stringr (or stri_detect from stringi) to count the number of stopwords. str_detect returns TRUE or FALSE for each stopword in the list, and you can simply sum these. Depending on which stopword list you use, you will get different results: stopwords("en") from the stopwords package returns 28, while stopwords(source = "smart") gives a count of 61. Note that each stopword is matched as a pattern anywhere in the lower-cased text, so this counts how many distinct stopwords from the list appear, including matches inside longer words (e.g. "i" inside "pizza"), rather than how many times stopwords occur.

text <- "I've never had a better pulled pork pizza! The amount of toppings that they layered on it was astounding...bacon, corn, more pulled pork, and the sauce was delicious. I shared my pizza with 2 other people. I can't wait to go back."
stopwords <- stopwords::stopwords("en")

# one TRUE/FALSE per stopword: does it occur anywhere in the lower-cased text?
sum(stringr::str_detect(tolower(text), stopwords))
#> [1] 28
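To see what the str_detect() count measures, the sketch below compares it with a count of stopword occurrences over whole word tokens, using only base R. The two reviews and the short stopword list are made-up examples assumed for illustration (they are not from the question), and the helper names substring_hits() and token_hits() are hypothetical. For a list without regex metacharacters, substring_hits() gives the same result as sum(str_detect(tolower(text), stop_list)).

```r
# Two made-up reviews and a tiny stopword list, assumed for illustration only
reviews <- c(
  r1 = "I shared my pizza with 2 other people.",
  r2 = "The sauce was delicious and the crust was thin."
)
stop_list <- c("i", "my", "with", "the", "was", "and", "other")

# Hypothetical helper: how many list entries occur anywhere in the text,
# even inside other words (each entry counted at most once)
substring_hits <- function(text, stop_list) {
  sum(vapply(stop_list, grepl, logical(1), x = tolower(text), fixed = TRUE))
}

# Hypothetical helper: how many whole word tokens are on the stopword list
token_hits <- function(text, stop_list) {
  # lower-case, then split on runs of anything that is not a letter, digit or apostrophe
  tokens <- strsplit(tolower(text), "[^a-z0-9']+")[[1]]
  sum(tokens %in% stop_list)
}

sapply(reviews, substring_hits, stop_list = stop_list)
#> r1 r2
#>  5  4
sapply(reviews, token_hits, stop_list = stop_list)
#> r1 r2
#>  4  5
```

In r1 the substring count is too high because "the" is found inside "other"; in r2 it is too low because "the" and "was" each occur twice but are counted once (while "i" is picked up from "delicious").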

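Staying within quanteda, the tokens kept by tokens_select() (the stop.words object in the question) can be counted per review with ntoken(), which returns one token count per document. A minimal sketch, assuming quanteda is installed; the two made-up reviews stand in for the question's corpus2.tokens, whose contents are not shown.

```r
library(quanteda)

# Made-up reviews standing in for the question's corpus (assumption)
reviews <- c(
  r1 = "I shared my pizza with 2 other people.",
  r2 = "The sauce was delicious and the crust was thin."
)

# Tokenise, then keep only tokens on the English stopword list
# (tokens_select() matches case-insensitively by default)
review.tokens <- tokens(reviews)
stop.words <- tokens_select(review.tokens, pattern = stopwords("en"))

# One number per review: how many stopword tokens it contains
stopword.counts <- ntoken(stop.words)
print(stopword.counts)

# Optional: a document-feature matrix shows which stopwords were counted
print(dfm(stop.words))
```

Unlike the substring approach, this counts every occurrence of a stopword as a separate token and never matches inside longer words.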
Answer 1 (score 0, accepted, 2018-04-21 12:46:48)