Counting the number of stop words in a text
I was wondering if anyone could help me with the following problem: I am trying to determine the number (count) of stop words in customer review texts. I am using the stop word list from the "quanteda" package in R. I have tokenised the text and selected the stop words using the following code:
stop.words <- tokens_select(corpus2.tokens, stopwords())
However, I am now having trouble saving these results in such a way that I can count the actual number of stop words included in each review.
Any tips would be greatly appreciated. Thanks in advance!
You can use str_detect from stringr (or stri_detect from stringi) to count the stop words. str_detect returns TRUE or FALSE for each stop word in the list, and you can simply sum these. The result depends on which stop word list you use: for the example text below, stopwords("en") from the stopwords package gives 28, while stopwords(source = "smart") gives a count of 61. Note that this counts how many entries of the stop word list match somewhere in the text (as substring matches), not the total number of stop word tokens.
text <- "I've never had a better pulled pork pizza! The amount of toppings that they layered on it was astounding...bacon, corn, more pulled pork, and the sauce was delicious. I shared my pizza with 2 other people. I can't wait to go back."
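# English stop word list (default source) from the stopwords package
stopwords <- stopwords::stopwords("en")
# str_detect() tests each stop word against the lower-cased text; sum() counts the TRUEs
sum(stringr::str_detect(tolower(text), stopwords))
# [1] 28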
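If what you need is one count per review on the quanteda tokens object from the question, a minimal sketch could look like the following. It assumes quanteda is installed; the two reviews and the names reviews, toks, stop_toks and stop_counts are made up for illustration, so substitute your own tokens object (corpus2.tokens in the question). Unlike the str_detect approach, this counts every occurrence of a whole token.

```r
library(quanteda)

# Hypothetical example reviews; replace with your own texts or corpus
reviews <- c(
  r1 = "I can't wait to go back.",
  r2 = "Delicious pizza, great sauce."
)

# Tokenise and drop punctuation so only word tokens remain
toks <- tokens(reviews, remove_punct = TRUE)

# Keep only the tokens that match the stop word list
# (matching is case-insensitive by default)
stop_toks <- tokens_select(toks, pattern = stopwords("en"), selection = "keep")

# Number of remaining tokens per document = stop words per review
stop_counts <- ntoken(stop_toks)
stop_counts
```

stop_counts is a named integer vector with one entry per review, so it can be added as a column to a data frame of reviews.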