Wordcloud2 - separate words for counting
I am trying to extract the words so that I can create a wordcloud, but I am having some difficulties. This is the code:
library(readxl)
data <- read_excel("C:\\Users\\me\\OneDrive\\Desktop\\ToPandas.xlsx")
data2 <-data$articlesDescription
#install.packages("wordcloud2")
#install.packages("tidyverse")
#install.packages("tidytext")
library(wordcloud2)
library(tidyverse)
library(tidytext)
data2 <- gsub('[^[:alnum:] ]', '', data2)
data2 <- data2 %>%
  ungroup()
data3.df <- as.data.frame(data2)
data3 <- data3.df
data3 <- data3 %>%
  anti_join(get_stopwords()) %>%
  unnest_tokens(word, text) %>%
  count(word, sort = TRUE)
I have put hash signs in front of the install.packages() lines so that they do not try to reinstall. Everything runs up to data2, until I start to ungroup; then I get this error:
Error in UseMethod("ungroup"): no applicable method for 'ungroup' applied to an object of class "character"
Then, when it tries to move forward, I get this:
Error in `anti_join()`:
! `by` must be supplied when `x` and `y` have no common variables.
ℹ use `by = character()` to perform a cross-join.
I think that my second error stems from the first one (ungroup), but I can't figure out how to fix it so that I can count the words.
This is a sample of what the imported xlsx file looks like: ToPandas_xlsx Image
Can anyone point me in the right direction? Thanks :)
Maybe this will be enough to get you started:
test <- data.frame(Text = rep("The quick brown fox jumped over the lazy dog's back.", 5))
Now split out the words:
test.lst <- strsplit(test$Text, " ")
test.lst[[1]]
# [1] "The" "quick" "brown" "fox" "jumped" "over" "the" "lazy" "dog's" "back."
Get rid of the punctuation:
test.lst2 <- lapply(test.lst, function(x) gsub("[[:punct:]]", "", x))
test.lst2[[1]]
# [1] "The" "quick" "brown" "fox" "jumped" "over" "the" "lazy" "dogs" "back"
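An optional extra step at this point, sketched here in base R: fold the words to lower case so that "The" and "the" are later counted as one word. The name test.lst3 is only an illustrative choice; test.lst2 is left unchanged.

```r
# Rebuild the example data from the steps above so this block runs on its own
test <- data.frame(Text = rep("The quick brown fox jumped over the lazy dog's back.", 5))
test.lst <- strsplit(test$Text, " ")
test.lst2 <- lapply(test.lst, function(x) gsub("[[:punct:]]", "", x))

# Fold every word to lower case; test.lst3 is a hypothetical name
test.lst3 <- lapply(test.lst2, tolower)
test.lst3[[1]]
# [1] "the" "quick" "brown" "fox" "jumped" "over" "the" "lazy" "dogs" "back"
```

Whether to do this depends on the data: for a wordcloud it usually makes sense, but it also merges proper nouns with ordinary words.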
test.lst2 is a list containing a part for each row of the data. If you want to collapse it into a single vector, use unlist(). To get frequencies:
table(unlist(test.lst2))
#   back  brown   dogs    fox jumped   lazy   over  quick    the    The
#      5      5      5      5      5      5      5      5      5      5
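As for the two errors in the question: ungroup() only has methods for data frames, so it fails on a character vector, and anti_join() needs a column shared with get_stopwords(), whose word column only exists in your data after unnest_tokens() has run. A minimal tidytext sketch of the same counting follows, assuming the descriptions are a plain character vector. The vector descriptions and the column name text are stand-ins for illustration, not taken from your file.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for data$articlesDescription
descriptions <- rep("The quick brown fox jumped over the lazy dog's back.", 5)

word_counts <- tibble(text = descriptions) %>%   # one row per description, column named text
  unnest_tokens(word, text) %>%                  # one row per word; lower-cases and drops punctuation
  anti_join(get_stopwords(), by = "word") %>%    # remove stop words now that a word column exists
  count(word, sort = TRUE)                       # frequency per word in column n

word_counts

# wordcloud2 reads the word and its frequency from the first two columns:
# wordcloud2::wordcloud2(word_counts)
```

No ungroup() call is needed because nothing was grouped, and unnest_tokens() already handles punctuation and case, so the gsub() step can be dropped as well.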