Wordcloud2 - separate words for counting

I am trying to extract the words from a text column so that I can create a word cloud, but I am having some difficulties. This is the code:

library(readxl)
data <- read_excel("C:\\Users\\me\\OneDrive\\Desktop\\ToPandas.xlsx")

data2 <-data$articlesDescription


#install.packages("wordcloud2")
#install.packages("tidyverse")
#install.packages("tidytext")

library(wordcloud2)
library(tidyverse)
library(tidytext)

data2 <- gsub('[^[:alnum:] ]', '', data2)

data2 <-  data2 %>% 
  ungroup()

data3.df <- as.data.frame(data2)
data3 <- data3.df

data3 <- data3%>%
  anti_join(get_stopwords())%>%
  unnest_tokens(word, text) %>%
  count(word, sort = TRUE)

I have put hash marks in front of the install.packages() calls so that they do not try to reinstall. Everything runs up to data2; once I call ungroup() I get this error:

Error in UseMethod("ungroup"): no applicable method for 'ungroup' applied to an object of class "character"

Then, when it tries to move forward, I get this:

Error in anti_join():
! `by` must be supplied when `x` and `y` have no common variables.
i use `by = character()` to perform a cross-join.

I think the second error stems from the first one (ungroup), but I can't figure out how to fix it so that I can count the words.

This is a sample of what the imported xlsx file looks like: [image: ToPandas_xlsx]

Can anyone point me in the right direction? Thanks :)

Maybe this will be enough to get you started:

test <- data.frame(Text = rep("The quick brown fox jumped over the lazy dog's back.", 5))

Now split out the words:

test.lst <- strsplit(test$Text, " ")
test.lst[[1]]
#  [1] "The"    "quick"  "brown"  "fox"    "jumped" "over"   "the"    "lazy"   "dog's"  "back." 

Get rid of the punctuation:

test.lst2 <- lapply(test.lst, function(x) gsub("[[:punct:]]", "", x))
test.lst2[[1]]
#  [1] "The"    "quick"  "brown"  "fox"    "jumped" "over"   "the"    "lazy"   "dogs"   "back"  
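Note that the output above still contains both "The" and "the". For a word cloud you probably want those treated as the same word. A minimal sketch, assuming the same test data as above and that case does not matter for your counts (the names words and freq are just for illustration):

```r
# Same example data as above; stringsAsFactors = FALSE keeps Text as character
# on older R versions too
test <- data.frame(
  Text = rep("The quick brown fox jumped over the lazy dog's back.", 5),
  stringsAsFactors = FALSE
)

# Split on spaces, strip punctuation, then fold everything to lowercase
words <- unlist(strsplit(test$Text, " "))
words <- tolower(gsub("[[:punct:]]", "", words))

# "The" and "the" are now counted together
freq <- table(words)
freq[["the"]]
# [1] 10
```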

test.lst2 is a list containing one element for each row of the data. To collapse it into a single vector and get frequencies:

table(unlist(test.lst2))

  back  brown   dogs    fox jumped   lazy   over  quick    the    The 
     5      5      5      5      5      5      5      5      5      5 
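Coming back to the code in the question, the two errors have separate causes. ungroup() only has methods for data frames, and data2 is a plain character vector, so that step can simply be dropped. anti_join() fails because it runs before unnest_tokens(), when there is no word column yet to match against the stop words; the column created by as.data.frame(data2) is also named data2 rather than text. A hedged sketch of a tidytext version follows. The descriptions vector is made-up stand-in text for data$articlesDescription, and I use the stop_words data set that ships with tidytext in place of get_stopwords():

```r
library(dplyr)
library(tidytext)

# Stand-in for data$articlesDescription (invented text, not the asker's data)
descriptions <- c(
  "The quick brown fox jumped over the lazy dog's back.",
  "A quick fox is a happy fox."
)

# unnest_tokens() needs a data frame with a named text column
articles <- tibble(text = descriptions)

word_counts <- articles %>%
  unnest_tokens(word, text) %>%          # one lowercased word per row
  anti_join(stop_words, by = "word") %>% # drop stop words after tokenizing
  count(word, sort = TRUE)               # columns: word, n

# wordcloud2 reads the word and its frequency from the first two columns:
# wordcloud2::wordcloud2(word_counts)
```

Because unnest_tokens() lowercases and strips most punctuation by default, the gsub() clean-up from the question should not be needed in this version.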
