Wordcloud2 - separate words for counting

Question

I am trying to extract the words so that I can create a wordcloud, but I am having some difficulties. This is the code:

library(readxl)
data <- read_excel("C:\\Users\\me\\OneDrive\\Desktop\\ToPandas.xlsx")

data2 <-data$articlesDescription


#install.packages("wordcloud2")
#install.packages("tidyverse")
#install.packages("tidytext")

library(wordcloud2)
library(tidyverse)
library(tidytext)

data2 <- gsub('[^[:alnum:] ]', '', data2)

data2 <-  data2 %>% 
  ungroup()

data3.df <- as.data.frame(data2)
data3 <- data3.df

data3 <- data3%>%
  anti_join(get_stopwords())%>%
  unnest_tokens(word, text) %>%
  count(word, sort = TRUE)

I have put hash signs in front of the install.packages() calls so that it does not try to reinstall them. Everything runs up to data2, until I start to ungroup; then I get this error:

Error in UseMethod("ungroup"): no applicable method for 'ungroup' applied to an object of class "character"

Then, when it tries to move forward, I get this:

Error in anti_join() : ! `by` must be supplied when `x` and `y` have no common variables. i Use `by = character()` to perform a cross-join.

I think that the second error stems from the first one (ungroup), but I can't figure out how to fix it so that I can count the words.

This is a sample of what the imported xlsx file looks like: [ToPandas_xlsx image]

Can anyone point me in the right direction? Thanks :)

Answer 1

Maybe this will be enough to get you started:

test <- data.frame(Text = rep("The quick brown fox jumped over the lazy dog's back.", 5))

Now split out the words:

test.lst <- strsplit(test$Text, " ")
test.lst[[1]]
#  [1] "The"    "quick"  "brown"  "fox"    "jumped" "over"   "the"    "lazy"   "dog's"  "back."

Get rid of the punctuation:

test.lst2 <- lapply(test.lst, function(x) gsub("[[:punct:]]", "", x))
test.lst2[[1]]
#  [1] "The"    "quick"  "brown"  "fox"    "jumped" "over"   "the"    "lazy"   "dogs"   "back"
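In the output above, "The" and "the" are still different strings, so they would be counted as two separate words. If that is not what you want, one possible extra step is to lowercase everything before counting. This is only a sketch; test.lst3 is a name made up for the illustration and is not used elsewhere in this answer:

```r
# Rebuild the objects from the steps above so this runs on its own
test <- data.frame(Text = rep("The quick brown fox jumped over the lazy dog's back.", 5))
test.lst <- strsplit(test$Text, " ")
test.lst2 <- lapply(test.lst, function(x) gsub("[[:punct:]]", "", x))

# Lowercase every word in every row (test.lst3 is a hypothetical name)
test.lst3 <- lapply(test.lst2, tolower)
test.lst3[[1]]
```

With this, "The" and "the" should end up in the same bucket when you tabulate the words.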

test.lst2 is a list containing one element for each row of the data. To collapse it into a single vector and get frequencies:

table(unlist(test.lst2))

  back  brown   dogs    fox jumped   lazy   over  quick    the    The 
     5      5      5      5      5      5      5      5      5      5
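If you would rather stay with tidytext, as in the question, the two errors look like they have separate causes. ungroup() only applies to (grouped) data frames and data2 is a character vector, so that step can simply be dropped. as.data.frame(data2) produces a column named data2, so there is no text column for unnest_tokens() and no word column for anti_join() to match against get_stopwords(). A minimal sketch of a reordered pipeline follows; the sample sentence stands in for data$articlesDescription, and word_counts is a name chosen for the illustration:

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for data$articlesDescription from the question
data2 <- rep("The quick brown fox jumped over the lazy dog's back.", 5)

# unnest_tokens() needs a data frame with a named text column
data3 <- data.frame(text = data2)

word_counts <- data3 %>%
  unnest_tokens(word, text) %>%                # one lowercased word per row
  anti_join(get_stopwords(), by = "word") %>%  # drop stop words such as "the"
  count(word, sort = TRUE)                     # frequency per word, largest first

word_counts

# Assumes wordcloud2 is installed; it takes a data frame of words and frequencies
# wordcloud2::wordcloud2(word_counts)
```

Tokenizing before the anti_join() is what gives both tables a common word column, so the by error should no longer appear.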
