How to count the frequency of specific words for each column using R?

I am using this dataset: https://archive.ics.uci.edu/ml/datasets/Eco-hotel

I am trying to figure out how to count the frequency of certain words like "room" or "vacation" within each column. I have tried following tutorials online, but I have had no luck.

Using the iris dataset as an example, what you can do is:

library(tidyverse)

# For each column, count the rows whose value contains "setosa"
iris %>%
  summarize(across(everything(), ~ sum(str_detect(., 'setosa'))))

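Of course, you'd need to change the search term to whatever you are looking for. Note that str_detect() returns TRUE or FALSE per cell, so this counts the rows that contain the term, not the total number of occurrences.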
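Applied to free-text columns like the ones in the question, the same idea could look like the sketch below. The `reviews` data frame, its column names, and its text are made up for illustration and are not taken from the Eco-hotel dataset. The pattern is wrapped in `regex()` with a word boundary and `ignore_case = TRUE`, which is one possible way to match "room" as a whole word regardless of capitalization; `str_count()` is used alongside `str_detect()` to show the difference between counting rows and counting occurrences.

```r
library(dplyr)
library(stringr)

# Hypothetical review data; column names and text are invented for this example
reviews <- data.frame(
  title = c("Great room", "Vacation spot", "Room with a view"),
  body  = c("The room was clean and the room service was fast.",
            "Perfect for a family vacation.",
            "Nothing special."),
  stringsAsFactors = FALSE
)

# Whole-word, case-insensitive pattern for "room"
room_pattern <- regex("\\broom\\b", ignore_case = TRUE)

# Number of rows per column that mention the word at least once
rows_with_room <- reviews %>%
  summarize(across(everything(), ~ sum(str_detect(.x, room_pattern))))

# Total number of occurrences per column (one row can contribute several)
total_room <- reviews %>%
  summarize(across(everything(), ~ sum(str_count(.x, room_pattern))))

print(rows_with_room)
print(total_room)
```

If whole-word matching is not needed, a plain string such as 'room' can be passed instead of the `regex()` object.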

If you want a dedicated column for each of your search patterns, you could alternatively do something like:

# Example data: two columns of random letters
df <- data.frame(x = sample(letters, 10, replace = TRUE),
                 y = sample(letters, 10, replace = TRUE))

# One across() call per search pattern; .names appends the pattern as a suffix
df |>
  summarize(across(c(x, y), ~ sum(str_count(., "u")), .names = "{.col}_u"),
            across(c(x, y), ~ sum(str_count(., "g")), .names = "{.col}_g"))

Here I'm searching for the letters "u" and "g", respectively.
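With more than a couple of search terms, writing one `across()` call per term gets repetitive. A possible alternative, sketched here with base R's `sapply()` rather than the approach above, loops over a vector of patterns and returns a matrix with one row per term and one column per data column. The fixed example data and the `patterns` vector are assumptions chosen so that the result is reproducible, and `fixed()` is used so the terms are matched literally rather than as regular expressions.

```r
library(stringr)

# Small fixed example so the counts are reproducible
df <- data.frame(x = c("u", "g", "u", "a"),
                 y = c("g", "g", "b", "u"),
                 stringsAsFactors = FALSE)

# Search terms; extend this vector as needed
patterns <- c("u", "g")

# Outer loop: data columns; inner loop: search terms
# Result: matrix with terms as row names and columns as column names
counts <- sapply(df, function(col) {
  sapply(patterns, function(p) sum(str_count(col, fixed(p))))
})

print(counts)
```

Individual counts can then be looked up by name, for example `counts["u", "x"]`.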
