Frequency table and group by multiple variables in R
Folks, I need an elegant way to create a frequency count grouped by multiple variables. The output should be a data frame.
I know the answer lies somewhere in dplyr and data.table, both of which I am still learning. I tried this link, but I want to do this using dplyr and data.table.
Here is the sample data from the same link:
ID <- seq_len(177)
Age <- sample(c("0-15", "16-29", "30-44", "45-64", "65+"), 177, replace = TRUE)
Sex <- sample(c("Male", "Female"), 177, replace = TRUE)
Country <- sample(c("England", "Wales", "Scotland", "N. Ireland"), 177, replace = TRUE)
Health <- sample(c("Poor", "Average", "Good"), 177, replace = TRUE)
Survey <- data.frame(Age, Sex, Country, Health)
Here is the output I am looking for. Thanks, and I appreciate your help!
We can use dcast from data.table:
library(data.table)
dcast(setDT(Survey), Age + Sex ~ Health, value.var = "Country",
      fun.aggregate = length)[, Total := Average + Good + Poor][]
If we don't want to type the column names, use Reduce with `+`:
dcast(setDT(Survey), Age + Sex ~ Health, value.var = "Country",
      fun.aggregate = length)[, Total := Reduce(`+`, .SD), .SDcols = Average:Poor][]
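Since the question also asks about dplyr, the same table could be built roughly as follows. This is only a sketch and not part of the original answers: it assumes dplyr and tidyr (version 1.0.0 or later, for pivot_wider) are installed, and the set.seed call and the name freq are added here purely for illustration.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
Survey <- data.frame(
  Age    = sample(c("0-15", "16-29", "30-44", "45-64", "65+"), 177, replace = TRUE),
  Sex    = sample(c("Male", "Female"), 177, replace = TRUE),
  Health = sample(c("Poor", "Average", "Good"), 177, replace = TRUE)
)

freq <- Survey %>%
  count(Age, Sex, Health) %>%                  # one row per Age/Sex/Health combination
  pivot_wider(names_from = Health,             # one column per Health level
              values_from = n,
              values_fill = 0) %>%             # combinations with no rows become 0
  mutate(Total = Average + Good + Poor)        # row total across the Health columns

freq
```

Here count() plays the role of the length aggregation, and pivot_wider() does the reshaping that dcast does above.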
Here is a method using data.table and tidyr, but not dcast. First, count the observations with .N in j, grouped by the variables of interest:
# setDT() converts Survey to a data.table by reference, in case it is still a data frame
setDT(Survey)[, .N, by = .(Age, Sex, Health)]
returning:
  Age    Sex  Health  N
30-44 Female Average 10
  65+ Female    Poor  9
 0-15   Male Average  3
16-29   Male Average  6
30-44   Male    Good  6
45-64 Female Average  8
Then, use spread from tidyr to turn your column of choice into a set of new columns (one for each unique value), populated by N:
library(tidyr)
spread(setDT(Survey)[, .N, by = .(Age, Sex, Health)], Health, N)
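The spread result has no Total column, and any Age/Sex/Health combination that never occurs shows up as NA. One possible way to handle both is sketched below. It is an illustration rather than part of the original answer: the set.seed call, the names counts and wide, the fill = 0 argument, and the conversion to a plain data frame are all choices made here.

```r
library(data.table)
library(tidyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
Survey <- data.table(
  Age    = sample(c("0-15", "16-29", "30-44", "45-64", "65+"), 177, replace = TRUE),
  Sex    = sample(c("Male", "Female"), 177, replace = TRUE),
  Health = sample(c("Poor", "Average", "Good"), 177, replace = TRUE)
)

# Count observations per group, as in the step above
counts <- Survey[, .N, by = .(Age, Sex, Health)]

# Spread Health into columns; fill = 0 replaces missing combinations with 0.
# The counts are passed as a plain data frame to keep the result a data frame.
wide <- spread(as.data.frame(counts), Health, N, fill = 0)

# Row total across the three Health columns
wide$Total <- rowSums(wide[, c("Average", "Good", "Poor")])

wide
```

Note that tidyr documents spread as superseded by pivot_wider, so new code may prefer the latter.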