[英]Summary statistics for each category of categorical variables in R
I would like to report the summary statistics for a few variables of which some are categorical varibles.我想报告一些变量的汇总统计数据,其中一些是分类变量。 I want the the individual categories listed seperately and with the indiviudal frequency.我希望单独列出各个类别并以单独的频率列出。 I can not find any other post or explanations elsewhere, therefore I turn to stack overflow as my last resort.我在别处找不到任何其他帖子或解释,因此我将堆栈溢出作为最后的手段。
This is the closest I can find to my problem, but it does not quite address the issue: Obtaining Separate Summary Statistics by Categorical Variable with Stargazer Package这是我能找到的最接近我的问题,但它并没有完全解决问题: 使用 Stargazer Package 按分类变量获取单独的摘要统计信息
currently the table looks like this:目前该表如下所示:
I would like it to look more like this with regard to lisitng all values for the categorical variables but with the correct numbers of observations:我希望它在列出分类变量的所有值但具有正确的观察数量方面看起来更像这样:
The code I used我使用的代码
sumstats <- demographics %>% select(treated, pandl_r2, pandl_r3, age, gender,
education, crypto.posession, crypto.knowledge,
trading.frequency)
stargazer(sumstats, out = "...")
dput looks like this dput 看起来像这样
structure(list(ï..treated = c(0L, 0L, 0L, 0L, 0L, 0L), pandl_r2 = c(104.22,
93.83, -45.8, 175, 240.4, 120.4), pandl_r3 = c(63.7, 335.58,
-147.55, 165.55, 429.35, -48.03), treated = c(0L, 0L, 0L, 0L,
0L, 0L), age = c(22L, 26L, 30L, 23L, 23L, 25L), gender = c(1L,
1L, 5L, 2L, 2L, 2L), education = c(2L, 5L, 4L, 4L, 4L, 4L), crypto.posession = c(1L,
1L, 1L, 1L, 3L, 1L), crypto.knowledge = c(7L, 6L, 3L, 10L, 5L,
6L), trading.frequency = c(3L, 8L, 2L, 5L, 1L, 3L)), row.names = c(NA,
6L), class = "data.frame")
here head 10这里头 10
ï..treated pandl_r2 pandl_r3 treated age gender education crypto.posession crypto.knowledge trading.frequency
1 0 104.22 63.70 0 22 1 2 1 7 3
2 0 93.83 335.58 0 26 1 5 1 6 8
3 0 -45.80 -147.55 0 30 5 4 1 3 2
4 0 175.00 165.55 0 23 2 4 1 10 5
5 0 240.40 429.35 0 23 2 4 3 5 1
6 0 120.40 -48.03 0 25 2 4 1 6 3
7 0 478.00 375.02 0 20 1 3 1 7 3
8 0 131.59 15.31 0 19 1 3 1 7 1
9 0 170.70 331.02 0 24 1 4 2 2 0
10 0 66.11 290.15 0 19 1 3 2 4 2
11 0 210.50 661.95 0 24 1 4 1 8 0
12 0 388.09 641.15 0 39 2 4 3 5 2
13 0 426.72 812.00 0 24 2 2 2 3 4
14 0 393.39 577.45 0 33 1 4 1 10 10
15 0 104.73 776.56 0 25 1 3 1 8 5
16 0 248.78 106.75 0 30 2 4 1 8 7
17 0 462.40 503.55 0 33 1 4 1 8 8
18 0 245.90 550.75 0 25 1 4 1 8 9
19 0 413.60 288.55 0 33 5 4 1 5 5
20 0 316.40 477.15 0 32 1 3 1 6 2
Every hint is appreciated, many thanks in advance感谢每一个提示,非常感谢
I'm assuming you want each categoric approached separately rather than in combination.我假设您希望单独而不是组合处理每个类别。 You could start with你可以从
library(SmartEDA)
library(purrr)
map(c("gender","education" ),
~ExpCustomStat(demographics,
Cvar=.x,
Nvar=c("pandl_r2","pandl_r3") ,
stat = c('Count','Prop','mean','min','P0.25','median','p0.75','max'))
)
where nvar has the numeric's to assess and you list out the categories in the first input to the map. if you want all the results stacked you'd have to map the first column to a generic name before stacking like so其中 nvar 具有要评估的数字,您在 map 的第一个输入中列出了类别。如果您想要堆叠所有结果,您必须将 map 第一列设置为通用名称,然后像这样堆叠
library(dplyr)
map_dfr(c("gender","education" ),
~ExpCustomStat(demographics,
Cvar=.x,
Nvar=c("pandl_r2","pandl_r3") ,
stat = c('Count','Prop','mean','min','P0.25','median','p0.75','max')) |>
rename_at(1, \(x)"var") |> mutate(catname = .x) |> relocate(catname)
)
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.