Using numerical variables as by group to get summary statistics
I have data as follows:
library(data.table)
dat <- fread("total women young
1 0 0
1 1 1
1 0 1
2 1 1
2 2 1
2 2 1
3 1 2
3 2 3
3 2 3
4 4 2
4 4 3
4 3 3
5 5 2
5 2 3
5 5 3
10 4 2
10 4 3
20 5 3
100 10 20")
I would like to create six categories for the variable total: 1, 2, 3, 4, 5 and over 5.
I would like to count the observations per category of total in count. sum_tot_count is the sum of total within each category (for categories 1 to 5 this is simply the category value multiplied by count). women and young are the shares of women and young people in that group, i.e. their sums divided by sum_tot_count.
Desired output
total   count  sum_tot_count  women   young
1       3      3              0.33    0.66
2       3      6              5/6     0.5
3       3      9              5/9     8/9
4       3      12             11/12   8/12
5       3      15             12/15   8/15
over 5  4      140            23/140  28/140
I am having some trouble figuring out where to start.
Could someone put me on the right track?
Does this work:
library(dplyr)
dat %>% mutate(tot = if_else(total > 5, 'over 5', as.character(total))) %>%
group_by(tot) %>% summarise(count = n(), sum_tot_count = sum(total), women = sum(women)/sum(total), young = sum(young)/sum(total))
# A tibble: 6 × 5
tot count sum_tot_count women young
<chr> <int> <int> <dbl> <dbl>
1 1 3 3 0.333 0.667
2 2 3 6 0.833 0.5
3 3 3 9 0.556 0.889
4 4 3 12 0.917 0.667
5 5 3 15 0.8 0.533
6 over 5 4 140 0.164 0.2
With cut:
dat %>%
group_by(cutGroup = cut(total, breaks = c(1:6, Inf), labels = c(1:5, "over 5"), include.lowest = TRUE, right = FALSE)) %>%
summarise(count = n(),
sum_tot_count = sum(total),
women = sum(women) / sum(total),
young = sum(young) / sum(total))
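To see how the breaks in a cut() call like the one above map values to labels, here is a minimal base R sketch. The vector x is a made-up example, not the question's data. With right = FALSE the intervals are closed on the left, so [5, 6) gets the label 5 and everything from 6 upward falls into "over 5".

```r
# Hypothetical example values (not taken from the question's data)
x <- c(1, 2, 5, 6, 10, 100)

# Intervals are [1,2), [2,3), [3,4), [4,5), [5,6), [6,Inf]
grp <- cut(x,
           breaks = c(1:6, Inf),
           labels = c(1:5, "over 5"),
           include.lowest = TRUE,
           right = FALSE)

# Show each value next to the category it was assigned
print(data.frame(x = x, group = grp))
```

This binning assumes whole-number totals: a value such as 5.5 would land in [5, 6) and be labelled 5.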
A data.table solution. The key is using cut(), as in other answers; after that, basic data.table syntax, as in "Use data.table to count and aggregate / summarize a column", will get you the rest of the way:
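# Bin total into 1..5 and "over 5"; the half-step breaks put each integer inside its own interval
dat[, cat := cut(total, breaks = 0.5 + c(0:5, Inf), labels = c(1:5, "over 5"))]
# Count rows and compute the summaries per category
dat[, .(count = .N,
        total = sum(total),
        women = sum(women) / sum(total),
        young = sum(young) / sum(total)),
    by = cat]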
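The half-step breaks passed to cut() in this answer can be checked on their own in base R. A minimal sketch, where the vector x is a made-up example rather than the question's data: with the default right = TRUE the intervals are (0.5, 1.5], (1.5, 2.5], and so on up to (5.5, Inf], so each integer from 1 to 5 sits in the middle of its own interval and anything larger is labelled "over 5".

```r
# Hypothetical example values (not taken from the question's data)
x <- c(1, 3, 5, 6, 20)

# Breaks are 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, Inf
half_step_breaks <- 0.5 + c(0:5, Inf)

grp <- cut(x, breaks = half_step_breaks, labels = c(1:5, "over 5"))

# Show each value next to the category it was assigned
print(data.frame(x = x, group = grp))
```

Because no integer coincides with a break, the choice between left-closed and right-closed intervals does not matter for whole-number input.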