Show empty groups when cutting the data using cut function in R
I have a data frame like this:
gender <- c("m","m","m","m","m","f","f","f","f","f")
age <- c(18,28,39,49,3,13,16,6,19,37)
df <- data.frame(gender, age, stringsAsFactors = FALSE)
I am trying to create an ageband column that bins age into groups of width 5, from 0 to 50.
library(dplyr)

df %>%
  mutate(ageband = cut(age, breaks = seq(0, 50, 5), right = FALSE)) %>%
  group_by(gender, ageband) %>%
  mutate(population = 1) %>%
  summarize(population = sum(population, na.rm = TRUE))
I get this output:
gender ageband population
1 f [5,10) 1
2 f [10,15) 1
3 f [15,20) 2
4 f [35,40) 1
5 m [0,5) 1
6 m [15,20) 1
7 m [25,30) 1
8 m [35,40) 1
9 m [45,50) 1
This doesn't show the groups that have no rows. I would like those empty groups to appear with population = 0.
My desired output is:
gender ageband population
1 f [0,5) 0
2 f [5,10) 1
3 f [10,15) 1
4 f [15,20) 2
5 f [20,25) 0
6 f [25,30) 0
7 f [30,35) 0
8 f [35,40) 1
9 f [40,45) 0
10 f [45,50) 0
11 m [0,5) 1
12 m [5,10) 0
13 m [10,15) 0
14 m [15,20) 1
15 m [20,25) 0
16 m [25,30) 1
17 m [30,35) 0
18 m [35,40) 1
19 m [40,45) 0
20 m [45,50) 1
I tried doing it this way, but it doesn't quite work:
df %>%
mutate(ageband = cut( age, breaks = seq(0, 50, 5), right = FALSE)) %>%
group_by(gender, ageband) %>%
mutate(population = 1) %>%
summarize(population = sum(population, na.rm = TRUE)) %>%
mutate(population = coalesce(population, 0L))
Can someone point me in the right direction?
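A likely reason the coalesce() attempt changes nothing: after group_by() and summarize(), the empty gender/ageband combinations are not NA values. They are simply absent rows, so there is nothing to replace. The factor returned by cut() still carries every level, though, which is what makes the empty bands recoverable. A minimal base R sketch of that point (the names n_levels, band_counts and n_empty are illustrative, not from the question):

```r
age <- c(18, 28, 39, 49, 3, 13, 16, 6, 19, 37)
ageband <- cut(age, breaks = seq(0, 50, 5), right = FALSE)

# The factor keeps all levels, including bands with no observations.
n_levels <- nlevels(ageband)       # 10

# table() counts every level, so empty bands show up with a count of 0.
band_counts <- table(ageband)

# Number of bands that contain no observations at all.
n_empty <- sum(band_counts == 0)   # 3
```

Any approach that expands the result to the full set of factor levels can therefore fill in the zero rows.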
With the addition of tidyr, you can do:
library(dplyr)
library(tidyr)

df %>%
  mutate(ageband = cut(age, breaks = seq(0, 50, 5), right = FALSE)) %>%
  count(gender, ageband) %>%
  complete(ageband, nesting(gender), fill = list(n = 0)) %>%
  arrange(gender, ageband)
ageband gender n
<fct> <chr> <dbl>
1 [0,5) f 0
2 [5,10) f 1
3 [10,15) f 1
4 [15,20) f 2
5 [20,25) f 0
6 [25,30) f 0
7 [30,35) f 0
8 [35,40) f 1
9 [40,45) f 0
10 [45,50) f 0
11 [0,5) m 1
12 [5,10) m 0
13 [10,15) m 0
14 [15,20) m 1
15 [20,25) m 0
16 [25,30) m 1
17 [30,35) m 0
18 [35,40) m 1
19 [40,45) m 0
20 [45,50) m 1
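The result above names the count column n and lists ageband first. As a small variation (a sketch, assuming dplyr 0.8 or later for the name argument of count(), plus tidyr), the count column can be called population and the columns ordered as in the desired output. Since ageband is a factor, complete() expands it to all of its levels:

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  gender = c("m", "m", "m", "m", "m", "f", "f", "f", "f", "f"),
  age = c(18, 28, 39, 49, 3, 13, 16, 6, 19, 37),
  stringsAsFactors = FALSE
)

res <- df %>%
  mutate(ageband = cut(age, breaks = seq(0, 50, 5), right = FALSE)) %>%
  # Count rows per observed gender/ageband pair, naming the column directly.
  count(gender, ageband, name = "population") %>%
  # Add the missing gender/ageband pairs and fill their count with 0.
  complete(gender, ageband, fill = list(population = 0L)) %>%
  arrange(gender, ageband)
```

Filling with 0L rather than 0 keeps population as an integer column, matching what count() produces.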
Without additional packages, you can do:
df$ageband <- cut(df$age, breaks=seq(0, 50, 5), right=FALSE)
res <- transform(merge(df, expand.grid(ageband=levels(df$ageband),
gender=unique(df$gender)), all=TRUE),
population=ave(age, gender, ageband, FUN=function(x)
sum(!is.na(x))))[-3]
res
# gender ageband population
# 1 f [0,5) 0
# 2 f [5,10) 1
# 3 f [10,15) 1
# 4 f [15,20) 2
# 5 f [15,20) 2
# 6 f [20,25) 0
# 7 f [25,30) 0
# 8 f [30,35) 0
# 9 f [35,40) 1
# 10 f [40,45) 0
# 11 f [45,50) 0
# 12 m [0,5) 1
# 13 m [5,10) 0
# 14 m [10,15) 0
# 15 m [15,20) 1
# 16 m [20,25) 0
# 17 m [25,30) 1
# 18 m [30,35) 0
# 19 m [35,40) 1
# 20 m [40,45) 0
# 21 m [45,50) 1
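Note that the result above lists f / [15,20) twice (rows 4 and 5), because merge() keeps both original observations in that group. If exactly one row per combination is wanted, a different base R route is to cross-tabulate with table(), which counts every gender/ageband combination including the empty ones. This is a sketch of an alternative, not the answer's own method; the intermediate name counts is illustrative:

```r
df <- data.frame(
  gender = c("m", "m", "m", "m", "m", "f", "f", "f", "f", "f"),
  age = c(18, 28, 39, 49, 3, 13, 16, 6, 19, 37),
  stringsAsFactors = FALSE
)
df$ageband <- cut(df$age, breaks = seq(0, 50, 5), right = FALSE)

# Cross-tabulate: every level of ageband is counted for every gender.
counts <- table(ageband = df$ageband, gender = df$gender)

# Convert to long format; responseName sets the name of the count column.
res <- as.data.frame(counts, responseName = "population")

# Put the columns and rows in the order of the desired output.
res <- res[order(res$gender, res$ageband), c("gender", "ageband", "population")]
rownames(res) <- NULL
```

Here gender and ageband both come back as factors, and res has one row for each of the 20 combinations.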