Show empty groups when cutting the data using the cut function in R

I have a data frame like this:

gender <- c("m","m","m","m","m","f","f","f","f","f")
age <- c(18,28,39,49,3,
         13,16,6,19,37)

df <- data.frame(gender, age, stringsAsFactors = FALSE)

I am trying to create an ageband column that groups ages into bands of width 5, from 0 to 50.

library(dplyr)

df %>%
  mutate(ageband = cut(age, breaks = seq(0, 50, 5), right = FALSE)) %>%
  group_by(gender, ageband) %>%
  mutate(population = 1) %>%
  summarize(population = sum(population, na.rm = TRUE))

I get this output:

 gender ageband population
1 f      [5,10)           1
2 f      [10,15)          1
3 f      [15,20)          2
4 f      [35,40)          1
5 m      [0,5)            1
6 m      [15,20)          1
7 m      [25,30)          1
8 m      [35,40)          1
9 m      [45,50)          1

This doesn't show the gender and ageband combinations that have no rows. I would like those empty groups to appear with population = 0.

My desired output is:

   gender ageband population
1       f   [0,5)          0
2       f  [5,10)          1
3       f [10,15)          1
4       f [15,20)          2
5       f [20,25)          0
6       f [25,30)          0
7       f [30,35)          0
8       f [35,40)          1
9       f [40,45)          0
10      f [45,50)          0
11      m   [0,5)          1
12      m  [5,10)          0
13      m [10,15)          0
14      m [15,20)          1
15      m [20,25)          0
16      m [25,30)          1
17      m [30,35)          0
18      m [35,40)          1
19      m [40,45)          0
20      m [45,50)          1

I tried doing it this way, but it does not quite work:

df %>%
  mutate(ageband = cut( age, breaks = seq(0, 50, 5), right = FALSE)) %>%
  group_by(gender, ageband) %>%
  mutate(population = 1)  %>%
  summarize(population = sum(population, na.rm = TRUE)) %>%
  mutate(population = coalesce(population, 0L))

Can someone point me in the right direction?

With the addition of tidyr, you can do:

library(dplyr)
library(tidyr)

df %>%
  mutate(ageband = cut(age, breaks = seq(0, 50, 5), right = FALSE)) %>%
  count(gender, ageband) %>%
  # ageband is a factor, so complete() expands over all of its levels
  complete(ageband, nesting(gender), fill = list(n = 0)) %>%
  arrange(gender, ageband)

  ageband gender     n
   <fct>   <chr>  <dbl>
 1 [0,5)   f          0
 2 [5,10)  f          1
 3 [10,15) f          1
 4 [15,20) f          2
 5 [20,25) f          0
 6 [25,30) f          0
 7 [30,35) f          0
 8 [35,40) f          1
 9 [40,45) f          0
10 [45,50) f          0
11 [0,5)   m          1
12 [5,10)  m          0
13 [10,15) m          0
14 [15,20) m          1
15 [20,25) m          0
16 [25,30) m          1
17 [30,35) m          0
18 [35,40) m          1
19 [40,45) m          0
20 [45,50) m          1
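
This works because cut() returns a factor that keeps every band as a level, even bands with no observations. As a possible alternative that stays within dplyr, here is a minimal sketch that asks count() to keep empty factor levels via its .drop argument. It assumes dplyr 0.8.0 or later, and the object name res is only illustrative. Because gender is a character column, only the genders that actually occur are expanded.

```r
library(dplyr)

# Same data as in the question
df <- data.frame(
  gender = c("m", "m", "m", "m", "m", "f", "f", "f", "f", "f"),
  age = c(18, 28, 39, 49, 3, 13, 16, 6, 19, 37),
  stringsAsFactors = FALSE
)

res <- df %>%
  mutate(ageband = cut(age, breaks = seq(0, 50, 5), right = FALSE)) %>%
  # .drop = FALSE keeps unused levels of the factor column ageband
  count(gender, ageband, .drop = FALSE, name = "population")

res
```

Here the counts stay integers, whereas fill = list(n = 0) in the tidyr version turns n into a double column.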

Without additional packages, you can do:

df$ageband <- cut(df$age, breaks=seq(0, 50, 5), right=FALSE)
res <- transform(merge(df, expand.grid(ageband=levels(df$ageband),
                                       gender=unique(df$gender)), all=TRUE),
                 population=ave(age, gender, ageband, FUN=function(x) 
                   sum(!is.na(x))))[-3]
# merge() keeps one row per observation, so groups with several
# observations repeat; keep one row per group and renumber the rows
res <- unique(res)
rownames(res) <- NULL
res
#    gender ageband population
# 1       f   [0,5)          0
# 2       f  [5,10)          1
# 3       f [10,15)          1
# 4       f [15,20)          2
# 5       f [20,25)          0
# 6       f [25,30)          0
# 7       f [30,35)          0
# 8       f [35,40)          1
# 9       f [40,45)          0
# 10      f [45,50)          0
# 11      m   [0,5)          1
# 12      m  [5,10)          0
# 13      m [10,15)          0
# 14      m [15,20)          1
# 15      m [20,25)          0
# 16      m [25,30)          1
# 17      m [30,35)          0
# 18      m [35,40)          1
# 19      m [40,45)          0
# 20      m [45,50)          1
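
Another base R route, offered only as a sketch and not taken from the answers above, is to let table() do the counting. table() cross-tabulates over all factor levels, so bands with no observations come out with a count of 0 and each gender and ageband pair appears exactly once. The column name population is chosen to match the question.

```r
# Same data as in the question
gender <- c("m", "m", "m", "m", "m", "f", "f", "f", "f", "f")
age <- c(18, 28, 39, 49, 3, 13, 16, 6, 19, 37)
df <- data.frame(gender, age, stringsAsFactors = FALSE)

df$ageband <- cut(df$age, breaks = seq(0, 50, 5), right = FALSE)

# Count every gender x ageband combination, including empty bands
res <- as.data.frame(table(gender = df$gender, ageband = df$ageband),
                     responseName = "population")

# Sort by gender, then by band (factor level order), and renumber the rows
res <- res[order(res$gender, res$ageband), ]
rownames(res) <- NULL
res
```

Note that gender and ageband are both factors in the result; convert gender with as.character() if a character column is needed.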
