How to create new observations with sum of a new group?
I have the following data frame:
gender age population
H 0-4 5
H 5-9 5
H 10-14 10
H 15-19 15
H 20-24 15
H 25-29 10
M 0-4 0
M 5-9 5
M 10-14 5
M 15-19 15
M 20-24 10
M 25-29 15
I need to regroup the age categories and sum the population within each new group, to get the following data frame:
gender age population
H 0-14 20
H 15-19 15
H 20-29 25
M 0-14 10
M 15-19 15
M 20-29 25
I prefer dplyr, so if there is a way to accomplish this with that package, I would appreciate it.
Using a string split with tidyr::separate() to get the lower bound of each age range, then cut() to bin it into the new groups:
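library(dplyr)
library(tidyr)

df1 %>%
  # split e.g. "0-4" into numeric bounds: age1 = 0, age2 = 4
  separate(age, into = c("age1", "age2"), sep = "-", convert = TRUE) %>%
  # bin the lower bound; include.lowest = TRUE keeps age1 == 0 in the first bin
  mutate(age = cut(age1,
                   breaks = c(0, 14, 19, 29),
                   labels = c("0-14", "15-19", "20-29"),
                   include.lowest = TRUE)) %>%
  # sum population within each gender / new age group
  group_by(gender, age) %>%
  summarise(population = sum(population))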
# output
# gender age population
# (fctr) (fctr) (int)
# 1 H 0-14 20
# 2 H 15-19 15
# 3 H 20-29 25
# 4 M 0-14 10
# 5 M 15-19 15
# 6 M 20-29 25
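If parsing the age strings feels fragile, a dplyr-only alternative is to map each old label to its new label explicitly. This is only a sketch and not part of the answer above: the lookup vector age_map is a name introduced here for illustration, and df1 is rebuilt from the question's table so the snippet runs on its own.

```r
library(dplyr)

# The question's data, rebuilt so this sketch is self-contained.
df1 <- data.frame(
  gender = rep(c("H", "M"), each = 6),
  age = rep(c("0-4", "5-9", "10-14", "15-19", "20-24", "25-29"), times = 2),
  population = c(5, 5, 10, 15, 15, 10, 0, 5, 5, 15, 10, 15),
  stringsAsFactors = FALSE
)

# Hypothetical lookup (an assumption of this sketch): old label -> new label.
age_map <- c("0-4" = "0-14", "5-9" = "0-14", "10-14" = "0-14",
             "15-19" = "15-19",
             "20-24" = "20-29", "25-29" = "20-29")

res <- df1 %>%
  # as.character() so the lookup is by name even if age is a factor
  mutate(age = unname(age_map[as.character(age)])) %>%
  group_by(gender, age) %>%
  summarise(population = sum(population)) %>%
  ungroup()

res
```

The trade-off is that every existing age label has to appear in the lookup; a label missing from age_map would become NA, whereas the cut() approach handles any range whose lower bound falls inside the breaks.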
A data.table solution, where dat is the table:
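library(data.table)

dat <- as.data.table(dat)
# lower bound of each age range, e.g. "10-14" -> 10
# (as.character() guards against age being stored as a factor)
dat[ , mn := as.numeric(sapply(strsplit(as.character(age), "-"), "[[", 1))]
# bin the lower bound into the new age groups
dat[ , age := cut(mn, breaks = c(0, 14, 19, 29),
                  include.lowest = TRUE,
                  labels = c("0-14", "15-19", "20-29"))]
# sum population within each gender / new age group
dat[ , list(population = sum(population)), by = list(gender, age)]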
# gender age population
# 1: H 0-14 20
# 2: H 15-19 15
# 3: H 20-29 25
# 4: M 0-14 10
# 5: M 15-19 15
# 6: M 20-29 25
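Both answers pass include.lowest = TRUE to cut(). A small sketch of why, using only base R; the vector lower stands in for the lower bounds extracted from the age column, and the variable names are made up for this illustration.

```r
# Lower bounds of the six original age ranges.
lower <- c(0, 5, 10, 15, 20, 25)
breaks <- c(0, 14, 19, 29)
labels <- c("0-14", "15-19", "20-29")

# By default cut() builds right-closed intervals (0,14], (14,19], (19,29],
# so a value equal to the smallest break falls outside every interval.
without_lowest <- cut(lower, breaks = breaks, labels = labels)

# include.lowest = TRUE closes the first interval on the left: [0,14].
with_lowest <- cut(lower, breaks = breaks, labels = labels,
                   include.lowest = TRUE)

data.frame(lower, without_lowest, with_lowest)
```

Without the argument, the 0-4 rows get NA as their age group and end up in a separate NA group after summing, instead of being counted under 0-14.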