How to create new observations with sum of a new group?

I have the following data frame:

gender age   population
H      0-4   5
H      5-9   5
H      10-14 10
H      15-19 15
H      20-24 15
H      25-29 10
M      0-4   0
M      5-9   5
M      10-14 5
M      15-19 15
M      20-24 10
M      25-29 15
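
For reproducibility, here is a sketch of the same data as R code. The name df1 and the column types (character gender and age, integer population) are assumptions; the table above does not state them.

```r
# Hypothetical reconstruction of the table above; column types are assumed.
df1 <- data.frame(
  gender = rep(c("H", "M"), each = 6),
  age = rep(c("0-4", "5-9", "10-14", "15-19", "20-24", "25-29"), times = 2),
  population = c(5L, 5L, 10L, 15L, 15L, 10L,
                 0L, 5L, 5L, 15L, 10L, 15L),
  stringsAsFactors = FALSE
)
```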

I need to regroup the age categories to get the following data frame:

gender age   population
H      0-14  20
H      15-19 15
H      20-29 25
M      0-14  10
M      15-19 15
M      20-29 25

I prefer dplyr, so I would appreciate a solution that uses that package.

Split the age string with tidyr::separate(), then bin the lower bound with cut():

library(dplyr)
library(tidyr)

df1 %>%
  # split e.g. "0-4" into integer lower (age1) and upper (age2) bounds
  separate(age, into = c("age1", "age2"), sep = "-", convert = TRUE) %>%
  # bin the lower bound into the new age groups
  mutate(age = cut(age1,
                   breaks = c(0, 14, 19, 29),
                   labels = c("0-14", "15-19", "20-29"),
                   include.lowest = TRUE)) %>%
  group_by(gender, age) %>%
  summarise(population = sum(population))

# output
#   gender  age   population
#   (fctr) (fctr)      (int)
# 1      H   0-14         20
# 2      H  15-19         15
# 3      H  20-29         25
# 4      M   0-14         10
# 5      M  15-19         15
# 6      M  20-29         25
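
The binning step can be checked in isolation. The sketch below uses only base R and the lower bounds from the question (the vector name lower is made up for illustration). It also shows why include.lowest = TRUE is needed: by default cut() builds right-closed intervals such as (0, 14], which would leave the value 0 unassigned.

```r
# Lower bounds of the original age ranges, as produced by the split step.
lower <- c(0, 5, 10, 15, 20, 25)
breaks <- c(0, 14, 19, 29)
labels <- c("0-14", "15-19", "20-29")

# With include.lowest = TRUE the first interval becomes [0, 14].
binned <- cut(lower, breaks = breaks, labels = labels, include.lowest = TRUE)
as.character(binned)
# "0-14" "0-14" "0-14" "15-19" "20-29" "20-29"

# Without it, 0 falls outside (0, 14] and becomes NA.
binned_default <- cut(lower, breaks = breaks, labels = labels)
is.na(binned_default[1])
# TRUE
```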

A data.table solution, where dat is the table:

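library(data.table)
dat <- as.data.table(dat)
# lower bound of each age range; as.character() guards against age being a factor
dat[ , mn := as.numeric(sapply(strsplit(as.character(age), "-"), "[[", 1))]
# bin the lower bound into the new age groups
dat[ , age := cut(mn, breaks = c(0, 14, 19, 29),
                  include.lowest = TRUE,
                  labels = c("0-14", "15-19", "20-29"))]
# sum population within each gender and new age group
dat[ , list(population = sum(population)), by = list(gender, age)]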
#    gender   age population
# 1:      H  0-14         20
# 2:      H 15-19         15
# 3:      H 20-29         25
# 4:      M  0-14         10
# 5:      M 15-19         15
# 6:      M 20-29         25
