繁体   English   中英

如何对 R 中的数据集的值进行重新分类并聚合行?

[英]How do I recategorize values and aggregate rows of a dataset in R?

我需要聚合数据集的行以折叠年龄范围。 我的数据集目前有 5 岁的年龄范围。 我试图将这些年龄范围组合成类别,同时对一些变量(人口、X1、X2、X3 和 X4)求和,同时保持变量“类别”,该变量对于该特定 ID 中的每一行都是相同的。

我的数据集如下所示:

ID    Age.Range    Population   X1   X2   X3   X4   Category
1     05-09 years  10           1    0    0    1    a
1     10-14 years  20           0    0    1    0    a
1     30-34 years  10           0    0    1    0    a
1     40-44 years  15           2    0    0    1    a
2     05-09 years  15           1    1    0    2    b
2     25-29 years  10           0    0    0    0    b
3     10-14 years  15           0    1    2    0    a
3     15-19 years  10           1    0    0    1    a
3     20-24 years  15           0    0    1    3    a
3     30-34 years  20           0    0    1    0    a
3     35-39 years  10           0    1    0    0    a

我正在尝试生成一个新的 dataframe,它结合了年龄,以便我的新年龄范围为 05-29 岁、30-39 岁和 40-49 岁,所以它看起来像这样:

ID    Age.Range    Population   X1   X2   X3   X4   Category
1     05-29 years  30           1    0    1    1    a
1     30-39 years  10           0    0    1    0    a
1     40-49 years  15           2    0    0    1    a
2     05-29 years  25           1    1    0    2    a
3     05-29 years  40           1    1    3    4    a
3     30-39 years  30           0    1    1    0    a

我试过用 dplyr 这样做但没有成功。 任何帮助,将不胜感激!

这应该有效:

your_data %>%
  mutate(
    First.Age.In.Range = as.numeric(str_extract(Age.Range, "^[0-9]+"))
    New.Age.Range = case_when(
      First.Age.In.Range < 30 ~ "05-29 years",
      First.Age.In.Range < 40 ~ "30-39 years",
      First.Age.In.Range < 50 ~ "40-49 years",
      First.Age.In.Range < 60 ~ "50-59 years",    
      ## not sure how high you need to go 
      ## catch-all for the last category
      TRUE ~ "90-99 years"
    )
  ) %>%
  group_by(ID, New.Age.Range, Population, Category) %>%
  summarize(across(starts_with("X"), sum))

这是一个使用tidyrstringrdplyr包的解决方案。 它类似于 Gregor Thomas 提供的内容。 在我们等待添加编辑时,它还让其他人有机会与可重现的示例进行交互。

df <- structure(list(ID = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3), Age.Range = c("05-09 years", 
"10-14 years", "30-34 years", "40-44 years", "05-09 years", "25-29 years", 
"10-14 years", "15-19 years", "20-24 years", "30-34 years", "35-39 years"
), Population = c(10L, 20L, 10L, 15L, 15L, 10L, 15L, 10L, 15L, 
20L, 10L), X1 = c(1L, 0L, 0L, 2L, 1L, 0L, 0L, 1L, 0L, 0L, 0L), 
    X2 = c(0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L), X3 = c(0L, 
    1L, 1L, 0L, 0L, 0L, 2L, 0L, 1L, 1L, 0L), X4 = c(1L, 0L, 0L, 
    1L, 2L, 0L, 0L, 1L, 3L, 0L, 0L), Category = c("a", "a", "a", 
    "a", "b", "b", "a", "a", "a", "a", "a")), class = "data.frame", row.names = c(NA, 
-11L))


library(stringr)
library(dplyr)
library(tidyr)

df %>% 
  group_by(ID) %>% 
  separate(col = Age.Range, into = c("Age_1", "Age_2"), sep = "-") %>%
  # You will have to add ifelse statements if you have ages that are >49 in your dataset. 
  mutate(
    Age_2 = str_remove(Age_2, " years"),
    Age_1 = ifelse(Age_2 <= 29, "05-29 years", Age_1),
    Age_1 = ifelse(Age_2 > 29 & Age_2 <= 39, "30-39 years", Age_1),
    Age_1 = ifelse(Age_2 > 39 & Age_2 <= 49, "40-49 years", Age_1)
  ) %>%
  rename(Age.Range = Age_1) %>% 
  group_by(ID, Category, Age.Range) %>% 
  summarise(across(
    .cols = Population:X4, sum
  )) %>% 
  select(ID, Age.Range, Population, X1, X2, X3, X4, Category)


#> # A tibble: 6 x 8
#> # Groups:   ID, Category [3]
#>      ID Age.Range   Population    X1    X2    X3    X4 Category
#>   <dbl> <chr>            <int> <int> <int> <int> <int> <chr>   
#> 1     1 05-29 years         30     1     0     1     1 a       
#> 2     1 30-39 years         10     0     0     1     0 a       
#> 3     1 40-49 years         15     2     0     0     1 a       
#> 4     2 05-29 years         25     1     1     0     2 b       
#> 5     3 05-29 years         40     1     1     3     4 a       
#> 6     3 30-39 years         30     0     1     1     0 a

reprex package (v0.3.0) 创建于 2020-11-15

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM