How to aggregate data based on the values in a column in R

Question

I am working on a project for work and am struggling to summarize my data correctly; I am worried that I am approaching the problem the wrong way. I have a dataset that looks like this:

Month.Year Code Count
8/2017     1    1 
2/2018     1    1
4/2018     2    1
4/2018     2    1
5/2020     3    1
5/2020     3    1
.
.
.

I need to summarize this data so that I can create grouped barplots, with the dates as the groups and the codes as the subgroups.

The data set has a date column in Month/Year format, a categorical Code column (a value between 1 and 3), and a "Count" column that I created, which is just the value 1 for each observation (I'm hoping this makes it easier to sum the number of observations).

The goal is to summarize this data at the Month and Code level for each year. In other words, I would like a separate dataset for each year that looks something like this:

## Dataset for Year 2018
Month Code Value
1     1    24  
1     2    13  
1     3    0
2     1    0
2     2    5
2     3    22
.
.
.
## Dataset for Year 2019
Month Code Value
1     1    15  
1     2    2  
1     3    54
2     1    0
2     2    0
2     3    21
.
.
.

Answer 1

Use split to divide the data set by year, then aggregate each sub-data.frame in an lapply loop.

Use sub to keep only the year, which then serves as the grouping value in the split call.

df1 <- read.table(text = "
Month.Year Code Count
8/2017     1    1 
2/2018     1    1
4/2018     2    1
4/2018     2    1
5/2020     3    1
5/2020     3    1
", header = TRUE)
df1
#>   Month.Year Code Count
#> 1     8/2017    1     1
#> 2     2/2018    1     1
#> 3     4/2018    2     1
#> 4     4/2018    2     1
#> 5     5/2020    3     1
#> 6     5/2020    3     1

sub(".*/", "", df1$Month.Year)
#> [1] "2017" "2018" "2018" "2018" "2020" "2020"

Created on 2022-03-07 by the reprex package (v2.0.1)
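The same sub idea also works in the other direction to isolate the month. The following is a minimal sketch, not part of the original answer: the vector names year and month are introduced here for illustration, and converting to integer is an assumption that every Month.Year value has the form month/year.

```r
# Sample data copied from the question
df1 <- read.table(text = "
Month.Year Code Count
8/2017     1    1
2/2018     1    1
4/2018     2    1
4/2018     2    1
5/2020     3    1
5/2020     3    1
", header = TRUE)

# Drop everything up to and including the slash to keep the year
year <- as.integer(sub(".*/", "", df1$Month.Year))

# Drop the slash and everything after it to keep the month
month <- as.integer(sub("/.*", "", df1$Month.Year))

data.frame(Month = month, Year = year, Code = df1$Code)
```

Storing the parts as integers makes them sort numerically (month 2 before month 10), which a character column would not do.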

Now save the split result and loop over it to compute the sums.

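# Split into one data.frame per year; the year is the text after the slash
df1_year <- split(df1, sub(".*/", "", df1$Month.Year))
df1_year <- lapply(df1_year, \(x) {
  # Keep only the month, then rename the column to match
  x$Month.Year <- sub("/\\d+$", "", x$Month.Year)
  names(x)[1] <- "Month"
  # Sum Count over every Month/Code combination present in this year
  aggregate(Count ~ ., data = x, sum)
})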

df1_year
#> $`2017`
#>   Month Code Count
#> 1     8    1     1
#> 
#> $`2018`
#>   Month Code Count
#> 1     2    1     1
#> 2     4    2     2
#> 
#> $`2020`
#>   Month Code Count
#> 1     5    3     2

Created on 2022-03-07 by the reprex package (v2.0.1)
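The aggregate output above only lists the Month/Code combinations that actually occur, whereas the desired output in the question also shows rows with a value of 0. One possible way to get those rows, sketched here with base R's table, is to declare the full set of levels up front. This is an alternative to the approach above, and it assumes that months run from 1 to 12 and codes from 1 to 3; the names counts and counts_by_year are illustrative.

```r
# Sample data copied from the question
df1 <- read.table(text = "
Month.Year Code Count
8/2017     1    1
2/2018     1    1
4/2018     2    1
4/2018     2    1
5/2020     3    1
5/2020     3    1
", header = TRUE)

# Assumption: months are 1-12 and codes are 1-3, as described in the question
month <- factor(as.integer(sub("/.*", "", df1$Month.Year)), levels = 1:12)
code  <- factor(df1$Code, levels = 1:3)
year  <- sub(".*/", "", df1$Month.Year)

# table() counts rows per Month/Code/Year cell and keeps empty cells as 0
counts <- as.data.frame(table(Month = month, Code = code, Year = year),
                        responseName = "Value")

# Order rows by month, then code, to match the layout in the question
counts <- counts[order(counts$Year, counts$Month, counts$Code), ]

# One data.frame per year, each holding all 12 x 3 combinations
counts_by_year <- split(counts[c("Month", "Code", "Value")], counts$Year)
head(counts_by_year[["2018"]])
```

Because table counts rows directly, this variant does not need the helper Count column at all.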

The members of the result list can be extracted with the standard extraction operators.

df1_year[['2017']]   # by quoted name, '2017'
df1_year[[1]]        # equivalent, 1st member
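Since the question's end goal is a grouped barplot, an extracted year can be reshaped for plotting. The following is only a sketch of one way to do it with base graphics: xtabs turns the per-year data.frame into a Code by Month matrix, and barplot with beside = TRUE draws one group of bars per month. The name counts_2018 and the axis labels are illustrative, and function(x) is used in place of \(x) so the code also runs on R versions older than 4.1.

```r
# Sample data copied from the question
df1 <- read.table(text = "
Month.Year Code Count
8/2017     1    1
2/2018     1    1
4/2018     2    1
4/2018     2    1
5/2020     3    1
5/2020     3    1
", header = TRUE)

# Rebuild the per-year list exactly as in the answer
df1_year <- split(df1, sub(".*/", "", df1$Month.Year))
df1_year <- lapply(df1_year, function(x) {
  x$Month.Year <- sub("/\\d+$", "", x$Month.Year)
  names(x)[1] <- "Month"
  aggregate(Count ~ ., data = x, sum)
})

# Cross-tabulate one year: rows are codes (bars), columns are months (groups)
counts_2018 <- xtabs(Count ~ Code + Month, data = df1_year[["2018"]])

# beside = TRUE places the bars for each code next to each other within a month
barplot(counts_2018, beside = TRUE, legend.text = TRUE,
        xlab = "Month", ylab = "Count", main = "2018")
```

Only the months and codes present in the chosen year appear in the plot; combinations within those that have no observations are drawn as bars of height 0.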
