Create new columns from conditional group sums of another data frame in R

Question

Let me illustrate my question with an example:

Sample data:

df<-data.frame(BirthYear = c(1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005), Number= c(1,1,1,1,1,1,1,1,1,1,1), Group = c("g", "g", "g", "g", "g", "g","t","t","t","t","t"))

df 
 BirthYear Number  Group 
1  1995     1       g
2  1996     1       g 
3  1997     1       g
4  1998     1       g
5  1999     1       g
6  2000     1       g
7  2001     1       t
8  2002     1       t
9  2003     1       t
10 2004     1       t
11 2005     1       t

and

df1<- structure(list(Year = c(2015, 2016, 2017, 2018, 2019, 2020)), class = "data.frame", row.names = c(NA, 
-6L))

df1
   Year
1  2015
2  2016
3  2017
4  2018 
5  2019
6  2020

Now I want to add new columns to df1: g1, g2, t1 and t2. g1 and t1 represent, respectively, the sum of df$Number over all rows of a group (g or t in df) where df1$Year - df$BirthYear is greater than 18 and lower than 21, i.e. where the person is 19 or 20 years old. g2 and t2 represent the sum of df$Number over all rows of a group where the difference in years is lower than 19.

I want to end up with the following:

df1
   Year   g1   g2  t1   t2 
1  2015   2    4    0    5
2  2016   2    3    0    5
3  2017   2    2    0    5
4  2018   2    1    0    5
5  2019   2    0    0    5
6  2020   1    0    1    4

I know I could write a for-loop over df1 to create the new columns, but I don't know how to specify the condition to get the correct group sums for each year. I hope this example makes clear what I'm trying to achieve. I'd be very grateful for any help because I'm really stuck at this point.

Answer 1

If all you want is to calculate the year differences between 2015:2020 and BirthYear, you don't have to create a separate data frame. Perhaps just:

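library(tidyr)
library(dplyr)
df %>% 
  # pair every Year in 2015:2020 with every existing row of df
  expand(Year = 2015:2020, nesting(BirthYear, Number, Group)) %>% 
  group_by(Year, Group) %>% 
  summarise(
    # ages 19 and 20: the logical condition acts as a 0/1 weight on Number
    `1` = sum(between(Year - BirthYear, 19, 20) * Number), 
    # ages below 19
    `2` = sum((Year - BirthYear < 19) * Number)
  ) %>% 
  # one column per group/age-band combination: g1, t1, g2, t2
  pivot_wider(names_from = "Group", values_from = c("1", "2"), names_glue = "{Group}{.value}")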

Output

`summarise()` regrouping output by 'Year' (override with `.groups` argument)
# A tibble: 6 x 5
# Groups:   Year [6]
   Year    g1    t1    g2    t2
  <int> <dbl> <dbl> <dbl> <dbl>
1  2015     2     0     4     5
2  2016     2     0     3     5
3  2017     2     0     2     5
4  2018     2     0     1     5
5  2019     2     0     0     5
6  2020     1     1     0     4
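
For readers who would rather add the columns to df1 directly, as the question asks, here is a minimal base R sketch of the same calculation. It is not part of the answer above: the helper name group_sum and its lo/hi arguments are made up for illustration, and the "lower than 19" band is written as age <= 18, which assumes years are whole numbers.

```r
# Sample data from the question
df <- data.frame(
  BirthYear = 1995:2005,
  Number    = rep(1, 11),
  Group     = rep(c("g", "t"), times = c(6, 5))
)
df1 <- data.frame(Year = 2015:2020)

# Hypothetical helper: sum Number for one group where the age in `year`
# lies in the closed interval [lo, hi]
group_sum <- function(year, group, lo, hi) {
  age <- year - df$BirthYear
  sum(df$Number[df$Group == group & age >= lo & age <= hi])
}

# Loop over the groups and add two columns per group to df1
for (grp in unique(df$Group)) {
  # ages 19 and 20
  df1[[paste0(grp, "1")]] <- sapply(df1$Year, group_sum,
                                    group = grp, lo = 19, hi = 20)
  # ages below 19 (no lower bound)
  df1[[paste0(grp, "2")]] <- sapply(df1$Year, group_sum,
                                    group = grp, lo = -Inf, hi = 18)
}

df1
```

With the sample data this should give the columns in the order Year, g1, g2, t1, t2, as in the question's desired result. For large data the sapply call scans df once per year, so the expand-and-summarise approach may scale better.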
