
Create new column with conditional group sums of another dataframe in R

Let me illustrate my question with an example:

Sample data:

df<-data.frame(BirthYear = c(1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005), Number= c(1,1,1,1,1,1,1,1,1,1,1), Group = c("g", "g", "g", "g", "g", "g","t","t","t","t","t"))

df 
 BirthYear Number  Group 
1  1995     1       g
2  1996     1       g 
3  1997     1       g
4  1998     1       g
5  1999     1       g
6  2000     1       g
7  2001     1       t
8  2002     1       t
9  2003     1       t
10 2004     1       t
11 2005     1       t 

and

df1<- structure(list(Year = c(2015, 2016, 2017, 2018, 2019, 2020)), class = "data.frame", row.names = c(NA, 
-6L))

df1
   Year
1  2015
2  2016
3  2017
4  2018 
5  2019
6  2020

Now I want to add new columns to df1: g1, g2, t1 and t2. g1 and t1 are the sums of df$Number over all rows of the respective group (g or t in df$Group) where df1$Year - df$BirthYear is greater than 18 and lower than 21, i.e. where the person is 19 or 20 years old. g2 and t2 are the sums of df$Number over all rows of the respective group where that difference is lower than 19.

I want to end up with the following:

df1
   Year   g1   g2  t1   t2 
1  2015   2    4    0    5
2  2016   2    3    0    5
3  2017   2    2    0    5
4  2018   2    1    0    5
5  2019   2    0    0    5
6  2020   1    0    1    4

I know I could write a for-loop over df1 to create the new columns, but I don't know how to specify the condition that gives the correct group sums for each year. I hope this example makes clear what I'm trying to achieve. I'd be very grateful for any help because I'm really stuck at this point.

If all you need are the year differences between 2015:2020 and BirthYear, you don't have to create a separate data frame. Perhaps just:

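library(tidyr)
library(dplyr)
df %>% 
  # pair every Year in 2015:2020 with every existing (BirthYear, Number, Group) row
  expand(Year = 2015:2020, nesting(BirthYear, Number, Group)) %>% 
  group_by(Year, Group) %>% 
  summarise(
    # band 1: age is 19 or 20
    `1` = sum(between(Year - BirthYear, 19, 20) * Number), 
    # band 2: age is below 19
    `2` = sum((Year - BirthYear < 19) * Number)
  ) %>% 
  # one column per group/band combination: g1, t1, g2, t2
  pivot_wider(names_from = "Group", values_from = c("1", "2"), names_glue = "{Group}{.value}")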

Output

`summarise()` regrouping output by 'Year' (override with `.groups` argument)
# A tibble: 6 x 5
# Groups:   Year [6]
   Year    g1    t1    g2    t2
  <int> <dbl> <dbl> <dbl> <dbl>
1  2015     2     0     4     5
2  2016     2     0     3     5
3  2017     2     0     2     5
4  2018     2     0     1     5
5  2019     2     0     0     5
6  2020     1     1     0     4
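
If the new columns should end up on df1 itself, as the question asks, a base R version is also possible. The following is only a minimal sketch, not part of the original answer: age_sum is a hypothetical helper name introduced here for illustration, and the upper bound of 18 for the second band assumes that years are whole numbers, so that "lower than 19" is the same as "at most 18".

```r
# Sample data from the question
df <- data.frame(
  BirthYear = c(1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005),
  Number    = rep(1, 11),
  Group     = rep(c("g", "t"), times = c(6, 5))
)
df1 <- data.frame(Year = c(2015, 2016, 2017, 2018, 2019, 2020))

# Hypothetical helper (not from the original post): sum Number for one
# group and one year, keeping rows whose age (year - BirthYear) is in [lo, hi]
age_sum <- function(year, group, lo, hi) {
  age <- year - df$BirthYear
  sum(df$Number[df$Group == group & age >= lo & age <= hi])
}

for (grp in unique(df$Group)) {
  # Band 1: ages 19 and 20
  df1[[paste0(grp, 1)]] <- sapply(df1$Year, age_sum, group = grp, lo = 19, hi = 20)
  # Band 2: every age below 19 (assumes whole-number years)
  df1[[paste0(grp, 2)]] <- sapply(df1$Year, age_sum, group = grp, lo = -Inf, hi = 18)
}

df1
```

With the sample data this should give the columns in the order g1, g2, t1, t2 and the same sums as the table requested in the question. For larger data the tidyr/dplyr pipeline above is likely the more convenient choice, since this sketch rescans df once per year, group and band.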


