Create new column with conditional group sums of another dataframe in R
Let me illustrate my question with an example:
Sample data:
df<-data.frame(BirthYear = c(1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005), Number= c(1,1,1,1,1,1,1,1,1,1,1), Group = c("g", "g", "g", "g", "g", "g","t","t","t","t","t"))
df
BirthYear Number Group
1 1995 1 g
2 1996 1 g
3 1997 1 g
4 1998 1 g
5 1999 1 g
6 2000 1 g
7 2001 1 t
8 2002 1 t
9 2003 1 t
10 2004 1 t
11 2005 1 t
and
df1<- structure(list(Year = c(2015, 2016, 2017, 2018, 2019, 2020)), class = "data.frame", row.names = c(NA,
-6L))
df1
Year
1 2015
2 2016
3 2017
4 2018
5 2019
6 2020
Now I want to add new columns to df1: g1, g2, t1 and t2. g1 and t1 represent the sum of df$Number for all rows of the respective group (g or t in df) where df1$Year - df$BirthYear is greater than 18 and lower than 21, i.e. where someone is 19 or 20 years old. g2 and t2 represent the sum of df$Number for all rows of the respective group where the difference in years is lower than 19.
I want to end up with the following:
df1
Year g1 g2 t1 t2
1 2015 2 4 0 5
2 2016 2 3 0 5
3 2017 2 2 0 5
4 2018 2 1 0 5
5 2019 2 0 0 5
6 2020 1 0 1 4
I know I could loop over df1 with a for-loop to create the new columns, but I don't know how to specify the condition to get the correct group sums for each year. I hope this example makes clear what I'm trying to achieve. I'd be very grateful for any help because I'm really stuck at this point.
If all you want is to calculate the year differences between 2015:2020 and BirthYear, then you don't have to create a separate dataframe. Perhaps just
library(tidyr)
library(dplyr)

df %>%
  # one row per (Year, person): cross 2015:2020 with the existing rows of df
  expand(Year = 2015:2020, nesting(BirthYear, Number, Group)) %>%
  group_by(Year, Group) %>%
  summarise(
    `1` = sum(between(Year - BirthYear, 19, 20) * Number),  # aged 19 or 20
    `2` = sum((Year - BirthYear < 19) * Number)             # younger than 19
  ) %>%
  # spread the groups into columns named g1, t1, g2, t2
  pivot_wider(names_from = "Group", values_from = c("1", "2"), names_glue = "{Group}{.value}")
Output
`summarise()` regrouping output by 'Year' (override with `.groups` argument)
# A tibble: 6 x 5
# Groups: Year [6]
Year g1 t1 g2 t2
<int> <dbl> <dbl> <dbl> <dbl>
1 2015 2 0 4 5
2 2016 2 0 3 5
3 2017 2 0 2 5
4 2018 2 0 1 5
5 2019 2 0 0 5
6 2020 1 1 0 4
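If you would rather keep df1 and fill it in place, with the columns in the order g1, g2, t1, t2 shown in the question, the same conditions can probably be written in base R as well. This is only a minimal sketch, not part of the original answer: group_sum, is_19_or_20 and is_under_19 are hypothetical helper names introduced here for illustration, and the sample data is rebuilt so the block runs on its own.

```r
# Sample data, equivalent to the data in the question
df <- data.frame(
  BirthYear = 1995:2005,
  Number    = rep(1, 11),
  Group     = rep(c("g", "t"), times = c(6, 5))
)
df1 <- data.frame(Year = 2015:2020)

# Hypothetical helper: sum of Number for one group in one year,
# restricted to the rows whose age (year - BirthYear) satisfies `keep`
group_sum <- function(year, group, keep) {
  age <- year - df$BirthYear
  sum(df$Number[df$Group == group & keep(age)])
}

# The two conditions from the question
is_19_or_20 <- function(age) age > 18 & age < 21
is_under_19 <- function(age) age < 19

# Add <group>1 and <group>2 for every group found in df
for (grp in unique(as.character(df$Group))) {
  df1[[paste0(grp, "1")]] <- sapply(df1$Year, group_sum, group = grp, keep = is_19_or_20)
  df1[[paste0(grp, "2")]] <- sapply(df1$Year, group_sum, group = grp, keep = is_under_19)
}

df1
```

The loop runs over the groups rather than over the rows of df1, so a group added to df later should get its own pair of columns without further changes.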