R: adding a column to one dataframe based on another dataframe and the date
I have a dataframe (Reports_following_AC) where each row represents a report. This dataframe looks like this:
> head(Reports_following_AC)
Park Month Obs_con Coy_Season Number_AC Number_4w_AC
<chr> <date> <dbl> <dbl> <int> <int>
1 14st NE - Coventry 2019-06-14 1 2 8 0
2 14st NE - Coventry 2019-10-12 0 3 10 0
3 14st NE - Coventry 2019-10-13 0 3 10 0
4 14st NE - Coventry 2021-06-23 1 2 10 0
5 Airways Park 2020-07-05 0 2 3 0
6 Airways Park 2021-07-18 1 2 6 0
I would like to add a column, "Last_treatment", to my Reports_following_AC dataframe, based on the "AC_code" column of the Reaction_per_park_per_day_3 dataframe (below). In Reaction_per_park_per_day_3, each row represents an AC event.
The Last_treatment column would hold the "AC_code" (treatment) of the last AC event prior to a report in the same Park, provided that AC event took place in the 4 weeks (28 days) before the report; otherwise it would be NA.
> head(Reaction_per_park_per_day_3)
# A tibble: 6 x 10
Park Date AC_code
<chr> <date> <dbl>
1 14st NE - Coventry 2019-06-05 6
2 14st NE - Coventry 2019-07-12 7
3 14st NE - Coventry 2019-10-05 1
4 14st NE - Coventry 2021-06-18 2
5 Airways Park 2020-06-26 1
6 Airways Park 2021-06-30 5
The resulting dataframe would therefore look like this:
Park Month Obs_con Coy_Season Number_AC Number_4w_AC Last_treatment
<chr> <date> <dbl> <dbl> <int> <int> <dbl>
1 14st NE - Coventry 2019-06-14 1 2 8 0 6
2 14st NE - Coventry 2019-10-12 0 3 10 0 1
3 14st NE - Coventry 2019-10-13 0 3 10 0 1
4 14st NE - Coventry 2021-06-23 1 2 10 0 NA
5 Airways Park 2020-07-05 0 2 3 0 1
6 Airways Park 2021-07-18 1 2 6 0 5
I tried the following code, but it's not quite working: instead of returning the AC_code of only the last AC event before each report (when that event falls within 28 days of the report), it returns one row for every AC event that precedes the report.
Reports_following_AC_1 <- Reports_following_AC %>%
left_join(select(Reaction_per_park_per_day_3, c(Park, Date, AC_code))) %>%
filter(Date <= Month ) %>%
group_by(Park, Month, Obs_con, Coy_Season) %>%
mutate(Last_treatment = if_else((Month - max(Date))<28, AC_code, as.character(NA))) %>%
distinct
> head(Reports_following_AC_1)
Park Month Obs_con Coy_Season Number_AC Number_4w_AC Date AC_code Last_treatment
<chr> <date> <dbl> <dbl> <int> <int> <date> <chr> <chr>
1 14st NE - Coventry 2019-06-14 1 2 8 0 2019-01-30 3 NA
2 14st NE - Coventry 2019-06-14 1 2 8 0 2019-01-30 4 NA
3 14st NE - Coventry 2019-06-14 1 2 8 0 2019-01-30 1 NA
4 14st NE - Coventry 2019-06-14 1 2 8 0 2019-02-01 4 NA
5 14st NE - Coventry 2019-06-14 1 2 8 0 2019-02-01 2 NA
6 14st NE - Coventry 2019-06-14 1 2 8 0 2019-02-04 1 NA
I'm ideally looking for a dplyr solution, but I'm open to other possibilities.
If I understand correctly, you want to join with a selection of columns from Reaction_per_park_per_day_3? This should work:
Reports_following_AC_1 <- Reports_following_AC %>%
left_join(select(Reaction_per_park_per_day_3, c(Park, Date, AC_code)), by = "Park") %>%
filter(Date <= Month ) %>%
group_by(Park, Month, Obs_con, Coy_Season) %>%
mutate(Last_treatment = if_else((Month - max(Date))<28, lag(AC_code), as.character(NA))) %>%
distinct
I figured it out!
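Reports_following_AC_1 <- Reports_following_AC %>%
  # Pair every report with every AC event in the same Park
  left_join(select(Reaction_per_park_per_day_3, c(Park, Date, AC_code)), by = "Park") %>%
  # Keep only events that happened before the report
  filter(Date < Month) %>%
  group_by(Park, Month, Obs_con, Coy_Season, Number_4w_AC) %>%
  # last() takes the AC_code from the final row of each group, so this
  # relies on the events already being sorted by Date within each Park
  mutate(Last_treatment = last(if_else((Month - max(Date)) < 28, AC_code, as.character(NA)))) %>%
  select(c(Park, Month, Obs_con, Coy_Season, Number_4w_AC, Last_treatment)) %>%
  distinct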
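For anyone landing here later, a variant that does not depend on row order may be easier to reason about. This is only a sketch under a few assumptions: the toy data below is retyped from the head() output in the question (reports reduced to Park and Month), `report_id`, `reports`, `events` and `last_treatment` are hypothetical names introduced for the example, and "within 4 weeks" is read as strictly before the report and fewer than 28 days earlier, as in the code above.

```r
library(dplyr)

# Toy data retyped from the head() output in the question
reports <- tibble(
  Park  = c(rep("14st NE - Coventry", 4), rep("Airways Park", 2)),
  Month = as.Date(c("2019-06-14", "2019-10-12", "2019-10-13",
                    "2021-06-23", "2020-07-05", "2021-07-18"))
)
events <- tibble(
  Park    = c(rep("14st NE - Coventry", 4), rep("Airways Park", 2)),
  Date    = as.Date(c("2019-06-05", "2019-07-12", "2019-10-05",
                      "2021-06-18", "2020-06-26", "2021-06-30")),
  AC_code = c(6, 7, 1, 2, 1, 5)
)

# Hypothetical helper key so each report can be matched back to itself
reports <- reports %>% mutate(report_id = row_number())

last_treatment <- reports %>%
  left_join(events, by = "Park") %>%                      # every report x every event in the park
  filter(Date < Month,                                    # event strictly before the report
         as.numeric(Month - Date, units = "days") < 28) %>%  # and fewer than 28 days earlier
  group_by(report_id) %>%
  slice_max(Date, n = 1, with_ties = FALSE) %>%           # keep only the most recent event
  ungroup() %>%
  select(report_id, Last_treatment = AC_code)

# Joining back keeps every report; those without a qualifying event get NA
result <- reports %>%
  left_join(last_treatment, by = "report_id") %>%
  select(-report_id)

print(result)
```

On these six rows Last_treatment comes out as 6, 1, 1, 2, 1, 5. Row 4 is 2 rather than the NA shown in the expected output, because the 2021-06-18 event is 5 days before the 2021-06-23 report; the rows shown in the question do not explain that NA. Two caveats: recent dplyr versions may warn about a many-to-many relationship on the first join, which is expected here, and when several events share the same latest Date (as on 2019-01-30 in the real data), `with_ties = FALSE` simply keeps the first of them, so a tie-breaking rule would have to be added if that matters.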