Create a column in one dataframe based on another column in another dataframe in R
I am fairly new to R and dplyr and I am stuck on this issue:
I have two tables:
(1) Repairs done on cars
(2) Amount owed on each car over time
What I would like to do is create three extra columns on the repair table that give me: (1) the amount owed on the car when the repair was done, (2) the amount owed 3 months down the road, and (3) the amount in the last payment record on file.
In the case where the repair date does not match any payment record, I need to use the closest amount owed on record.
So something like:
Any ideas how I can do that?
Here are the data frames:
Repairs done on cars:
df_repair <- data.frame(
  unique_id = c("A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8"),
  car_number = c(1, 1, 1, 2, 2, 2, 3, 3),
  repair_done = c("Front Fender", "Front Lights", "Rear Lights", "Front Fender",
                  "Rear Fender", "Rear Lights", "Front Lights", "Front Fender"),
  YearMonth = c("2014-03", "2016-03", "2016-07", "2015-05",
                "2015-08", "2016-01", "2018-01", "2018-05")
)
# Amount owed on each car over time
df_owed <- data.frame(
  car_number = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3),
  YearMonth = c("2014-02", "2014-05", "2014-06", "2014-08", "2015-06", "2015-12", "2016-03", "2016-04",
                "2016-05", "2016-06", "2016-07", "2016-08", "2015-05", "2015-08", "2015-12", "2016-03",
                "2018-01", "2018-02", "2018-03", "2018-04", "2018-05", "2018-09"),
  amount_owed = c(20000, 18000, 17500, 16000, 10000, 7000, 6000, 5500, 5000, 4500, 4000, 3000,
                  10000, 8000, 6000, 0, 50000, 40000, 35000, 30000, 25000, 15000)
)
Using zoo for year-months and tidyverse, you could try the following. With left_join, add all of the df_owed data to your df_repair data by car_number. You can convert the year-month columns to yearmon objects with zoo. Then sort the rows by the year-month column from df_owed.
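To make the join step concrete, here is a minimal sketch on toy data (the `repairs` and `owed` frames and their values are made up for illustration and are not the data frames from the question). It shows that joining on car_number alone pairs every repair with every owed record for that car, and that the two YearMonth columns come out suffixed `.x` (repair) and `.y` (owed). Recent dplyr releases may warn about a many-to-many relationship for a join like this; here the row expansion is intended.

```r
library(dplyr)
library(zoo)

# Toy data (hypothetical values, not the question's data frames)
repairs <- data.frame(unique_id  = c("R1", "R2"),
                      car_number = c(1, 1),
                      YearMonth  = c("2016-03", "2016-07"))
owed <- data.frame(car_number  = c(1, 1, 1),
                   YearMonth   = c("2016-08", "2016-02", "2016-03"),
                   amount_owed = c(300, 700, 600))

joined <- repairs %>%
  left_join(owed, by = "car_number") %>%            # every repair x every owed record
  mutate(YearMonth.x = as.yearmon(YearMonth.x),     # repair month
         YearMonth.y = as.yearmon(YearMonth.y)) %>% # owed month
  arrange(YearMonth.y)                              # oldest owed record first

nrow(joined)  # 2 repairs x 3 owed records = 6 rows
```

Sorting by the owed month matters because the later steps pick the first or last element of a filtered vector.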
For each unique_id (using group_by) you can create your three columns of interest. The first uses the latest amount_owed whose owed date is on or before the repair date. The second (3 months) uses the first amount_owed value whose owed date is at least 3 months after the repair date (3/12 of a year in yearmon arithmetic). Finally, the most recent simply takes the last value from amount_owed.
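The three selections can be tried on plain vectors before putting them in a pipeline. This is a small sketch with made-up amounts; it uses base R `head()` and `tail()` in place of dplyr's `first()` and `last()`, and assumes the owed records are already sorted by month.

```r
library(zoo)

repair_month <- as.yearmon("2016-03")
owed_months  <- as.yearmon(c("2016-02", "2016-03", "2016-05", "2016-06", "2016-08"))
amount_owed  <- c(700, 600, 500, 450, 300)  # hypothetical amounts

# yearmon stores a date as year + (month - 1)/12, so adding 3/12 moves
# three months forward: March 2016 becomes June 2016
three_months_later <- repair_month + 3/12

at_repair   <- tail(amount_owed[owed_months <= repair_month], 1)        # 600
after_three <- head(amount_owed[owed_months >= three_months_later], 1)  # 450
most_recent <- tail(amount_owed, 1)                                     # 300
```

Note that the May record (500) is skipped for the 3-month column because it is only two months after the repair.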
Using the example data, the results differ a bit, possibly due to the data frames not matching the images in the post.
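library(tidyverse)
library(zoo)

df_repair %>%
  # pair every repair with every owed record for the same car
  left_join(df_owed, by = "car_number") %>%
  # YearMonth.x is the repair month, YearMonth.y is the owed month
  mutate_at(c("YearMonth.x", "YearMonth.y"), as.yearmon) %>%
  arrange(YearMonth.y) %>%
  group_by(unique_id, car_number) %>%
  summarise(
    # latest record on or before the repair month
    owed_repair_done = last(amount_owed[YearMonth.y <= YearMonth.x]),
    # first record at least 3 months after the repair month
    owed_3_months = first(amount_owed[YearMonth.y >= YearMonth.x + 3/12]),
    # last record on file for the car
    owed_most_recent = last(amount_owed)
  )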
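The question also asks for the closest record when a repair month has no matching payment record. The `last(amount_owed[YearMonth.y <= YearMonth.x])` expression takes the latest record on or before the repair and gives NA when no such record exists, for example when the repair predates the first record for that car. One possible fallback, sketched here with a hypothetical helper `closest_owed` (not part of dplyr or zoo), picks the record with the smallest gap in months in either direction. It assumes the records are sorted by month, so a tie goes to the earlier record.

```r
library(zoo)

# Hypothetical helper: amount owed in the record nearest in time to the repair.
# which.min() returns the first minimum, so with records sorted by month a tie
# between an earlier and a later record resolves to the earlier one.
closest_owed <- function(repair_month, owed_months, amount_owed) {
  gap <- abs(as.numeric(owed_months) - as.numeric(repair_month))
  amount_owed[which.min(gap)]
}

# Toy records for one car
owed_months <- as.yearmon(c("2015-08", "2015-12", "2016-03"))
amount_owed <- c(8000, 6000, 0)

closest_owed(as.yearmon("2015-05"), owed_months, amount_owed)  # 8000 (repair before any record)
closest_owed(as.yearmon("2016-01"), owed_months, amount_owed)  # 6000 (Dec 2015 is nearest)
```

Inside the `summarise()` call this could replace the first column's expression, e.g. `closest_owed(YearMonth.x[1], YearMonth.y, amount_owed)`; whether "closest" should be allowed to look forward in time is a design choice the question leaves open.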