Is there a method in R to replace values in one data frame with a related value from another data frame?
I need to replace the values in one data frame with values from another data frame. I am trying to find the simplest way to do so, but I may be overthinking it.
Here is a sample of the data from DF1:
season article 1st_booking 2nd_booking
SS20 EF0647 2019-06-25 2019-07-09
SS20 059611 2019-07-30 2019-08-13
SS20 EG3208 2019-10-29 <NA>
SS20 EF9348 2019-10-29 2019-11-12
SS20 EE4609 2019-08-27 2019-10-29
SS20 EF7610 2019-09-24 2019-10-29
SS20 EH1307 2019-09-24 2019-10-29
SS20 EH1308 2019-09-24 2019-10-29
SS20 EH1309 2019-09-24 2019-10-29
SS20 EH1310 2019-09-24 2019-10-29
And from DF2:
season article order_cutoff booking_deadline
SS20 EF0647 2019-06-25 2019-06-07
SS20 EF0647 2019-07-09 2019-06-07
SS20 EF0647 2019-12-10 2019-11-08
SS20 059611 2019-07-30 2019-07-12
SS20 059611 2019-08-13 2019-07-12
SS20 059611 2019-10-08 2019-09-06
SS20 EG3208 2019-10-29 2019-10-11
SS20 EF9348 2019-10-29 2019-10-11
SS20 EF9348 2019-11-12 2019-10-11
SS20 EF9348 2019-11-26 2019-11-08
Note that 1st_booking and 2nd_booking from DF1 are called order_cutoff in DF2. What I would like to do is, in DF1, replace the values in the 1st_booking and 2nd_booking columns with the related booking_deadline from DF2. I tried a merge, but I don't want to create a new column; I just want to replace the values in DF1 with the values from DF2.
I am not exactly sure about the expected output. If you want to match 1st_booking and 2nd_booking to order_cutoff for each article and season, we can get the data in long format, do a left_join matching the corresponding columns, and get the data in wide format again.
library(dplyr)
library(tidyr)
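df1 %>%
  # stack both booking columns into name/value pairs
  pivot_longer(cols = ends_with("booking")) %>%
  # look up the deadline for each season, article, and booking date
  left_join(df2, by = c('season' = 'season', 'article' = 'article',
                        'value' = 'order_cutoff')) %>%
  select(-value) %>%
  # spread the deadlines back into the original two columns
  pivot_wider(names_from = name, values_from = booking_deadline)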
# A tibble: 10 x 4
# season article `1st_booking` `2nd_booking`
# <fct> <chr> <fct> <fct>
# 1 SS20 EF0647 2019-06-07 2019-06-07
# 2 SS20 059611 2019-07-12 2019-07-12
# 3 SS20 EG3208 2019-10-11 NA
# 4 SS20 EF9348 2019-10-11 2019-10-11
# 5 SS20 EE4609 NA NA
# 6 SS20 EF7610 NA NA
# 7 SS20 EH1307 NA NA
# 8 SS20 EH1308 NA NA
# 9 SS20 EH1309 NA NA
#10 SS20 EH1310 NA NA
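The sample data stores every column as a factor, so the join has to reconcile factors with different levels; depending on the dplyr version this may produce a coercion warning. A minimal sketch of the same long, join, wide round trip on character columns, assuming a hypothetical two-article subset of the question's rows (df1_small, df2_small, long, and result are illustrative names, not part of the original answer):

```r
library(dplyr)
library(tidyr)

# Hypothetical subset of the question's data, stored as character
df1_small <- data.frame(
  season = c("SS20", "SS20"),
  article = c("EF0647", "EG3208"),
  `1st_booking` = c("2019-06-25", "2019-10-29"),
  `2nd_booking` = c("2019-07-09", NA),
  check.names = FALSE,        # keep the non-syntactic column names
  stringsAsFactors = FALSE
)
df2_small <- data.frame(
  season = c("SS20", "SS20", "SS20"),
  article = c("EF0647", "EF0647", "EG3208"),
  order_cutoff = c("2019-06-25", "2019-07-09", "2019-10-29"),
  booking_deadline = c("2019-06-07", "2019-06-07", "2019-10-11"),
  stringsAsFactors = FALSE
)

# Long format: one row per article and booking column
long <- pivot_longer(df1_small, cols = ends_with("booking"))

result <- long %>%
  left_join(df2_small, by = c("season", "article", "value" = "order_cutoff")) %>%
  select(-value) %>%
  pivot_wider(names_from = name, values_from = booking_deadline)

result
```

Inspecting long before the join shows the intermediate shape: four rows with the columns season, article, name, and value.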
If you want to combine only by dates, and not by season and article, you can use match:
# note: transform() may rename non-syntactic columns (e.g. to X1st_booking)
transform(df1,
  `1st_booking` = df2$booking_deadline[match(`1st_booking`, df2$order_cutoff)],
  `2nd_booking` = df2$booking_deadline[match(`2nd_booking`, df2$order_cutoff)])
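The match lookup can also respect season and article by matching on a pasted key, and assigning with [[ overwrites the columns in place so their names stay as they are. This is a sketch in base R under assumptions: df1_small and df2_small are a hypothetical character-column subset of the question's rows, and the "|" separator is assumed not to occur in the data.

```r
# Hypothetical subset of the question's data, stored as character
df1_small <- data.frame(
  season = c("SS20", "SS20"),
  article = c("EF0647", "EG3208"),
  `1st_booking` = c("2019-06-25", "2019-10-29"),
  `2nd_booking` = c("2019-07-09", NA),
  check.names = FALSE,        # keep the non-syntactic column names
  stringsAsFactors = FALSE
)
df2_small <- data.frame(
  season = c("SS20", "SS20", "SS20"),
  article = c("EF0647", "EF0647", "EG3208"),
  order_cutoff = c("2019-06-25", "2019-07-09", "2019-10-29"),
  booking_deadline = c("2019-06-07", "2019-06-07", "2019-10-11"),
  stringsAsFactors = FALSE
)

# One lookup key per df2 row: season | article | order_cutoff
key2 <- paste(df2_small$season, df2_small$article,
              df2_small$order_cutoff, sep = "|")

for (col in c("1st_booking", "2nd_booking")) {
  # Same key built from df1, using the booking date in this column
  key1 <- paste(df1_small$season, df1_small$article,
                df1_small[[col]], sep = "|")
  # Overwrite the column; dates without a match become NA
  df1_small[[col]] <- df2_small$booking_deadline[match(key1, key2)]
}

df1_small
```

No new column is created: each booking column is overwritten with the matching booking_deadline, and dates with no counterpart in df2_small end up as NA.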
Data
df1 <- structure(list(season = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = "SS20", class = "factor"), article = structure(c(3L,
1L, 6L, 5L, 2L, 4L, 7L, 8L, 9L, 10L), .Label = c("059611", "EE4609",
"EF0647", "EF7610", "EF9348", "EG3208", "EH1307", "EH1308", "EH1309",
"EH1310"), class = "factor"), `1st_booking` = structure(c(1L,
2L, 5L, 5L, 3L, 4L, 4L, 4L, 4L, 4L), .Label = c("2019-06-25",
"2019-07-30", "2019-08-27", "2019-09-24", "2019-10-29"), class = "factor"),
`2nd_booking` = structure(c(2L, 3L, 1L, 5L, 4L, 4L, 4L, 4L,
4L, 4L), .Label = c("<NA>", "2019-07-09", "2019-08-13", "2019-10-29",
"2019-11-12"), class = "factor")), class = "data.frame", row.names = c(NA, -10L))
df2 <- structure(list(season = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = "SS20", class = "factor"), article = structure(c(2L,
2L, 2L, 1L, 1L, 1L, 4L, 3L, 3L, 3L), .Label = c("059611", "EF0647",
"EF9348", "EG3208"), class = "factor"), order_cutoff = structure(c(1L,
2L, 9L, 3L, 4L, 5L, 6L, 6L, 7L, 8L), .Label = c("2019-06-25",
"2019-07-09", "2019-07-30", "2019-08-13", "2019-10-08", "2019-10-29",
"2019-11-12", "2019-11-26", "2019-12-10"), class = "factor"),
booking_deadline = structure(c(1L, 1L, 5L, 2L, 2L, 3L, 4L,
4L, 4L, 5L), .Label = c("2019-06-07", "2019-07-12", "2019-09-06",
"2019-10-11", "2019-11-08"), class = "factor")), class = "data.frame",
row.names = c(NA, -10L))