Merge two dataframes with dplyr::left_join and multiple conditions
I need to create df3 by matching each case in df1 to a shift in df2, based on several conditions.
library(lubridate)
df1 <- data.frame("Name" = c("Adams", "Adams", "Adams", "Adams", "Ball", "Ball", "Cash", "Cash", "David", "David"),
                  "Date.of.Service" = ymd(c("2005-10-01", "2005-10-01", "2005-10-01", "2005-10-02", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-02", "2005-10-01", "2005-10-02")),
                  "StartTime" = c(845, 955, 2333, 0300, 1045, 1322, 1145, 344, 858, 123),
                  "Code" = c("101", "500", "103", "104", "501", "103", "102", "106", "102", "109"))
df2 <- data.frame("Name" = c("Adams", "Adams", "Ball", "Cash", "Cash", "David", "David"),
                  "Date.of.Shift" = ymd(c("2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01")),
                  "Shift" = c("CVCALL", "ORD", "OB", "ORD2", "OB", "SUP", "OB"),
                  "Day.Night.Shift" = c("Full24", "Full24", "Day", "Day", "Night", "Day", "Full24"))
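Conditions:
If a person has one shift on a day, then the cases that match that shift should go to that shift.
If df1$Code is a "heart code" and the person has a "CVCALL" shift, then assign that shift.
If a person has two shifts on a day, then that day's cases should be assigned to a shift based on StartTime (day shifts occur between 629 and 1629, night shifts between 2059 and 2359).
If a case's StartTime is between 000 and 700 on the following day, and the person had a "Night" or "Full24" shift the day before, then the case should go to that shift (if they were on both a Night and a Full24 shift, give NA).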
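The two ingredients these conditions depend on, how many shifts a person has on a day and which StartTime window a case falls in, can be checked on their own. This is an illustrative sketch, not the asker's method: it rebuilds df2 with base `as.Date()` instead of lubridate, and `classify_start()` is a hypothetical helper that treats StartTime as an HHMM number, uses exclusive bounds for the day and night windows (matching the `>`/`<` tests in the code below), and assumes "between 000 and 700" means 0 <= StartTime < 700.

```r
library(dplyr)

# df2 from the question, built with as.Date() so lubridate is not needed.
df2 <- data.frame(Name = c("Adams", "Adams", "Ball", "Cash", "Cash", "David", "David"),
                  Date.of.Shift = as.Date(rep("2005-10-01", 7)),
                  Shift = c("CVCALL", "ORD", "OB", "ORD2", "OB", "SUP", "OB"),
                  Day.Night.Shift = c("Full24", "Full24", "Day", "Day", "Night", "Day", "Full24"),
                  stringsAsFactors = FALSE)

# Shifts per person per day (column n): Ball has one; Adams, Cash and David have two.
shift_counts <- count(df2, Name, Date.of.Shift)

# Hypothetical helper: label an HHMM StartTime with the window it falls in.
# Branches are checked in order, so the first matching window wins.
classify_start <- function(start_time) {
  case_when(
    start_time > 629  & start_time < 1629 ~ "Day",
    start_time > 2059 & start_time < 2359 ~ "Night",
    start_time >= 0   & start_time < 700  ~ "EarlyMorning",
    TRUE ~ NA_character_
  )
}

classify_start(c(845, 2333, 300, 123))
# "Day" "Night" "EarlyMorning" "EarlyMorning"
```

Times from 630 to 659 fall in both the day window and the early-morning window, so the order of the `case_when()` branches decides which label they get; the conditions as written do not say which should win.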
I tried the code below. The first left_join and mutate work, but when I get to the second left_join and mutate I get an error: Error in mutate_impl(.data, dots) : Evaluation error: object 'Day.Night.Shift' not found.
library(dplyr)
Heart.Codes <- c("500", "501")
df3 = df1 %>%
  # Bring in matching records in availability points. Filter df2 to records that are either
  # (1) the only record for that person, or (2) CV shifts.
  left_join(df2 %>%
              group_by(Name, Date.of.Shift) %>%
              mutate(num.shifts = n()) %>%
              filter(num.shifts == 1 | Shift %in% c("CVCALL")),
            by = c("Name", "Date.of.Service" = "Date.of.Shift")) %>%
  # We want to keep Shift and ShiftDate for records from availability that are either
  # (1) the only record for that person, or (2) CV shifts that join to a
  # "heart" type in df1.
  mutate(Shift = case_when(num.shifts == 1 ~ Shift,
                           Code %in% Heart.Codes & Shift == "CVCALL" ~ Shift,
                           T ~ NA_integer_),
         Date.of.Shift = case_when(num.shifts == 1 ~ Date.of.Service,
                                   Code %in% Heart.Codes & Shift == "CVCALL" ~ Date.of.Service),
         Day.Night.Shift = case_when(num.shifts == 1 ~ Day.Night.Shift,
                                     Code %in% Heart.Codes & Shift == "CVCALL" ~ Day.Night.Shift)) %>%
  select(Name, Date.of.Service, StartTime, Code, Date.of.Shift, Shift, Day.Night.Shift) %>%
  # assign correct shift when there are two shifts. Filter df2 to records that have two shifts in a day.
  left_join(df2 %>%
              group_by(Name, Date.of.Shift) %>%
              mutate(num.shifts = n()) %>%
              filter(num.shifts == 2),
            by = c("Name", "Date.of.Service" = "Date.of.Shift")) %>%
  mutate(Shift = case_when(num.shifts == 2 & StartTime > 629 & StartTime < 1629 & Day.Night.Shift == "Day" ~ Shift,
                           num.shifts == 2 & StartTime > 2059 & StartTime < 2359 & Day.Night.Shift == "Night" ~ Shift,
                           T ~ NA_integer_),
         Date.of.Shift = case_when(num.shifts == 2 & StartTime > 629 & StartTime < 1629 & Day.Night.Shift == "Day" ~ Date.of.Shift,
                                   num.shifts == 2 & StartTime > 2059 & StartTime < 2359 & Day.Night.Shift == "Night" ~ Date.of.Shift)) %>%
  select(Name, Date.of.Service, StartTime, Code, Date.of.Shift, Shift, Day.Night.Shift) %>%
  # Bring in records whose shift date is the day before the case date.
  left_join(df2 %>%
              group_by(Name, Date.of.Shift) %>%
              mutate(ShiftDateOneDayLater = Date.of.Shift + 1),
            by = c("Name", "Date.of.Service" = "ShiftDateOneDayLater")) %>%
  # Keep Shift and Date of Shift only if StartTime is between 0000 and 0659.
  mutate(Shift = case_when(!is.na(Shift.x) ~ Shift.x,
                           Start.Time > 0 & Start.Time < 659 ~ Shift.y),
         Date.of.Shift = case_when(!is.na(Date.of.Shift.x) ~ Date.of.Shift.x,
                                   Start.Time > 0 & Start.Time < 659 ~ Date.of.Shift.y)) %>%
  select(Name, Date.of.Service, StartTime, Code, Date.of.Shift, Shift, Day.Night.Shift)
Based on these conditions, the code should produce this new df3 data frame:
df3 <- data.frame("Name" = c("Adams", "Adams", "Adams", "Adams", "Ball", "Ball", "Cash", "Cash", "David", "David"),
                  "Date.of.Service" = ymd(c("2005-10-01", "2005-10-01", "2005-10-01", "2005-10-02", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-02", "2005-10-01", "2005-10-02")),
                  "StartTime" = c(845, 955, 2333, 0300, 1045, 1322, 1145, 344, 858, 123),
                  "Code" = c("101", "500", "103", "104", "501", "103", "102", "106", "102", "109"),
                  "Date.of.Shift" = ymd(c("2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", NA, "2005-10-01")),
                  "Shift" = c("ORD", "CVCALL", "ORD", "ORD", "OB", "OB", "ORD2", "OB", NA, "OB"),
                  "Day.Night.Shift" = c("Full24", "Full24", "Full24", "Full24", "Day", "Day", "Day", "Night", NA, "Full24"))
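You get this error message because, in the second join, both the left and the right table have a column named Day.Night.Shift. When both tables have a column with the same name (and that column is not part of the join), dplyr automatically renames them to Day.Night.Shift.x and Day.Night.Shift.y. I find it helpful to run everything up to the join to see what is happening: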
df3 = df1 %>%
  # Bring in matching records in availability points. Filter df2 to records that are either
  # (1) the only record for that person, or (2) CV shifts.
  left_join(df2 %>%
              group_by(Name, Date.of.Shift) %>%
              mutate(num.shifts = n()) %>%
              filter(num.shifts == 1 | Shift %in% c("CVCALL")),
            by = c("Name", "Date.of.Service" = "Date.of.Shift")) %>%
  # We want to keep Shift and ShiftDate for records from availability that are either
  # (1) the only record for that person, or (2) CV shifts that join to a
  # "heart" type in df1.
  mutate(Shift = case_when(num.shifts == 1 ~ Shift,
                           Code %in% Heart.Codes & Shift == "CVCALL" ~ Shift,
                           T ~ NA_integer_),
         Date.of.Shift = case_when(num.shifts == 1 ~ Date.of.Service,
                                   Code %in% Heart.Codes & Shift == "CVCALL" ~ Date.of.Service),
         Day.Night.Shift = case_when(num.shifts == 1 ~ Day.Night.Shift,
                                     Code %in% Heart.Codes & Shift == "CVCALL" ~ Day.Night.Shift)) %>%
  select(Name, Date.of.Service, StartTime, Code, Date.of.Shift, Shift, Day.Night.Shift) %>%
  # assign correct shift when there are two shifts. Filter df2 to records that have two shifts in a day.
  left_join(df2 %>%
              group_by(Name, Date.of.Shift) %>%
              mutate(num.shifts = n()) %>%
              filter(num.shifts == 2),
            by = c("Name", "Date.of.Service" = "Date.of.Shift"))
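The renaming can also be reproduced with two tiny tables. In this sketch `left_tbl` and `right_tbl` are hypothetical stand-ins for the two sides of the second join; apart from the key, the only column they share is Day.Night.Shift.

```r
library(dplyr)

# Hypothetical stand-ins for the two sides of the second join.
left_tbl <- data.frame(Name = c("Cash", "David"),
                       Day.Night.Shift = c(NA, "Full24"),
                       stringsAsFactors = FALSE)
right_tbl <- data.frame(Name = c("Cash", "David"),
                        Day.Night.Shift = c("Day", "Day"),
                        stringsAsFactors = FALSE)

# Day.Night.Shift is not a join key, so dplyr keeps both copies with suffixes.
joined <- left_join(left_tbl, right_tbl, by = "Name")
names(joined)
# "Name" "Day.Night.Shift.x" "Day.Night.Shift.y"

# The unsuffixed column no longer exists, so referring to it fails in the same
# way as the second mutate() in the question.
err <- tryCatch(mutate(joined, is_day = Day.Night.Shift == "Day"),
                error = function(e) e)
inherits(err, "error")
# TRUE
```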
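You can get rid of the error by referring to Day.Night.Shift.x or Day.Night.Shift.y, as appropriate, in the mutate (and in the select below it). Shift is duplicated in the same way, so it also becomes Shift.x and Shift.y.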
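A minimal sketch of that fix on toy data, under stated assumptions: the `cases` and `two_shifts` tables and the `.step1`/`.df2` labels are hypothetical. It uses left_join()'s `suffix` argument so the clashing columns get descriptive names instead of .x and .y, then refers to those names explicitly in mutate().

```r
library(dplyr)

# Hypothetical toy data shaped like the pipeline after the first mutate/select:
# Cash has two shifts that day, so step 1 left his Shift and Day.Night.Shift NA.
cases <- data.frame(Name = "Cash", StartTime = 1145,
                    Shift = NA_character_, Day.Night.Shift = NA_character_,
                    stringsAsFactors = FALSE)
two_shifts <- data.frame(Name = c("Cash", "Cash"),
                         Shift = c("ORD2", "OB"),
                         Day.Night.Shift = c("Day", "Night"),
                         stringsAsFactors = FALSE)

step2 <- cases %>%
  # Descriptive suffixes instead of the default c(".x", ".y").
  left_join(two_shifts, by = "Name", suffix = c(".step1", ".df2")) %>%
  # Refer to the suffixed columns explicitly; the bare names no longer exist.
  mutate(
    match = (Day.Night.Shift.df2 == "Day" & StartTime > 629 & StartTime < 1629) |
            (Day.Night.Shift.df2 == "Night" & StartTime > 2059 & StartTime < 2359),
    Shift = if_else(match, Shift.df2, Shift.step1),
    Day.Night.Shift = if_else(match, Day.Night.Shift.df2, Day.Night.Shift.step1)
  ) %>%
  select(Name, StartTime, Shift, Day.Night.Shift)

step2
```

Because Cash has two rows in `two_shifts`, the left join returns two rows for his single case: the Day row matches his StartTime of 1145 and keeps ORD2, while the Night row falls back to the step-1 value (NA). Collapsing such duplicates back to one row per case is a separate step that this sketch does not attempt.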