
Merge two dataframes with dplyr::left_join and multiple conditions

I need to create df3 by matching each case in df1 to a shift in df2, based on multiple conditions.

library(lubridate)

df1 <- data.frame("Name" = c("Adams", "Adams", "Adams", "Adams", "Ball", "Ball", "Cash", "Cash", "David", "David"),
                  "Date.of.Service" = ymd(c("2005-10-01", "2005-10-01", "2005-10-01", "2005-10-02", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-02", "2005-10-01", "2005-10-02")),
                  "StartTime" = c(845, 955, 2333, 0300, 1045, 1322, 1145, 344, 858, 123),
                  "Code" = c("101", "500", "103", "104", "501", "103", "102", "106", "102", "109"))
df2 <- data.frame("Name" = c("Adams", "Adams", "Ball", "Cash", "Cash", "David", "David"),
                  "Date.of.Shift" = ymd(c("2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01")),
                  "Shift" = c("CVCALL", "ORD", "OB", "ORD2", "OB", "SUP", "OB"),
                  "Day.Night.Shift" = c("Full24", "Full24", "Day", "Day", "Night", "Day", "Full24"))

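Conditions:

  1. If a person has 1 shift in a day, then the cases matching that shift should go to that shift

  2. If df1$Code is a "heart code" and the person has a "CVCALL" shift, then the case goes to that shift

  3. If a person has 2 shifts in a day, then that day's cases should be assigned to a shift based on StartTime (day shifts run between 629 and 1629, night shifts between 2059 and 2359)

  4. If a case's StartTime is between 000 and 700 on the following day, and the person was on a "Night" or "Full24" shift the previous day, then it should go to that shift (if they were on both Night and Full24, give NA)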
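The time windows in conditions 3 and 4 can be sketched as a small helper. This is only an illustration: `classify_start` and the label "PreviousDay" are hypothetical names, the strict bounds copy the comparisons used in the code further down, and checking the day window before the early-morning window is an assumption, since the two ranges overlap slightly in the conditions as written.

```r
library(dplyr)

# Hypothetical helper: label a StartTime (HHMM stored as a number) by window.
# Bounds mirror the comparisons in the question's code; the order of the
# branches is an assumption, because 630-658 falls into two windows.
classify_start <- function(start_time) {
  case_when(
    start_time > 629  & start_time < 1629 ~ "Day",
    start_time > 2059 & start_time < 2359 ~ "Night",
    start_time >= 0   & start_time < 659  ~ "PreviousDay",
    TRUE                                  ~ NA_character_
  )
}

classify_start(c(845, 2333, 300, 1800))
# "Day" "Night" "PreviousDay" NA
```

A label like this could then be compared against Day.Night.Shift after a join, instead of repeating the numeric comparisons in every case_when branch.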

I tried the code below. The first left_join and mutate work, but when I get to the second left_join and mutate I get an error: Error in mutate_impl(.data, dots) : Evaluation error: object 'Day.Night.Shift' not found.

library(dplyr)

Heart.Codes <- c("500", "501")

df3 = df1 %>%
  # Bring in matching records in availability points.  Filter df2 to records that are either
  # (1) the only record for that person, or (2) CV shifts.
  left_join(df2 %>%
              group_by(Name, Date.of.Shift) %>%
              mutate(num.shifts = n()) %>%
              filter(num.shifts == 1 | Shift %in% c("CVCALL")),
            by = c("Name", "Date.of.Service" = "Date.of.Shift")) %>%
  # We want to keep Shift and ShiftDate for records from availability that are either
  # (1) the only record for that person, or (2) CV shifts that join to a
  # "heart" type in df1.
  mutate(Shift = case_when(num.shifts == 1 ~ Shift,
                           Code %in% Heart.Codes & Shift == "CVCALL" ~ Shift,
                           T ~ NA_integer_),
         Date.of.Shift = case_when(num.shifts == 1 ~ Date.of.Service, 
                                   Code %in% Heart.Codes & Shift == "CVCALL" ~ Date.of.Service),
         Day.Night.Shift = case_when(num.shifts == 1 ~ Day.Night.Shift, 
                                     Code %in% Heart.Codes & Shift == "CVCALL" ~ Day.Night.Shift)) %>%
  select(Name, Date.of.Service, StartTime, Code, Date.of.Shift, Shift, Day.Night.Shift) %>% 
  # assign correct shift when there are two shifts. Filter df2 to records that have two shifts in a day.
  left_join(df2 %>%
              group_by(Name, Date.of.Shift) %>%
              mutate(num.shifts = n()) %>% 
              filter(num.shifts == 2),
            by = c("Name", "Date.of.Service" = "Date.of.Shift")) %>%
  mutate(Shift = case_when(num.shifts == 2 & StartTime > 629 & StartTime < 1629 & Day.Night.Shift == "Day" ~ Shift,
                           num.shifts == 2 & StartTime > 2059 & StartTime < 2359 & Day.Night.Shift == "Night" ~ Shift,
                           T ~ NA_integer_),
         Date.of.Shift = case_when(num.shifts == 2 & StartTime > 629 & StartTime < 1629 & Day.Night.Shift == "Day" ~ Date.of.Shift,
                                   num.shifts == 2 & StartTime > 2059 & StartTime < 2359 & Day.Night.Shift == "Night" ~ Date.of.Shift)) %>%
  select(Name, Date.of.Service, StartTime, Code, Date.of.Shift, Shift, Day.Night.Shift) %>% 
  # Bring in records whose shift date is the day before the case date.
  left_join(df2 %>%
            group_by(Name, Date.of.Shift) %>%
            mutate(ShiftDateOneDayLater = Date.of.Shift + 1),
          by = c("Name", "Date.of.Service" = "ShiftDateOneDayLater")) %>%
  # Keep Shift and Date of Shift only if StartTime is between 0000 and 0659.
  mutate(Shift = case_when(!is.na(Shift.x) ~ Shift.x,
                         Start.Time > 0 & Start.Time < 659 ~ Shift.y),
       Date.of.Shift = case_when(!is.na(Date.of.Shift.x) ~ Date.of.Shift.x,
                                 Start.Time > 0 & Start.Time < 659 ~ Date.of.Shift.y)) %>%
  select(Name, Date.of.Service, StartTime, Code, Date.of.Shift, Shift, Day.Night.Shift)

Based on these conditions, the code should produce this new df3 data frame.

df3 <- data.frame("Name" = c("Adams", "Adams", "Adams", "Adams", "Ball", "Ball", "Cash", "Cash", "David", "David"),
                  "Date.of.Service" = ymd(c("2005-10-01", "2005-10-01", "2005-10-01", "2005-10-02", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-02", "2005-10-01", "2005-10-02")),
                  "StartTime" = c(845, 955, 2333, 0300, 1045, 1322, 1145, 344, 858, 123),
                  "Code" = c("101", "500", "103", "104", "501", "103", "102", "106", "102", "109"),
                  "Date.of.Shift" = ymd(c("2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", NA, "2005-10-01")),
                  "Shift" = c("ORD", "CVCALL", "ORD", "ORD", "OB", "OB", "ORD2", "OB", NA, "OB"),
                  "Day.Night.Shift" = c("Full24", "Full24", "Full24", "Full24", "Day", "Day", "Day", "Night", NA, "Full24"))

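It gives this error message because in the second join, both the left and right tables have a column named Day.Night.Shift. When the tables have a column with the same name (and that column is not part of the join), dplyr automatically renames them to Day.Night.Shift.x and Day.Night.Shift.y. I find it helpful to run everything up to the join to see what is happening: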

df3 = df1 %>%
  # Bring in matching records in availability points.  Filter df2 to records that are either
  # (1) the only record for that person, or (2) CV shifts.
  left_join(df2 %>%
              group_by(Name, Date.of.Shift) %>%
              mutate(num.shifts = n()) %>%
              filter(num.shifts == 1 | Shift %in% c("CVCALL")),
            by = c("Name", "Date.of.Service" = "Date.of.Shift")) %>%
  # We want to keep Shift and ShiftDate for records from availability that are either
  # (1) the only record for that person, or (2) CV shifts that join to a
  # "heart" type in df1.
  mutate(Shift = case_when(num.shifts == 1 ~ Shift,
                           Code %in% Heart.Codes & Shift == "CVCALL" ~ Shift,
                           T ~ NA_integer_),
         Date.of.Shift = case_when(num.shifts == 1 ~ Date.of.Service, 
                                   Code %in% Heart.Codes & Shift == "CVCALL" ~ Date.of.Service),
         Day.Night.Shift = case_when(num.shifts == 1 ~ Day.Night.Shift, 
                                     Code %in% Heart.Codes & Shift == "CVCALL" ~ Day.Night.Shift)) %>%
  select(Name, Date.of.Service, StartTime, Code, Date.of.Shift, Shift, Day.Night.Shift) %>% 
  # assign correct shift when there are two shifts. Filter df2 to records that have two shifts in a day.
  left_join(df2 %>%
              group_by(Name, Date.of.Shift) %>%
              mutate(num.shifts = n()) %>% 
              filter(num.shifts == 2),
            by = c("Name", "Date.of.Service" = "Date.of.Shift"))

You can get rid of the error by referencing Day.Night.Shift.x or Day.Night.Shift.y as needed in the mutate (and in the select below it).
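The renaming is easy to see on a toy example. The sketch below uses two made-up tables (`left` and `right` are hypothetical, not the df1/df2 above) that share a non-key column, then merges the suffixed columns back into one with coalesce. Preferring the left-hand value is an assumption for illustration; the real pipeline would choose between .x and .y according to the shift conditions.

```r
library(dplyr)

# Toy tables (hypothetical data): both have a non-key column Day.Night.Shift
left  <- data.frame(Name = c("Adams", "Ball"),
                    Day.Night.Shift = c(NA, "Day"),
                    stringsAsFactors = FALSE)
right <- data.frame(Name = c("Adams", "Ball"),
                    Day.Night.Shift = c("Full24", "Day"),
                    stringsAsFactors = FALSE)

# The shared non-key column is suffixed on both sides of the join
joined <- left_join(left, right, by = "Name")
names(joined)
# "Name" "Day.Night.Shift.x" "Day.Night.Shift.y"

# Rebuild a single column: keep the left value, fall back to the right one
resolved <- joined %>%
  mutate(Day.Night.Shift = coalesce(Day.Night.Shift.x, Day.Night.Shift.y)) %>%
  select(Name, Day.Night.Shift)
```

After the join, a bare Day.Night.Shift no longer exists in `joined`, which is why a mutate that refers to it fails until the column is rebuilt from the suffixed pair.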
