How to extract values from a dataframe column based on the match of two ID columns of different dataframes?
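I want to match two data frames of different dimensions (df1 and df2) on the "Index" column. Then, based on the match, I want to add two columns from df2 (Shift and ShiftDate) to df1. However, there are several rules I need to follow.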
df1 <- data.frame("Index" = c("Adams10-1", "Adams10-1", "Adams10-2", "Adams10-2", "Ball10-1", "Ball10-2", "Cash10-1", "Cash10-2", "David10-1", "David10-2"),
                  "CaseDate" = c("2005-10-01", "2005-10-01", "2005-10-02", "2005-10-02", "2005-10-01", "2005-10-02", "2005-10-01", "2005-10-02", "2005-10-01", "2005-10-02"),
                  "Type" = c("heart", "local", "knee", "nose", "heart", "foot", "shin", "foot", "spine", "delivery"),
                  "StartTime" = c(1640, 1755, 0112, 0300, 2145, 0233, 2123, 0326, 858, 1024))
df2 <- data.frame("Index" = c("Adams10-1", "Adams10-1", "Ball10-1", "Cash10-1", "David10-1", "David10-1", "David10-3"),
                  "ShiftDate" = c("2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-03"),
                  "Shift" = c("OB", "CV", "Night", "Super", "OB", "Day", "OB"),
                  "Multiple" = c("yes", "yes", "no", "no", "yes", "yes", "no"))
Rules:
1. If df1$Index matches df2$Index, AND:
   - if df2$Multiple == "no", add df2$Shift and df2$ShiftDate to df1;
   - if df2$Multiple == "yes", give NA (unless df1$Type == "heart" & df2$Shift == "CV", in which case add the CV Shift and its ShiftDate from df2 to df1).
2. If df1$Index does not match df2$Index, give NA,
   - unless df1$StartTime > 0000 and < 0700, in which case add df2$Shift and df2$ShiftDate from the df2 record whose ShiftDate is the day before df1$CaseDate;
   - unless df1$Type == "delivery" & df2$Shift == "OB", in which case add df2$Shift and df2$ShiftDate from the df2 record whose ShiftDate is the day after df1$CaseDate.
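The two exceptions in the no-match rule come down to a time-window check on StartTime and plain Date arithmetic on CaseDate. A minimal base-R sketch, assuming StartTime stays an HHMM value stored as a number, as in df1 (the variable names are illustrative only):

```r
# StartTime holds HHMM values as plain numbers; R reads the literal 0112 as
# 112, so "between 0000 and 0700" becomes a numeric range check.
start_time <- c(1640, 1755, 0112, 0300, 2145, 0233, 2123, 0326, 858, 1024)
early <- start_time > 0 & start_time < 700
print(early)  # FALSE FALSE TRUE TRUE FALSE TRUE FALSE TRUE FALSE FALSE

# The "day before" and "day after" lookups are ordinary Date arithmetic.
case_date <- as.Date("2005-10-02")
day_before <- case_date - 1  # candidate ShiftDate for an early start
day_after <- case_date + 1   # candidate ShiftDate for a delivery case with an OB shift
print(c(day_before, day_after))  # "2005-10-01" "2005-10-03"
```

Note that 858 (08:58) is correctly outside the window, because the comparison is on the HHMM number rather than on the hour alone.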
This is the result I want:
df3 <- data.frame("Index" = c("Adams10-1", "Adams10-1", "Adams10-2", "Adams10-2", "Ball10-1", "Ball10-2", "Cash10-1", "Cash10-2", "David10-1", "David10-2"),
                  "CaseDate" = c("2005-10-01", "2005-10-01", "2005-10-02", "2005-10-02", "2005-10-01", "2005-10-02", "2005-10-01", "2005-10-02", "2005-10-01", "2005-10-02"),
                  "Type" = c("heart", "local", "knee", "nose", "heart", "foot", "shin", "foot", "spine", "delivery"),
                  "StartTime" = c(1640, 1755, 0112, 0300, 2145, 0233, 2123, 0326, 858, 1024),
                  "Shift" = c("CV", NA, NA, NA, "Night", "Night", "Super", "Super", NA, "OB"),
                  "ShiftDate" = c("2005-10-01", NA, NA, NA, "2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01", NA, "2005-10-03"))
Even if it can't be done with all of these rules, help with just the matching would be useful. Thanks in advance!
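For the matching step on its own, a base-R left join on Index is one starting point. A sketch using base `merge()`, with df1 and df2 trimmed to the columns needed here; it also shows why the rules matter, since Index values that occur twice in df2 duplicate the corresponding df1 rows:

```r
# Sample data from the question, trimmed to the columns used here.
df1 <- data.frame(
  Index = c("Adams10-1", "Adams10-1", "Adams10-2", "Adams10-2", "Ball10-1",
            "Ball10-2", "Cash10-1", "Cash10-2", "David10-1", "David10-2"),
  Type = c("heart", "local", "knee", "nose", "heart", "foot", "shin", "foot",
           "spine", "delivery"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Index = c("Adams10-1", "Adams10-1", "Ball10-1", "Cash10-1", "David10-1",
            "David10-1", "David10-3"),
  ShiftDate = c("2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01",
                "2005-10-01", "2005-10-01", "2005-10-03"),
  Shift = c("OB", "CV", "Night", "Super", "OB", "Day", "OB"),
  stringsAsFactors = FALSE
)

# all.x = TRUE makes this a left join: every df1 row is kept, and rows
# without a match in df2 get NA for Shift and ShiftDate.
matched <- merge(df1, df2, by = "Index", all.x = TRUE)
print(nrow(matched))              # 13, not 10: Adams10-1 and David10-1 each match two df2 rows
print(sum(is.na(matched$Shift)))  # 5 df1 rows have no match at all
```

The extra rows are exactly the cases that the Multiple rule is meant to resolve.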
I'll make a few assumptions here, based on the structure of df1 and df2 and on the contents of the target data set given in the question.
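1. Index is just the combination of a person identifier (here, a name) and the shift or case date, so what we really want is to join on person and date.
2. Multiple in df2 just indicates whether the person had more than one shift that day. (I assume the "no" for the first David10-1 record is a typo.) So rule 1 is really about whether the person had multiple shifts on a given day.

If both assumptions hold, we can do the following. The code is redundant in many places and could be tightened considerably, but it shows the logic of the rules very explicitly.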
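Both assumptions can be checked against the sample data. A minimal base-R sketch (the helper variable names are illustrative) that strips the date suffix from Index with the same `gsub()` pattern the dplyr code uses, then recomputes Multiple from the number of shifts per person and day:

```r
df2 <- data.frame(
  Index = c("Adams10-1", "Adams10-1", "Ball10-1", "Cash10-1", "David10-1",
            "David10-1", "David10-3"),
  ShiftDate = c("2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01",
                "2005-10-01", "2005-10-01", "2005-10-03"),
  Multiple = c("yes", "yes", "no", "no", "yes", "yes", "no"),
  stringsAsFactors = FALSE
)

# Assumption 1: removing every non-letter character leaves the person's name.
df2$Name <- gsub("[^A-Za-z]", "", df2$Index)

# Assumption 2: Multiple is "yes" exactly when the person has more than one
# shift on that ShiftDate, so it can be derived from the data itself.
shifts_per_day <- ave(seq_along(df2$Name), df2$Name, df2$ShiftDate, FUN = length)
recomputed <- ifelse(shifts_per_day > 1, "yes", "no")
print(identical(recomputed, df2$Multiple))  # TRUE
```

Because Multiple can be recomputed this way, the code below drops it and counts shifts per (Name, ShiftDate) group instead.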
library(dplyr)
library(lubridate)

# First, make two changes: (1) convert the dates to real Date objects, and
# (2) replace Index with Name.
df1 = df1 %>%
  mutate(CaseDate = ymd(CaseDate),
         Name = gsub("[^A-Za-z]", "", Index)) %>%
  select(Name, CaseDate, Type, StartTime)

df2 = df2 %>%
  mutate(ShiftDate = ymd(ShiftDate),
         Name = gsub("[^A-Za-z]", "", Index)) %>%
  select(Name, ShiftDate, Shift)

# Start with df1.
df3 = df1 %>%
  # Bring in matching records in df2. Filter df2 to records that are either
  # (1) the only record for that person on that date, or (2) CV shifts.
  left_join(df2 %>%
              group_by(Name, ShiftDate) %>%
              mutate(num.shifts = n()) %>%
              filter(num.shifts == 1 | Shift == "CV"),
            by = c("Name", "CaseDate" = "ShiftDate")) %>%
  # Keep Shift and ShiftDate only for records from df2 that are either
  # (1) the only record for that person on that date, or (2) CV shifts that
  # join to a "heart" case in df1.
  mutate(Shift = case_when(num.shifts == 1 ~ Shift,
                           Type == "heart" & Shift == "CV" ~ Shift,
                           TRUE ~ NA_character_),
         ShiftDate = case_when(num.shifts == 1 ~ CaseDate,
                               Type == "heart" & Shift == "CV" ~ CaseDate)) %>%
  select(Name, CaseDate, Type, StartTime, Shift, ShiftDate) %>%
  # Bring in single-shift records in df2 that match on person and whose shift
  # date is the day before the case date.
  left_join(df2 %>%
              group_by(Name, ShiftDate) %>%
              filter(n() == 1) %>%
              mutate(ShiftDateOneDayLater = ShiftDate + 1),
            by = c("Name", "CaseDate" = "ShiftDateOneDayLater")) %>%
  # Use that record only if no shift has been assigned yet and StartTime is
  # between 0000 and 0700.
  mutate(Shift = case_when(!is.na(Shift.x) ~ Shift.x,
                           StartTime > 0 & StartTime < 700 ~ Shift.y),
         ShiftDate = case_when(!is.na(ShiftDate.x) ~ ShiftDate.x,
                               StartTime > 0 & StartTime < 700 ~ ShiftDate.y)) %>%
  select(Name, CaseDate, Type, StartTime, Shift, ShiftDate) %>%
  # Bring in single-shift records in df2 that match on person and whose shift
  # date is the day after the case date.
  left_join(df2 %>%
              group_by(Name, ShiftDate) %>%
              filter(n() == 1) %>%
              mutate(ShiftDateOneDayBefore = ShiftDate - 1),
            by = c("Name", "CaseDate" = "ShiftDateOneDayBefore")) %>%
  # Use that record only if no shift has been assigned yet and this is a
  # "delivery" case paired with an "OB" shift.
  mutate(Shift = case_when(!is.na(Shift.x) ~ Shift.x,
                           Type == "delivery" & Shift.y == "OB" ~ Shift.y),
         ShiftDate = case_when(!is.na(ShiftDate.x) ~ ShiftDate.x,
                               Type == "delivery" & Shift.y == "OB" ~ ShiftDate.y)) %>%
  select(Name, CaseDate, Type, StartTime, Shift, ShiftDate)
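Since the pipeline could be tightened, here is one possible base-R sketch of the same three passes (same-day match, previous day for early starts, next day for delivery/OB cases). It is an alternative under the same two assumptions, not the code above rewritten; `single_shift()` is a hypothetical helper, and the sketch has only been checked against the sample data, where it reproduces the Shift and ShiftDate columns of the target df3 while keeping Index:

```r
df1 <- data.frame(
  Index = c("Adams10-1", "Adams10-1", "Adams10-2", "Adams10-2", "Ball10-1",
            "Ball10-2", "Cash10-1", "Cash10-2", "David10-1", "David10-2"),
  CaseDate = as.Date(c("2005-10-01", "2005-10-01", "2005-10-02", "2005-10-02",
                       "2005-10-01", "2005-10-02", "2005-10-01", "2005-10-02",
                       "2005-10-01", "2005-10-02")),
  Type = c("heart", "local", "knee", "nose", "heart", "foot", "shin", "foot",
           "spine", "delivery"),
  StartTime = c(1640, 1755, 0112, 0300, 2145, 0233, 2123, 0326, 858, 1024),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Index = c("Adams10-1", "Adams10-1", "Ball10-1", "Cash10-1", "David10-1",
            "David10-1", "David10-3"),
  ShiftDate = as.Date(c("2005-10-01", "2005-10-01", "2005-10-01", "2005-10-01",
                        "2005-10-01", "2005-10-01", "2005-10-03")),
  Shift = c("OB", "CV", "Night", "Super", "OB", "Day", "OB"),
  stringsAsFactors = FALSE
)

# Join on person + date instead of Index; count shifts per person and day.
df1$Name <- gsub("[^A-Za-z]", "", df1$Index)
df2$Name <- gsub("[^A-Za-z]", "", df2$Index)
df2$n <- ave(seq_len(nrow(df2)), df2$Name, df2$ShiftDate, FUN = length)

# Hypothetical helper: Shift/ShiftDate of the only df2 record for each
# (name, date) pair, or NA when there is none or more than one.
single_shift <- function(shifts, name, date) {
  single <- shifts[shifts$n == 1, ]
  i <- match(paste(name, date), paste(single$Name, single$ShiftDate))
  single[i, c("Shift", "ShiftDate")]
}

# Pass 1: the same-day single shift ...
same_day <- single_shift(df2, df1$Name, df1$CaseDate)
shift <- same_day$Shift
shift_date <- same_day$ShiftDate
# ... or the CV shift for "heart" cases on days with several shifts.
cv <- df2[df2$Shift == "CV", ]
i_cv <- match(paste(df1$Name, df1$CaseDate), paste(cv$Name, cv$ShiftDate))
heart_cv <- df1$Type == "heart" & !is.na(i_cv)
shift[heart_cv] <- cv$Shift[i_cv[heart_cv]]
shift_date[heart_cv] <- cv$ShiftDate[i_cv[heart_cv]]

# Pass 2: still unassigned and started between 0000 and 0700 -> previous day.
prev <- single_shift(df2, df1$Name, df1$CaseDate - 1)
use_prev <- is.na(shift) & df1$StartTime > 0 & df1$StartTime < 700
shift[use_prev] <- prev$Shift[use_prev]
shift_date[use_prev] <- prev$ShiftDate[use_prev]

# Pass 3: still unassigned "delivery" case with an "OB" shift on the next day.
nxt <- single_shift(df2, df1$Name, df1$CaseDate + 1)
use_next <- is.na(shift) & df1$Type == "delivery" & nxt$Shift %in% "OB"
shift[use_next] <- nxt$Shift[use_next]
shift_date[use_next] <- nxt$ShiftDate[use_next]

df3 <- data.frame(df1[, c("Index", "CaseDate", "Type", "StartTime")],
                  Shift = shift, ShiftDate = shift_date,
                  stringsAsFactors = FALSE)
print(df3)
```

As in the dplyr version, each later pass only fills rows that are still NA, so the order of the passes encodes the precedence of the rules.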