R: Merge two data frames based on a substring

Question

I have two data frames. The first one, df1, looks like this:

           Day     Element    Incident
1   2020-04-06     3101       Check incident by SOILING
2   2020-04-02     3102       Check alarm 5662
3   2020-05-21     3101       Check energy loss by METEO ERROR
4   2020-04-02     3202       Check ACDC grid

The other one, df2, looks like this:

         Day     Element  Incident       Energy_loss
1 2020-04-06     3101     SOILING        0.05
2 2020-04-14     3101     SOILING        0.01
3 2020-05-21     3101     METEO ERROR    0.11
4 2020-06-15     3102     METEO ERROR    0.03

I would like to merge them on the columns Day, Element and Incident, which means I need to find the rows where the Incident column of df1 contains the Incident value of df2. Rows of df1 that have no match in df2 can be left with NA in the Energy_loss column.

I've tried the usual merge, but since one of the merge conditions is a substring match, it doesn't work properly.

The output I expect is:

           Day     Element    Incident                          Energy_loss
1   2020-04-06     3101       Check incident by SOILING                0.05
2   2020-04-02     3102       Check alarm 5662                           NA
3   2020-05-21     3101       Check energy loss by METEO ERROR         0.11
4   2020-04-02     3202       Check ACDC grid                            NA

Answer 1

We could use regex_left_join from the fuzzyjoin package:

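library(dplyr)
library(fuzzyjoin)
# regex_left_join uses each `by` column of df2 as a regular expression matched against df1;
# rows of df1 with no match keep NA in the df2 columns
regex_left_join(df1, df2, by = c('Day', 'Element', 'Incident')) %>% 
    # keep df1's key columns (suffix .x) plus the joined Energy_loss
    select(Day = Day.x, Element = Element.x, Incident = Incident.x, Energy_loss)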

-output

#       Day Element                         Incident Energy_loss
#1 2020-04-06    3101        Check incident by SOILING        0.05
#2 2020-04-02    3102                 Check alarm 5662          NA
#3 2020-05-21    3101 Check energy loss by METEO ERROR        0.11
#4 2020-04-02    3202                  Check ACDC grid          NA
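Note that regex_left_join treats every `by` column of df2 as a regular expression, so Day and Element are also matched as patterns rather than tested for equality (a hypothetical Element value of 310 in df2 would also match 3101). A stricter variant, sketched here under the assumption that fuzzyjoin, dplyr and stringr are installed, uses fuzzy_left_join with one match function per column: `==` for Day and Element, and a literal (non-regex) substring test for Incident. The helper name strict_substring_join is hypothetical; call it as strict_substring_join(df1, df2) with the data given in the data section.

```r
library(dplyr)
library(fuzzyjoin)
library(stringr)

# Hypothetical helper: exact equality on Day and Element, literal substring
# test on Incident. fuzzy_left_join calls each match_fun as f(left_value, right_value),
# so str_detect(x, fixed(y)) asks "does the left Incident contain the right one?"
strict_substring_join <- function(left, right) {
  fuzzy_left_join(
    left, right,
    by = c("Day", "Element", "Incident"),
    match_fun = list(`==`, `==`, function(x, y) str_detect(x, fixed(y)))
  ) %>%
    # keep the left-hand key columns plus the joined value column
    select(Day = Day.x, Element = Element.x, Incident = Incident.x, Energy_loss)
}
```

As with regex_left_join, unmatched df1 rows keep NA in Energy_loss, and a df1 row that matches several df2 rows would appear once per match.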

data

df1 <- structure(list(Day = c("2020-04-06", "2020-04-02", "2020-05-21", 
"2020-04-02"), Element = c(3101L, 3102L, 3101L, 3202L), 
Incident = c("Check incident by SOILING", 
"Check alarm 5662", "Check energy loss by METEO ERROR", "Check ACDC grid"
)), class = "data.frame", row.names = c("1", "2", "3", "4"))

df2 <- structure(list(Day = c("2020-04-06", "2020-04-14", "2020-05-21", 
"2020-06-15"), Element = c(3101L, 3101L, 3101L, 3102L), Incident = c("SOILING", 
"SOILING", "METEO ERROR", "METEO ERROR"), Energy_loss = c(0.05, 
0.01, 0.11, 0.03)), class = "data.frame", row.names = c("1", 
"2", "3", "4"))

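If you'd rather avoid the extra packages, the same logic can be sketched in base R: merge on the exact keys (Day, Element), keep only the candidate pairs whose df2 Incident occurs literally inside the df1 Incident (grepl with fixed = TRUE), then attach the surviving Energy_loss values back to df1 row by row. This is an alternative technique, not the answer's own; the helper substring_left_join and the temporary row_id column are hypothetical names, and the Energy_loss column is hard-coded for this data.

```r
# Hypothetical base R helper: exact match on Day/Element, substring match on Incident.
# Assumes `left` has no existing column called row_id.
substring_left_join <- function(left, right) {
  left$row_id <- seq_len(nrow(left))  # remember the original row order
  # 1. Ordinary inner merge on the exact keys; right's Incident becomes Incident.pattern
  cand <- merge(left, right, by = c("Day", "Element"), suffixes = c("", ".pattern"))
  # 2. Keep pairs where the right-hand Incident appears literally in the left-hand one
  keep <- vapply(seq_len(nrow(cand)), function(i)
    grepl(cand$Incident.pattern[i], cand$Incident[i], fixed = TRUE), logical(1))
  hits <- cand[keep, c("row_id", "Energy_loss")]
  # 3. Left-join the hits back so unmatched rows of `left` get NA
  out <- merge(left, hits, by = "row_id", all.x = TRUE)
  out <- out[order(out$row_id), setdiff(names(out), "row_id")]
  rownames(out) <- NULL
  out
}
```

Called as substring_left_join(df1, df2), this should return the four df1 rows in their original order, with Energy_loss filled only for the SOILING and METEO ERROR incidents.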