R: How to merge two datasets by partially matching a column?

Question

I have two different datasets which look like this:

city1 <- c("LONDON","PARIS","ROME","MADRID","LISBON","AMSTERDAM")
f1.1 <- c(11,4,5,3,34,24)
f2.1 <- c(104,153,346,17478,44,290)
# f3 holds "|"-separated lists, so it must be stored as character strings
f3.1 <- c("0","153","7|8|15|10|3|9|13|14|97|707","17478","14|13|12|11|10|9|8|7|6|5|4","290")
f4 <- c("AA","BB","DD","AA","CC","NN")

city2 <- c("MANCHESTER","PARIS","ROME","BARCELONA","LISBON","AMSTERDAM")
f1.2 <- c(11,4,5,8,34,20)
f2.2 <- c(100,153,346,500,44,290)
f3.2 <- c(4,153,15,10200,7,180)

df1
       city   f1    f2                         f3  f4
1    LONDON   11   104                          0  AA
2     PARIS    4   153                        153  BB
3      ROME    5   346 7|8|15|10|3|9|13|14|97|707  DD
4    MADRID    3 17478                      17478  AA
5    LISBON   34    44 14|13|12|11|10|9|8|7|6|5|4  CC
6 AMSTERDAM   24   290                        290  NN

df2
       city2   f1   f2    f3
1 MANCHESTER   11  100     4
2      PARIS    4  153   153
3       ROME    5  346    15
4  BARCELONA    8  500 10200
5     LISBON   34   44     7
6  AMSTERDAM   20  290   180

My goal is to obtain a dataset df3 that contains the rows that match between the two. Rows that end up in df3 need to match on the following features: 'city', 'f1', 'f2' and 'f3'. I managed to do so with merge(df1,df2,by=c('city','f1', 'f2','f3')), and in this case I obtain:

   city   f1   f2   f3  f4
1 PARIS    4  153  153  BB

However, this does not capture the cases where column 'f3' of df1 holds several numbers separated by |. Hence, for column 'f3' I would like to carry out a sort of partial matching and obtain the following:

   city   f1   f2   f3  f4
1 PARIS    4  153  153  BB
2  ROME    5  346   15  DD
3 LISBON  34   44    7  CC

Note that the original datasets contain over 1 million rows and 300 rows, respectively.

Answer 1

Here is an approach where you first split the df1$f3 column into multiple rows (using | as the separator) and then perform an inner join, so that only the matching rows are kept.

library(splitstackshape)
library(data.table)
# Set to data.table format
setDT(df1); setDT(df2)
# Split column f3 to multiple rows, use | as separator
df1.long <- splitstackshape::cSplit(df1, "f3", sep = "|", direction = "long")
# inner join: nomatch = 0L keeps only the matched rows
df2[ df1.long, on = .(city2 = city, f1, f2, f3), nomatch = 0L]
#     city2 f1  f2  f3 f4
# 1:  PARIS  4 153 153 BB
# 2:   ROME  5 346  15 DD
# 3: LISBON 34  44   7 CC
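
For readers without splitstackshape or data.table, the same split-then-join idea can be sketched in base R. This is only an illustration, not the method used above: the data frames are rebuilt by hand from the question's vectors, and df1_long and parts are names made up for this sketch.

```r
# Toy copies of the question's data (f3 in df1 kept as character so the "|" lists survive)
df1 <- data.frame(
  city = c("LONDON", "PARIS", "ROME", "MADRID", "LISBON", "AMSTERDAM"),
  f1   = c(11, 4, 5, 3, 34, 24),
  f2   = c(104, 153, 346, 17478, 44, 290),
  f3   = c("0", "153", "7|8|15|10|3|9|13|14|97|707", "17478",
           "14|13|12|11|10|9|8|7|6|5|4", "290"),
  f4   = c("AA", "BB", "DD", "AA", "CC", "NN"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  city2 = c("MANCHESTER", "PARIS", "ROME", "BARCELONA", "LISBON", "AMSTERDAM"),
  f1    = c(11, 4, 5, 8, 34, 20),
  f2    = c(100, 153, 346, 500, 44, 290),
  f3    = c(4, 153, 15, 10200, 7, 180),
  stringsAsFactors = FALSE
)

# Split each f3 string on a literal "|" (fixed = TRUE, otherwise "|" is regex alternation)
parts <- strsplit(df1$f3, "|", fixed = TRUE)

# Repeat every df1 row once per split value, then replace f3 with the single values
df1_long <- df1[rep(seq_len(nrow(df1)), lengths(parts)), ]
df1_long$f3 <- as.numeric(unlist(parts))
rownames(df1_long) <- NULL

# merge() is an inner join by default: only rows present in both inputs are kept
df3 <- merge(df1_long, df2,
             by.x = c("city", "f1", "f2", "f3"),
             by.y = c("city2", "f1", "f2", "f3"))
print(df3)
```

Note that merge() sorts the result by the key columns, so the rows come back ordered by city rather than in the original order of df1.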

Sample data used

df1 <- read.table(text="      city   f1    f2                         f3  f4
    LONDON   11   104                          0  AA
     PARIS    4   153                        153  BB
      ROME    5   346 7|8|15|10|3|9|13|14|97|707  DD
    MADRID    3 17478                      17478  AA
    LISBON   34    44 14|13|12|11|10|9|8|7|6|5|4  CC
 AMSTERDAM   24   290                        290  NN", header = TRUE)

df2 <- read.table(text="  city2   f1   f2    f3
 MANCHESTER   11  100     4
      PARIS    4  153   153
       ROME    5  346    15
  BARCELONA    8  500 10200
     LISBON   34   44     7
  AMSTERDAM   20  290   180 ", header = TRUE)
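
Since the question mentions that one table has over a million rows, a possible variation is to join on the exact keys first and only then test whether df2's f3 value appears in df1's |-separated list. The sketch below uses base R on the sample data; cand and hit are hypothetical names, and whether this is actually faster than splitting to long format on the real data has not been measured here.

```r
df1 <- read.table(text = "city f1 f2 f3 f4
LONDON 11 104 0 AA
PARIS 4 153 153 BB
ROME 5 346 7|8|15|10|3|9|13|14|97|707 DD
MADRID 3 17478 17478 AA
LISBON 34 44 14|13|12|11|10|9|8|7|6|5|4 CC
AMSTERDAM 24 290 290 NN", header = TRUE, stringsAsFactors = FALSE)

df2 <- read.table(text = "city2 f1 f2 f3
MANCHESTER 11 100 4
PARIS 4 153 153
ROME 5 346 15
BARCELONA 8 500 10200
LISBON 34 44 7
AMSTERDAM 20 290 180", header = TRUE, stringsAsFactors = FALSE)

# Step 1: exact inner join on city, f1 and f2 only; the two f3 columns
# are kept side by side as f3.list (from df1) and f3.value (from df2)
cand <- merge(df1, df2,
              by.x = c("city", "f1", "f2"),
              by.y = c("city2", "f1", "f2"),
              suffixes = c(".list", ".value"))

# Step 2: for each candidate row, check whether df2's f3 is one of the
# numbers in df1's "|"-separated f3 string
hit <- mapply(
  function(lst, val) val %in% as.numeric(strsplit(lst, "|", fixed = TRUE)[[1]]),
  cand$f3.list, cand$f3.value,
  USE.NAMES = FALSE
)

# Keep the matching rows and report the single matched f3 value
df3 <- cand[hit, c("city", "f1", "f2", "f3.value", "f4")]
names(df3)[names(df3) == "f3.value"] <- "f3"
print(df3)
```

The intermediate table here has at most one row per pair of rows agreeing on city, f1 and f2, so the long |-lists are never expanded into separate rows.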

Answer 1
0 2021-04-22 15:50:50