Merge DataFrames on two columns

This is a follow-up to this question.

I have two pandas DataFrames, as follows:

print( a )

    foo   bar   let letval
9  foo1  bar1  let1      a
8  foo2  bar2  let1      b
7  foo3  bar3  let1      c
6  foo1  bar1  let2      z
5  foo2  bar2  let2      y
4  foo3  bar3  let2      x

print( b )

    foo   bar   num  numval
0  foo1  bar1  num1       1
1  foo2  bar2  num1       2
2  foo3  bar3  num1       3
3  foo1  bar1  num2       4
4  foo2  bar2  num2       5
5  foo3  bar3  num2       6

I want to merge the two of them on the columns ['foo', 'bar'].

If I simply do c = pd.merge( a, b, on=['foo', 'bar'] ), I get:

print( c )

     foo   bar   let letval   num  numval
0   foo1  bar1  let1      a  num1       1
1   foo1  bar1  let1      a  num2       4
2   foo1  bar1  let2      z  num1       1
3   foo1  bar1  let2      z  num2       4
4   foo2  bar2  let1      b  num1       2
5   foo2  bar2  let1      b  num2       5
6   foo2  bar2  let2      y  num1       2
7   foo2  bar2  let2      y  num2       5
8   foo3  bar3  let1      c  num1       3
9   foo3  bar3  let1      c  num2       6
10  foo3  bar3  let2      x  num1       3
11  foo3  bar3  let2      x  num2       6

I would like:

print( c )

    foo   bar   let letval   num   numval
0  foo1  bar1  let1      a   num1       1
1  foo2  bar2  let1      b   num1       2
2  foo3  bar3  let1      c   num1       3
3  foo1  bar1  let2      z   num2       4
4  foo2  bar2  let2      y   num2       5
5  foo3  bar3  let2      x   num2       6

The closest I've got is:

c = pd.merge( a, b, left_index=['foo', 'bar'], right_index=['foo', 'bar'] )

What am I missing?

And why do I get c.shape == (12, 6) in the first example?


Edit

Thanks to @piRSquared's answer I realized that the underlying problem is that no combination of the existing columns identifies rows uniquely in both tables. The merge problem as originally posed therefore cannot be solved unambiguously. That said, the question reduces to a simpler one:

How to make a one-to-one relationship between the tables?

I solved that with a dictionary that maps the values that need to be aligned:

map_ab = { 'num1':'let1', 'num2':'let2' }
b['let'] = b.apply( lambda x: map_ab[x['num']], axis=1 )
c = pd.merge( a, b, on=['foo', 'bar', 'let'] )
print( c )
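
The same idea can be sketched with Series.map instead of a row-wise apply. The frames below are rebuilt from the tables printed in the question; the name b2 and the validate="one_to_one" check are additions for illustration, not part of the original solution. The check simply makes pandas raise an error if the three key columns ever stop being unique.

```python
import pandas as pd

# Frames rebuilt from the printed tables in the question.
a = pd.DataFrame(
    {
        "foo": ["foo1", "foo2", "foo3"] * 2,
        "bar": ["bar1", "bar2", "bar3"] * 2,
        "let": ["let1"] * 3 + ["let2"] * 3,
        "letval": list("abczyx"),
    },
    index=[9, 8, 7, 6, 5, 4],
)
b = pd.DataFrame(
    {
        "foo": ["foo1", "foo2", "foo3"] * 2,
        "bar": ["bar1", "bar2", "bar3"] * 2,
        "num": ["num1"] * 3 + ["num2"] * 3,
        "numval": [1, 2, 3, 4, 5, 6],
    }
)

map_ab = {"num1": "let1", "num2": "let2"}

# Series.map looks up each 'num' value in the dictionary (vectorised,
# no lambda needed). b2 is a hypothetical name; b itself is left untouched.
b2 = b.assign(let=b["num"].map(map_ab))

# With 'let' added, the three key columns identify each row uniquely,
# so the merge is one-to-one; validate= raises MergeError otherwise.
c = pd.merge(a, b2, on=["foo", "bar", "let"], validate="one_to_one")
print(c)
```

Using assign rather than writing into b keeps the original frame unchanged, which is convenient when experimenting with different mappings.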

The reason you are getting that is that the columns you are merging on do not form unique combinations. For example, the first row of a (position 0) has foo1 and bar1, but so does the fourth row (position 3). That alone would be fine, but b has the same issue. So when you match a's row 0 against b on foo1 and bar1, it matches twice. The same is true for a's row 3: it also matches twice. So you end up with four matches for those two rows.

So you get

  • a row 0 matches with b row 0
  • a row 0 matches with b row 3
  • a row 3 matches with b row 0
  • a row 3 matches with b row 3

And then your example does this two more times, once for each of the other (foo, bar) pairs: 3 * 4 == 12.
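
The arithmetic above can be checked directly. This is a small sketch on frames rebuilt from the printed tables; the names keys, a_counts, b_counts and expected_rows are made up for the illustration. For each key, an inner merge produces (rows in a with that key) times (rows in b with that key) output rows.

```python
import pandas as pd

# Frames rebuilt from the printed tables in the question.
a = pd.DataFrame(
    {
        "foo": ["foo1", "foo2", "foo3"] * 2,
        "bar": ["bar1", "bar2", "bar3"] * 2,
        "let": ["let1"] * 3 + ["let2"] * 3,
        "letval": list("abczyx"),
    },
    index=[9, 8, 7, 6, 5, 4],
)
b = pd.DataFrame(
    {
        "foo": ["foo1", "foo2", "foo3"] * 2,
        "bar": ["bar1", "bar2", "bar3"] * 2,
        "num": ["num1"] * 3 + ["num2"] * 3,
        "numval": [1, 2, 3, 4, 5, 6],
    }
)

keys = ["foo", "bar"]

# How many rows carry each (foo, bar) pair in each frame: 2 everywhere.
a_counts = a.groupby(keys).size()
b_counts = b.groupby(keys).size()

# Per key the merge yields a_count * b_count rows; summed over the 3 keys.
expected_rows = int((a_counts * b_counts).sum())

c = pd.merge(a, b, on=keys)
print(expected_rows, c.shape)
```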

The only way to do this unambiguously is to decide on a rule for which match to take when there is more than one. I decided to group by one of your other columns and then take the first match. It still doesn't match your expected output, but I'd suggest that you gave a bad example.

pd.merge( a, b, on=['foo', 'bar']).groupby(['foo', 'bar', 'let'], as_index=False).first()
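
A different tie-breaking rule, shown here only as a sketch and not as what the answer above does, is to pair the n-th occurrence of each (foo, bar) in a with the n-th occurrence in b. The helper column occ and the names a3, b3, first_rule and by_occurrence are hypothetical. On the example data this rule happens to reproduce the output requested in the question, while the first-match rule does not.

```python
import pandas as pd

# Frames rebuilt from the printed tables in the question.
a = pd.DataFrame(
    {
        "foo": ["foo1", "foo2", "foo3"] * 2,
        "bar": ["bar1", "bar2", "bar3"] * 2,
        "let": ["let1"] * 3 + ["let2"] * 3,
        "letval": list("abczyx"),
    },
    index=[9, 8, 7, 6, 5, 4],
)
b = pd.DataFrame(
    {
        "foo": ["foo1", "foo2", "foo3"] * 2,
        "bar": ["bar1", "bar2", "bar3"] * 2,
        "num": ["num1"] * 3 + ["num2"] * 3,
        "numval": [1, 2, 3, 4, 5, 6],
    }
)

keys = ["foo", "bar"]

# Rule from the answer: merge, then keep the first match per (foo, bar, let).
# Every group keeps its num1 row, so num2 never appears.
first_rule = (
    pd.merge(a, b, on=keys)
    .groupby(keys + ["let"], as_index=False)
    .first()
)

# Alternative rule (an assumption, not from the answer): number the
# occurrences of each key with cumcount and merge on that number as well.
a3 = a.assign(occ=a.groupby(keys).cumcount())
b3 = b.assign(occ=b.groupby(keys).cumcount())
by_occurrence = pd.merge(a3, b3, on=keys + ["occ"]).drop(columns="occ")

print(first_rule)
print(by_occurrence)
```

Pairing by occurrence depends on the row order of both frames, so it is only meaningful when that order carries information.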

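You can use combine_first. It aligns the two frames on their index rather than on foo and bar, so the output below assumes a and b share the same index (0 to 5), unlike the a printed in the question:

In [21]: a.combine_first(b)
Out[21]: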
    bar   foo   let letval   num  numval
0  bar1  foo1  let1      a  num1       1
1  bar2  foo2  let1      b  num1       2
2  bar3  foo3  let1      c  num1       3
3  bar1  foo1  let2      z  num2       4
4  bar2  foo2  let2      y  num2       5
5  bar3  foo3  let2      x  num2       6
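
Because combine_first lines rows up by index label, the result depends on how a is indexed. The sketch below rebuilds the frames from the printed tables (with a indexed 9 down to 4, as in the question) and compares the call with and without resetting the index; the names misaligned, a_reset and aligned are made up for the illustration.

```python
import pandas as pd

# Frames rebuilt from the printed tables in the question.
a = pd.DataFrame(
    {
        "foo": ["foo1", "foo2", "foo3"] * 2,
        "bar": ["bar1", "bar2", "bar3"] * 2,
        "let": ["let1"] * 3 + ["let2"] * 3,
        "letval": list("abczyx"),
    },
    index=[9, 8, 7, 6, 5, 4],
)
b = pd.DataFrame(
    {
        "foo": ["foo1", "foo2", "foo3"] * 2,
        "bar": ["bar1", "bar2", "bar3"] * 2,
        "num": ["num1"] * 3 + ["num2"] * 3,
        "numval": [1, 2, 3, 4, 5, 6],
    }
)

# With a indexed 9..4 and b indexed 0..5, the result covers the union of
# both indexes (0..9), and most rows are only partly filled.
misaligned = a.combine_first(b)

# After giving a a fresh 0..5 index, row i of a is combined with row i of b.
a_reset = a.reset_index(drop=True)
aligned = a_reset.combine_first(b)

print(misaligned.shape)
print(aligned)
```

Note that combine_first never compares the foo and bar values; it works here only because both frames list the keys in the same order.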

In the first example you are doing an inner join, which returns every pairing of rows from a and b whose foo and bar values are equal.
