简体   繁体   English

Python Pandas基于多个值字段合并两个数据框

[英]Python Pandas merge two data frames based on multiple values field

I have two Pandas' data frames 我有两个熊猫的数据框

df1
       items        view
0      A|B|C  02-10-2015
1        D|E  02-15-2015

df2
  item  num  val
0    A    1   10
1    B    3    2
2    C    8    9
3    D    9   13
4    E    2   22

I want to mere these frames to get 我只希望这些框架

df 
  view       num1 val1 num2 val2 num3 val3
0 02-10-2015 1    10   3    2    8    9
1 02-15-2015 9    13   2    22   na   na

My current approach is to split df1.items using 我目前的方法是使用以下方法拆分df1.items

df3 = pd.DataFrame(df1['items'].str.split('|').tolist())

which results in 导致

    0    1    2
0   A    B    C
1   D    E None

and finally merges each individual column and concatenates them with the original df1 最后合并每个单独的列,并将它们与原始df1连接

x = pd.merge(df3[[0]], df2, how='left', on='item')
y = pd.merge(df3[[1]], df2, how='left', on='item')
z = pd.merge(df3[[2]], df2, how='left', on='item')
pd.concat([df1, x.ix[:,1:],y.ix[:,1:],z.ix[:,1:]], axis=1)

The code works but it doesn't seem right to me, I would be happy if anyone can point out a proper way to achieve the same results. 该代码有效,但对我而言似乎不合适,如果有人可以指出实现相同结果的正确方法,我将非常高兴。

Thank you in advance! 先感谢您!

Note: str.split has a return_type argument: 注意: str.split具有return_type参数:

In [11]: res = df1['items'].str.split("|", return_type='frame')

In [12]: res
Out[12]:
   0  1    2
0  A  B    C
1  D  E  NaN

In [13]: res.index = df1['view']

In [14]: res
Out[14]:
            0  1    2
view
02-10-2015  A  B    C
02-15-2015  D  E  NaN

I think a nicer, more general way, to do this is using stack/unstack: 我认为一种更好,更通用的方法是使用堆栈/堆栈:

In [15]: res = res.stack()

In [16]: res
Out[16]:
view
02-10-2015  0    A
            1    B
            2    C
02-15-2015  0    D
            1    E
dtype: object

Now you can either merge, or if you're lucky just switch out the index: 现在,您可以合并,或者如果幸运的话,只需切换索引即可:

In [17]: df2 = df2.set_index('item') # could just drop this column

In [18]: df2.loc[res]  # reorder, may not be required
Out[18]:
      num  val
item
A       1   10
B       3    2
C       8    9
D       9   13
E       2   22

Now the magic: 现在魔术:

In [21]: df2.index = r.index

In [22]: df2
Out[22]:
              num  val
view
02-10-2015 0    1   10
           1    3    2
           2    8    9
02-15-2015 0    9   13
           1    2   22

In [23]: df2.unstack()
Out[23]:
           num        val
             0  1   2   0   1   2
view
02-10-2015   1  3   8  10   2   9
02-15-2015   9  2 NaN  13  22 NaN

As desired (with a MultiIndex columns, which is what you want). 根据需要(使用所需的MultiIndex列)。


Note: If you have duplicates (A, B, Cs) you're going to need to merge (which is a little fiddlier, but could be cleaned up). 注意:如果有重复项(A,B,C),则需要合并(这有点麻烦,但可以清除)。 Before [21]: 在[21]之前:

In [31]: df2.merge(res.to_frame(), left_index=True, right_on=0).unstack()
Out[31]:
           num        val          0
             0  1   2   0   1   2  0  1    2
view
02-10-2015   1  3   8  10   2   9  A  B    C
02-15-2015   9  2 NaN  13  22 NaN  D  E  NaN

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM