Pandas: how to merge different dataframes?

I have two dataframes, df1 and df2.

The first dataframe contains the names of people:

df1  NAME
0    Paul
1    Jack
2    Anna
3    Tom
4    Eva

and a second dataframe with the amount of money received and paid by each person. Some people in df2 are not in df1, for instance Zack. Some people in df1 may not appear in df2 at all, for instance Tom.

df2  Receiver Payer Amount  
0     Paul    Jack   300 
1     Anna    Paul   600
2     Anna    Eva    100
3     Eva     Zack   400

I want to create a dataframe with the total amount of money received and paid by each person. So:

df3  NAME   RECEIVED  PAYED
0    Paul     300      600
1    Jack      0       300
2    Anna     700       0
3    Tom      NaN      NaN
4    Eva      400      100  
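
To make the example reproducible, the two input frames can be built as in the sketch below. The values are copied from the tables above; the assumption that Amount holds plain integers is mine.

```python
import pandas as pd

# People of interest (df1 in the question)
df1 = pd.DataFrame({"NAME": ["Paul", "Jack", "Anna", "Tom", "Eva"]})

# One row per payment (df2 in the question); Amount is assumed to be an integer column
df2 = pd.DataFrame({
    "Receiver": ["Paul", "Anna", "Anna", "Eva"],
    "Payer": ["Jack", "Paul", "Eva", "Zack"],
    "Amount": [300, 600, 100, 400],
})

print(df1)
print(df2)
```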

Use:

df3 = (df1.join(df2.melt('Amount', value_name='NAME', var_name='type')
                   .groupby(['NAME','type'])['Amount']
                   .sum()
                   .unstack(fill_value=0), on='NAME'))
print (df3)
   NAME  Payer  Receiver
0  Paul  600.0     300.0
1  Jack  300.0       0.0
2  Anna    0.0     700.0
3   Tom    NaN       NaN
4   Eva  100.0     400.0

Explanation:

  1. First, reshape the DataFrame with melt
  2. Aggregate the sum for each NAME and type
  3. Reshape with unstack, turning the second level of the MultiIndex into columns
  4. Finally, left join to the first DataFrame
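
The four steps can also be run one at a time to inspect each intermediate result. This is only a sketch of the same chain split apart; the names long, totals and wide are illustrative and not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({"NAME": ["Paul", "Jack", "Anna", "Tom", "Eva"]})
df2 = pd.DataFrame({
    "Receiver": ["Paul", "Anna", "Anna", "Eva"],
    "Payer": ["Jack", "Paul", "Eva", "Zack"],
    "Amount": [300, 600, 100, 400],
})

# Step 1: melt keeps Amount and stacks Receiver/Payer into (type, NAME) pairs
long = df2.melt(id_vars="Amount", value_name="NAME", var_name="type")

# Step 2: total amount per person and role; the result is a Series with a MultiIndex
totals = long.groupby(["NAME", "type"])["Amount"].sum()

# Step 3: move the 'type' level into columns; missing person/role pairs become 0
wide = totals.unstack(fill_value=0)

# Step 4: left join on NAME keeps every row of df1 and drops names only in df2 (Zack)
df3 = df1.join(wide, on="NAME")
print(df3)
```

Because the join is a left join, Tom stays in the result with NaN in both columns, while Zack, who appears only in df2, is left out.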

Another solution with pivot_table:

df3 = (df1.join(df2.melt('Amount', value_name='NAME', var_name='type')
                   .pivot_table(index='NAME', 
                                columns='type', 
                                values='Amount', 
                                aggfunc='sum',
                                fill_value=0), on='NAME'))
print (df3)
   NAME  Payer  Receiver
0  Paul  600.0     300.0
1  Jack  300.0       0.0
2  Anna    0.0     700.0
3   Tom    NaN       NaN
4   Eva  100.0     400.0

Finally, rename the columns if necessary:

df3 = df3.rename(columns={'Receiver':'RECEIVED','Payer':'PAYED'})
print (df3)
   NAME  PAYED  RECEIVED
0  Paul  600.0     300.0
1  Jack  300.0       0.0
2  Anna    0.0     700.0
3   Tom    NaN       NaN
4   Eva  100.0     400.0
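
Since the question asks about merging, the join step can presumably also be written with merge. The sketch below is an alternative spelling, not the answer's own code: it resets the index so that NAME becomes an ordinary column, then does a left merge on it. The name wide is illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"NAME": ["Paul", "Jack", "Anna", "Tom", "Eva"]})
df2 = pd.DataFrame({
    "Receiver": ["Paul", "Anna", "Anna", "Eva"],
    "Payer": ["Jack", "Paul", "Eva", "Zack"],
    "Amount": [300, 600, 100, 400],
})

# Same reshaping as above, but NAME is turned back into a regular column
wide = (df2.melt(id_vars="Amount", value_name="NAME", var_name="type")
           .groupby(["NAME", "type"])["Amount"]
           .sum()
           .unstack(fill_value=0)
           .reset_index())

# how='left' keeps every name in df1, in df1's order
df3 = (df1.merge(wide, on="NAME", how="left")
          .rename(columns={"Receiver": "RECEIVED", "Payer": "PAYED"}))
print(df3)
```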

Detail:

print (df2.melt('Amount', value_name='NAME', var_name='type'))

   Amount      type  NAME
0     300  Receiver  Paul
1     600  Receiver  Anna
2     100  Receiver  Anna
3     400  Receiver   Eva
4     300     Payer  Jack
5     600     Payer  Paul
6     100     Payer   Eva
7     400     Payer  Zack
