Pandas: how to merge different dataframes?

I have two DataFrames, df1 and df2.

The first DataFrame contains people's names:

df1  NAME
0    Paul
1    Jack
2    Anna
3    Tom
4    Eva

The second DataFrame holds the amounts of money received and paid by each person. Some people in df2 are not in df1, for instance Zack, and some people in df1 may not appear in df2 at all, for instance Tom.

df2  Receiver Payer Amount  
0     Paul    Jack   300 
1     Anna    Paul   600
2     Anna    Eva    100
3     Eva     Zack   400

I want to create a DataFrame with the total amount of money received and paid by each person, like this:

df3  NAME   RECEIVED  PAYED
0    Paul     300      600
1    Jack      0       300
2    Anna     700       0
3    Tom      NaN      NaN
4    Eva      400      100  
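To make the example reproducible, here is one way to build the sample DataFrames in Python. The values are copied from the tables above; the construction itself is only a sketch:

```python
import pandas as pd

# df1: the people we want a summary row for
df1 = pd.DataFrame({'NAME': ['Paul', 'Jack', 'Anna', 'Tom', 'Eva']})

# df2: one row per transaction (Zack appears here but not in df1, Tom never appears)
df2 = pd.DataFrame({
    'Receiver': ['Paul', 'Anna', 'Anna', 'Eva'],
    'Payer':    ['Jack', 'Paul', 'Eva', 'Zack'],
    'Amount':   [300, 600, 100, 400],
})

print(df1)
print(df2)
```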

Use:

df3 = (df1.join(df2.melt('Amount', value_name='NAME', var_name='type')
                   .groupby(['NAME','type'])['Amount']
                   .sum()
                   .unstack(fill_value=0), on='NAME'))
print(df3)
   NAME  Payer  Receiver
0  Paul  600.0     300.0
1  Jack  300.0       0.0
2  Anna    0.0     700.0
3   Tom    NaN       NaN
4   Eva  100.0     400.0

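Explanation:

  1. First, reshape df2 to long format with melt, so each row holds one Amount, its type (Receiver or Payer) and the NAME.
  2. Sum Amount per NAME and type with groupby.
  3. Move the second level of the MultiIndex (type) into columns with unstack, filling missing combinations with 0.
  4. Finally, left-join the result to df1 on NAME, so people without any transactions (Tom) get NaN and people not in df1 (Zack) are dropped.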
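The four steps can be sketched with named intermediate results, assuming the sample data from the question. The variable names long, totals and wide are illustrative only:

```python
import pandas as pd

df1 = pd.DataFrame({'NAME': ['Paul', 'Jack', 'Anna', 'Tom', 'Eva']})
df2 = pd.DataFrame({'Receiver': ['Paul', 'Anna', 'Anna', 'Eva'],
                    'Payer': ['Jack', 'Paul', 'Eva', 'Zack'],
                    'Amount': [300, 600, 100, 400]})

# 1. melt: one row per (Amount, type, NAME); type is 'Receiver' or 'Payer'
long = df2.melt('Amount', value_name='NAME', var_name='type')

# 2. sum Amount per person and role -> Series with MultiIndex (NAME, type)
totals = long.groupby(['NAME', 'type'])['Amount'].sum()

# 3. unstack the 'type' level into columns; missing combinations become 0
wide = totals.unstack(fill_value=0)

# 4. left join df1['NAME'] against wide's index: Tom gets NaN, Zack is dropped
df3 = df1.join(wide, on='NAME')
print(df3)
```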

Another solution with pivot_table, which does the aggregation and the reshaping in one call:

df3 = (df1.join(df2.melt('Amount', value_name='NAME', var_name='type')
                   .pivot_table(index='NAME', 
                                columns='type', 
                                values='Amount', 
                                aggfunc='sum',
                                fill_value=0), on='NAME'))
print(df3)
   NAME  Payer  Receiver
0  Paul  600.0     300.0
1  Jack  300.0       0.0
2  Anna    0.0     700.0
3   Tom    NaN       NaN
4   Eva  100.0     400.0
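As a quick sanity check, one can confirm that the pivot_table version and the groupby/unstack version give the same frame on the sample data. This is a sketch using DataFrame.equals; the names via_unstack and via_pivot are hypothetical:

```python
import pandas as pd

df1 = pd.DataFrame({'NAME': ['Paul', 'Jack', 'Anna', 'Tom', 'Eva']})
df2 = pd.DataFrame({'Receiver': ['Paul', 'Anna', 'Anna', 'Eva'],
                    'Payer': ['Jack', 'Paul', 'Eva', 'Zack'],
                    'Amount': [300, 600, 100, 400]})

# Shared long-format input for both approaches
long = df2.melt('Amount', value_name='NAME', var_name='type')

# groupby + sum + unstack, then left join on NAME
via_unstack = df1.join(long.groupby(['NAME', 'type'])['Amount'].sum()
                           .unstack(fill_value=0), on='NAME')

# pivot_table aggregates and reshapes in one step, then the same left join
via_pivot = df1.join(long.pivot_table(index='NAME', columns='type',
                                      values='Amount', aggfunc='sum',
                                      fill_value=0), on='NAME')

# equals treats NaN in the same positions as equal and also compares dtypes
print(via_unstack.equals(via_pivot))
```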

Finally, rename the columns if necessary:

df3 = df3.rename(columns={'Receiver':'RECEIVED','Payer':'PAYED'})
print(df3)
   NAME  PAYED  RECEIVED
0  Paul  600.0     300.0
1  Jack  300.0       0.0
2  Anna    0.0     700.0
3   Tom    NaN       NaN
4   Eva  100.0     400.0
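Since the question asks about merging, the final left join can also be written with DataFrame.merge instead of join. A minimal sketch, assuming the sample data from the question, with the rename applied before merging:

```python
import pandas as pd

df1 = pd.DataFrame({'NAME': ['Paul', 'Jack', 'Anna', 'Tom', 'Eva']})
df2 = pd.DataFrame({'Receiver': ['Paul', 'Anna', 'Anna', 'Eva'],
                    'Payer': ['Jack', 'Paul', 'Eva', 'Zack'],
                    'Amount': [300, 600, 100, 400]})

# Per-person totals with roles as columns (same as the groupby/unstack step),
# renamed to the column labels used in the question
wide = (df2.melt('Amount', value_name='NAME', var_name='type')
           .groupby(['NAME', 'type'])['Amount'].sum()
           .unstack(fill_value=0)
           .rename(columns={'Receiver': 'RECEIVED', 'Payer': 'PAYED'}))

# left_on/right_index match df1['NAME'] against wide's index;
# how='left' keeps every row of df1 and drops names that only occur in df2
df3 = df1.merge(wide, left_on='NAME', right_index=True, how='left')
print(df3)
```

This behaves like join(..., on='NAME'): Tom ends up with NaN in both columns and Zack does not appear in the result.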

Detail of the melt step:

print(df2.melt('Amount', value_name='NAME', var_name='type'))

   Amount      type  NAME
0     300  Receiver  Paul
1     600  Receiver  Anna
2     100  Receiver  Anna
3     400  Receiver   Eva
4     300     Payer  Jack
5     600     Payer  Paul
6     100     Payer   Eva
7     400     Payer  Zack
