简体   繁体   中英

How to group and aggregate different dataframes in pandas

df1

A B
a 1
a 1
a 4
b 1 
b 3

df2

A B
a 1
a 2
c 3 
c 5

df1.groupby("A").size()

a 3
b 2

df2.groupby("A").size()

a 2
c 2

I'd like to get following size aggregation

   df1 df2
a  3    2
b  2    0 
c  0    2  

Are there any way to achieve this? I'd like to know aggregation method.

If someone has opinion,please let me know. Thanks

  1. You can use pd.concat on the two grouped dataframes and pass axis=1 (This is essentailly an outer join with pd.merge , but the syntax is a bit more concise).
  2. Then, just do some cleanup with .fillna(0) , rename columns as desired with .rename() and use .astype(int) to make the columns data types integers:

df3 = (pd.concat([df1.groupby("A").size(), df2.groupby("A").size()], axis=1)
      .fillna(0).rename({0 : 'df1', 1 : 'df2'}, axis=1).astype(int))
df3
Out[1]: 
   df1  df2
a    3    2
b    2    0
c    0    2

If you're working in pandas I would do an outer join.

data = [['a','a','a','b','b'],[1,1,4,1,3]]
df1 = pd.DataFrame(data).T
df1.columns = ['A','B']

data = [['a','a','c','c'],[1,2,3,5]]
df2 = pd.DataFrame(data).T
df2.columns = ['A','B']

temp1 = pd.DataFrame(df1.groupby("A").size())
temp2 = pd.DataFrame(df2.groupby("A").size())

df = temp1.merge(temp2,how='outer',left_index=True,right_index=True)
df.fillna(0)

output

0_x 0_y
A       
a   3.0 2.0
b   2.0 0.0
c   0.0 2.0

In concat is possible use parameter keys for new column names, because working with Series , then replace missing values and convert to integers:

df3 = (pd.concat([df1.groupby("A").size(), 
                  df2.groupby("A").size()], axis=1, keys=['df1','df2'])
         .fillna(0)
         .astype(int))
print (df3)
   df1  df2
a    3    2
b    2    0
c    0    2

Another idea is first use concat , convert MultiIndex to columns by DataFrame.reset_index , so possible use DataFrame.pivot_table , last use DataFrame.rename_axis :

df3 = (pd.concat([df1, df2], keys=['df1','df2'])
         .reset_index()
         .pivot_table(index='A', columns='level_0', aggfunc='size', fill_value=0)
         .rename_axis(columns=None, index=None)
       )
print (df3)
   df1  df2
a    3    2
b    2    0
c    0    2

You can do the below:

In [883]: df1 = df1.groupby("A").size().reset_index()

In [884]: df2 = df2.groupby("A").size().reset_index()

In [886]: df1.merge(df2, on='A', how='outer').fillna(0).rename(columns={'0_x': 'df1', '0_y':'df2'}).set_index('A')
Out[886]: 
   df1  df2
A          
a  3.0  2.0
b  2.0  0.0
c  0.0  2.0
 #groupby using agg to rename axis.
   g=df1.groupby("A", as_index=False)['B'].agg({'df1':'size'})
    h=df2.groupby("A", as_index=False)['B'].agg({'df2':'size'})

#concat the resultant datframe, fffiil, bfill and dro duplicates

(pd.concat([g,h], ignore_index=True).sort_values(by='A').ffill().bfill()).drop_duplicates()



  A  df1  df2
0  a  3.0  2.0
1  b  2.0  2.0
3  c  2.0  2.0

Just a variant of @David's answer :

frames = {"df1": df1, "df2": df2}

pd.concat(
    {key: value.groupby("A").size() for key, value in frames.items()}, axis=1
).fillna(0)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM