
Python: How to add specific columns of .mean to dataframe

How can I add the means of b and c to my dataframe? I tried a merge, but it didn't seem to work. I want two extra columns, b_mean and c_mean, added to my dataframe with the results of df.groupby('date').mean().

DataFrame

   a  b  c  date
0  2  3  5     1
1  5  9  1     1
2  3  7  1     1

I have the following code:

import pandas as pd

a = [{'date': 1,'a':2, 'b':3, 'c':5}, {'date':1, 'a':5, 'b':9, 'c':1}, {'date':1, 'a':3, 'b':7, 'c':1}]

df = pd.DataFrame(a)

x =  df.groupby('date').mean()

Edit:

df.groupby('date').mean() returns the following:

             a         b         c
date                              
1     3.333333  6.333333  2.333333

My desired result would be the following dataframe:

   a  b  c  date  a_mean   b_mean
0  2  3  5     1  3.3333   6.3333
1  5  9  1     1  3.3333   6.3333 
2  3  7  1     1  3.3333   6.3333

As @ayhan mentioned, you can use groupby(...).transform() for this. transform is like apply, but its result uses the same index as the original dataframe instead of the unique values in the column(s) grouped on, so it can be assigned straight back as a new column.

df['a_mean'] = df.groupby('date')['a'].transform('mean')
df['b_mean'] = df.groupby('date')['b'].transform('mean')

>>> df
   a  b  c  date    a_mean    b_mean
0  2  3  5     1  3.333333  6.333333
1  5  9  1     1  3.333333  6.333333
2  3  7  1     1  3.333333  6.333333
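To make the index difference concrete, here is a minimal sketch comparing an aggregating mean() with transform('mean') on the question's data. The names aggregated, transformed, and result are chosen for illustration only.

```python
import pandas as pd

df = pd.DataFrame([
    {'date': 1, 'a': 2, 'b': 3, 'c': 5},
    {'date': 1, 'a': 5, 'b': 9, 'c': 1},
    {'date': 1, 'a': 3, 'b': 7, 'c': 1},
])

# Aggregating: one row per unique date, indexed by date.
aggregated = df.groupby('date')[['a', 'b']].mean()

# Transforming: one row per original row, indexed like df.
transformed = df.groupby('date')[['a', 'b']].transform('mean')

# Because the index matches df, the suffixed columns can be joined directly.
result = df.join(transformed.add_suffix('_mean'))
print(result)
```

Selecting several columns before transform gives all the mean columns in one call, which may be more convenient than one assignment per column.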

solution
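Use join with the rsuffix parameter. The grouped means share their column names with df, so rsuffix appends '_mean' to the overlapping columns coming from the right-hand frame.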

df.join(df.groupby('date').mean(), on='date', rsuffix='_mean')

   a  b  c  date    a_mean    b_mean    c_mean
0  2  3  5     1  3.333333  6.333333  2.333333
1  5  9  1     1  3.333333  6.333333  2.333333
2  3  7  1     1  3.333333  6.333333  2.333333

We can limit it to just ['a', 'b']:

df.join(df.groupby('date')[['a', 'b']].mean(), on='date', rsuffix='_mean')

   a  b  c  date    a_mean    b_mean
0  2  3  5     1  3.333333  6.333333
1  5  9  1     1  3.333333  6.333333
2  3  7  1     1  3.333333  6.333333
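The question mentions a merge attempt that did not work. As a rough sketch, the same result can probably be reached with merge by matching the date column on the left against the index of the grouped means on the right. This assumes the b and c means from the question are what is wanted; means and merged are illustrative names.

```python
import pandas as pd

df = pd.DataFrame([
    {'date': 1, 'a': 2, 'b': 3, 'c': 5},
    {'date': 1, 'a': 5, 'b': 9, 'c': 1},
    {'date': 1, 'a': 3, 'b': 7, 'c': 1},
])

# Per-date means of b and c, indexed by date.
means = df.groupby('date')[['b', 'c']].mean()

# Match df['date'] against the index of means; overlapping column names
# from the right-hand frame receive the '_mean' suffix.
merged = df.merge(means, left_on='date', right_index=True,
                  suffixes=('', '_mean'))
print(merged)
```

join with on='date' is shorthand for this kind of column-to-index match, which is why the join version above is more compact.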

extra credit
Not really answering your question... but I thought it was neat!

d1 = df.set_index('date', append=True).swaplevel(0, 1)
g = df.groupby('date').describe()
d1.append(g).sort_index()

                   a         b         c
date                                    
1    0      2.000000  3.000000  5.000000
     1      5.000000  9.000000  1.000000
     2      3.000000  7.000000  1.000000
     25%    2.500000  5.000000  1.000000
     50%    3.000000  7.000000  1.000000
     75%    4.000000  8.000000  3.000000
     count  3.000000  3.000000  3.000000
     max    5.000000  9.000000  5.000000
     mean   3.333333  6.333333  2.333333
     min    2.000000  3.000000  1.000000
     std    1.527525  3.055050  2.309401
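The snippet above depends on older pandas behavior: DataFrame.append was removed in pandas 2.0, and groupby(...).describe() now returns the statistics as columns rather than as index rows. A sketch of an equivalent for recent versions follows. Casting the row labels to strings is an added assumption, made so that the inner index level holds a single type and sorts cleanly.

```python
import pandas as pd

df = pd.DataFrame([
    {'date': 1, 'a': 2, 'b': 3, 'c': 5},
    {'date': 1, 'a': 5, 'b': 9, 'c': 1},
    {'date': 1, 'a': 3, 'b': 7, 'c': 1},
])

# Original rows indexed by (date, row label), with the labels cast to str.
d1 = df.set_index('date', append=True).swaplevel(0, 1)
d1.index = d1.index.set_levels(d1.index.levels[1].astype(str), level=1)

# describe() puts the statistics in the columns; stack() moves them
# into the index, giving rows keyed by (date, statistic).
g = df.groupby('date').describe().stack()

# pd.concat replaces the removed DataFrame.append.
combined = pd.concat([d1, g]).sort_index()
print(combined)
```

Some pandas 2.x releases may emit a FutureWarning from stack() about its upcoming implementation change; the result should be the same here.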

I am assuming that you need the mean of a column added as a new column in the dataframe. Please correct me if that is wrong.

You can achieve this by computing each column's per-date mean and assigning it to a new column, like this:

In [1]: import pandas as pd

In [2]: a = [{'date': 1,'a':2, 'b':3, 'c':5}, {'date':1, 'a':5, 'b':9, 'c':1}, {'date':1, 'a':3, 'b':7, 'c':1}]

In [3]: df = pd.DataFrame(a)

In [4]: for col in ['b','c']:
    ...:      df[col+"_mean"] = df.groupby('date')[col].transform('mean')

In [5]: df
Out[5]:
   a  b  c  date    b_mean    c_mean
0  2  3  5     1  6.333333  2.333333
1  5  9  1     1  6.333333  2.333333
2  3  7  1     1  6.333333  2.333333
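With a single date, the per-date mean and the plain column mean coincide, so the example data cannot show the difference. The sketch below uses hypothetical data with two dates (df2 and the *_overall_mean columns are invented for illustration) to show what the loop above computes per group.

```python
import pandas as pd

# Hypothetical data: same idea as above, but with two distinct dates.
df2 = pd.DataFrame({
    'date': [1, 1, 2, 2],
    'b': [3, 9, 10, 20],
    'c': [5, 1, 4, 8],
})

for col in ['b', 'c']:
    # Plain column mean: a single scalar broadcast to every row.
    df2[col + '_overall_mean'] = df2[col].mean()
    # Per-date mean: each row receives the mean of its own date group.
    df2[col + '_mean'] = df2.groupby('date')[col].transform('mean')

print(df2)
```

Here b_mean is 6.0 for the rows with date 1 and 15.0 for the rows with date 2, while b_overall_mean is 10.5 on every row.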
