简体   繁体   中英

Add column with average value grouped by column

I want to replace column value of dataframe with mean( without zeros ) value of column grouped by another column.

Dataframe df is like:

ID | TYPE | rate
-------------
1  |    A | 0 <- Replace this
2  |    B | 2
3  |    C | 1
4  |    A | 2
5  |    C | 1
6  |    C | 0 <- Replace this
7  |    C | 8
8  |    C | 2
9  |    D | 0 <- Replace this

I have to replace values in rating where rating = 0:

df['rate'][df['rate']==0] = ?

with average value for that TYPE .

Average( without zeros ) value for every type is:

A = 2/1 = 2

B = 2/1 = 2

C = (1 + 1 + 8 + 2)/4 = 3

D = 0 ( default value when there isn't information for type )

Expected result:

ID | TYPE | rate
-------------
1  |    A | 2 <- Changed
2  |    B | 2
3  |    C | 1
4  |    A | 2
5  |    C | 1
6  |    C | 3 <- Changed
7  |    C | 8
8  |    C | 2
9  |    D | 0 <- Changed

You could mask the rate column in the dataframe, GroupBy the TYPE and transform with the mean , which will exlude NaNs . The use fillna to replace the values in the masked dataframe:

ma = df.rate.mask(df.rate.eq(0))
df['rate'] = ma.fillna(ma.groupby(df.TYPE).transform('mean').fillna(0))

   ID  TYPE  rate
0   1    A   2.0
1   2    B   2.0
2   3    C   1.0
3   4    A   2.0
4   5    C   1.0
5   6    C   3.0
6   7    C   8.0
7   8    C   2.0
8   9    D   0.0

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM