
Replace bad values with mean of pandas group by

I'd like to replace bad values, i.e. negative values (-666) and NaNs, in a pandas Series with the mean of the group each value belongs to. Sample DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame({
                   'cat': ['A','B','C','A','B','C','A','B','C'],
                   'val': [np.nan, 10, 4, 5, -666, -666, 15, 20, 10]
                 })

Expected output:

 A 10
 B 10
 C 4
 A 5
 B 15
 C 7
 A 15
 B 20
 C 10

How do I replace the bad values with the group mean?

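You could use where to mask the unwanted values (anything that fails the condition becomes NaN), then fill those gaps with the result of groupby + transform('mean'). Note that the condition x>0 also treats 0 as a bad value; use x>=0 if zeros are valid in your data: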

tmp = df['val'].where(lambda x: x>0)
df['val'] = tmp.fillna(tmp.groupby(df['cat']).transform('mean'))
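To make the mechanics easier to follow, here is the same idea broken into its intermediate steps. This is only an illustrative sketch on the sample data from the question; the variable names masked, group_mean and filled are mine, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'cat': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    'val': [np.nan, 10, 4, 5, -666, -666, 15, 20, 10],
})

# Step 1: keep values that satisfy the condition; everything else becomes NaN.
# NaN > 0 evaluates to False, so existing NaNs simply stay NaN.
masked = df['val'].where(df['val'] > 0)

# Step 2: compute the mean of the remaining values per group and broadcast it
# back to every row of that group. mean() skips NaN by default.
group_mean = masked.groupby(df['cat']).transform('mean')

# Step 3: fill only the masked (NaN) positions with their group's mean.
filled = masked.fillna(group_mean)

print(pd.DataFrame({'cat': df['cat'], 'masked': masked,
                    'group_mean': group_mean, 'filled': filled}))
```

Because transform returns a Series aligned to the original index, fillna can match each missing row with the mean of its own group without any explicit merge.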

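We can also get the same result with the one-liner below. It is likely slower than the version above, because transform with a lambda calls a Python function once per group instead of using the built-in 'mean' aggregation: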

df['val'] = df['val'].where(lambda x: x>0, df.groupby('cat')['val'].transform(lambda x: x[x>0].mean()))

Output:

  cat   val
0   A  10.0
1   B  10.0
2   C   4.0
3   A   5.0
4   B  15.0
5   C   7.0
6   A  15.0
7   B  20.0
8   C  10.0
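If -666 is the only sentinel and other non-positive numbers are legitimate data, a variant that targets the sentinel explicitly may be safer than the x>0 test. The following is a minimal sketch under that assumption; fill_with_group_mean is a hypothetical helper name, not a pandas function.

```python
import numpy as np
import pandas as pd


def fill_with_group_mean(df, value_col, group_col, sentinel=-666):
    """Hypothetical helper: replace `sentinel` and NaN in `value_col`
    with the mean of the valid values in the same `group_col` group."""
    # Turn the sentinel into NaN so it is excluded from the mean.
    cleaned = df[value_col].replace(sentinel, np.nan)
    # Per-group mean of the valid values, aligned to the original rows.
    group_mean = cleaned.groupby(df[group_col]).transform('mean')
    return cleaned.fillna(group_mean)


df = pd.DataFrame({
    'cat': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    'val': [np.nan, 10, 4, 5, -666, -666, 15, 20, 10],
})
result = fill_with_group_mean(df, 'val', 'cat')

# A zero is kept as valid data here, unlike with the x>0 condition.
small = pd.DataFrame({'cat': ['A', 'A', 'A'], 'val': [0, -666, 2]})
small_result = fill_with_group_mean(small, 'val', 'cat')
```

On the sample data this gives the same values as the output above. A group whose values are all bad has no valid mean, so its rows would remain NaN.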

