Replace bad values with mean of pandas group by

Question

I'd like to replace bad values (the negative sentinel -666 and NaNs) in a pandas Series with the mean of the group each row belongs to. Sample DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'cat': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    'val': [np.nan, 10, 4, 5, -666, -666, 15, 20, 10]
})

Expected output:

  cat   val
0   A  10.0
1   B  10.0
2   C   4.0
3   A   5.0
4   B  15.0
5   C   7.0
6   A  15.0
7   B  20.0
8   C  10.0

How do I replace the bad values with the group mean?

Answer 1

You could use where to mask the unwanted values (the comparison x > 0 is False for NaN as well as for -666, so both end up as NaN), then fill them with the result of groupby + transform('mean'):

tmp = df['val'].where(lambda x: x>0)
df['val'] = tmp.fillna(tmp.groupby(df['cat']).transform('mean'))
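
To make the mechanics easier to follow, here is the same approach unpacked into named intermediate steps. This is only an illustrative sketch on the question's sample data; the variable names masked, group_mean and filled are mine, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'cat': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    'val': [np.nan, 10, 4, 5, -666, -666, 15, 20, 10],
})

# Step 1: keep values where the condition holds; everything else
# (-666, and NaN because NaN > 0 is False) becomes NaN.
masked = df['val'].where(df['val'] > 0)

# Step 2: mean of the remaining valid values per category, broadcast
# back to one value per row. mean() skips NaN by default.
group_mean = masked.groupby(df['cat']).transform('mean')

# Step 3: fill only the masked slots; valid values are left untouched.
filled = masked.fillna(group_mean)

print(group_mean.tolist())
print(filled.tolist())
```

Because the mean is computed on the masked Series, the -666 entries never contribute to the averages.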

The same result can be had with a one-liner, though it is less efficient than the where + fillna version because the filter runs inside a Python lambda for every group:

df['val'] = df['val'].where(lambda x: x>0, df.groupby('cat')['val'].transform(lambda x: x[x>0].mean()))

Output:

  cat   val
0   A  10.0
1   B  10.0
2   C   4.0
3   A   5.0
4   B  15.0
5   C   7.0
6   A  15.0
7   B  20.0
8   C  10.0
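
One caveat: the x > 0 test also treats zeros and any legitimate negative numbers as bad. If only the -666 sentinel and NaN should be replaced, masking the sentinel explicitly may be safer. A minimal sketch, assuming -666 is the only sentinel; the helper name fill_bad_with_group_mean and its parameters are hypothetical, not a pandas API:

```python
import numpy as np
import pandas as pd


def fill_bad_with_group_mean(df, value_col, group_col, sentinel=-666):
    """Return value_col with `sentinel` and NaN replaced by the group mean.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Series.mask sets entries to NaN where the condition is True.
    masked = df[value_col].mask(df[value_col] == sentinel)
    # transform('mean') skips NaN and returns a Series aligned to df's index.
    group_mean = masked.groupby(df[group_col]).transform('mean')
    # Existing NaNs and masked sentinels are filled; other values are kept.
    return masked.fillna(group_mean)


df = pd.DataFrame({
    'cat': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    'val': [np.nan, 10, 4, 5, -666, -666, 15, 20, 10],
})
df['val'] = fill_bad_with_group_mean(df, 'val', 'cat')
print(df)
```

On the sample data this gives the same values as the output above. If a group contains no valid values at all, its mean is NaN and those rows stay NaN, so a fallback such as the overall mean may be needed in that case.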
