I'd like to replace bad values, namely the negative sentinel value (-666) and NaNs, in a pandas Series with the mean of the valid values in the same group. Sample DataFrame:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'cat': ['A','B','C','A','B','C','A','B','C'],
'val': [np.nan, 10, 4, 5, -666, -666, 15, 20, 10]
})
Expected output:
A 10
B 10
C 4
A 5
B 15
C 7
A 15
B 20
C 10
How do I replace the bad values with the grouped mean?
You could use Series.where to mask the unwanted values, then fill them using the result of groupby + transform('mean'):
tmp = df['val'].where(lambda x: x>0)
df['val'] = tmp.fillna(tmp.groupby(df['cat']).transform('mean'))
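To see why this works, it can help to look at the intermediate results. The following is a minimal sketch that rebuilds the sample frame from the question; the names masked, group_mean and filled are only illustrative and are not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'cat': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    'val': [np.nan, 10, 4, 5, -666, -666, 15, 20, 10],
})

# Step 1: keep only positive values; everything else becomes NaN
# (the existing NaN stays NaN because NaN > 0 is False).
masked = df['val'].where(df['val'] > 0)

# Step 2: mean of the remaining values in each category, aligned to every row.
# 'mean' skips NaN, so the masked values do not distort the result.
group_mean = masked.groupby(df['cat']).transform('mean')

# Step 3: fill only the masked positions with their group mean.
filled = masked.fillna(group_mean)

print(pd.DataFrame({'cat': df['cat'], 'masked': masked,
                    'group_mean': group_mean, 'filled': filled}))
```

Valid values are left untouched because fillna only writes to positions that are NaN after masking.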
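We can also get the same result with the one-liner below. It is less efficient than the where + fillna approach, because the lambda passed to transform is called once per group in Python instead of using the built-in 'mean':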
df['val'] = df['val'].where(lambda x: x>0, df.groupby('cat')['val'].transform(lambda x: x[x>0].mean()))
Output:
cat val
0 A 10.0
1 B 10.0
2 C 4.0
3 A 5.0
4 B 15.0
5 C 7.0
6 A 15.0
7 B 20.0
8 C 10.0
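Note that the x>0 test treats every non-positive number as bad. If zero or other negative numbers can be legitimate in your data, one option is to mask only the exact sentinel. The helper below is a hypothetical sketch under that assumption; the function name fill_bad_with_group_mean and its parameters are made up for illustration.

```python
import numpy as np
import pandas as pd

def fill_bad_with_group_mean(df, value_col, group_col, sentinel):
    """Return value_col with `sentinel` and NaN replaced by the group mean."""
    # Mask only the exact sentinel; other negative values are kept as valid.
    cleaned = df[value_col].mask(df[value_col].eq(sentinel))
    # Group mean of the remaining values, aligned to the original rows.
    means = cleaned.groupby(df[group_col]).transform('mean')
    return cleaned.fillna(means)

df = pd.DataFrame({
    'cat': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    'val': [np.nan, 10, 4, 5, -666, -666, 15, 20, 10],
})

result = fill_bad_with_group_mean(df, 'val', 'cat', sentinel=-666)
print(result.tolist())
```

On the sample data this gives the same values as the answer above, since -666 is the only non-positive entry.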