I have a pandas DataFrame with several columns. I'd like to fill the NaNs
in selected columns with the mean of each group.
import pandas as pd
import numpy as np
df = pd.DataFrame({
'cat': ['A','A','A','B','B','B','C','C'],
'v1': [10, 12, np.nan, 10, 14, np.nan, 11, np.nan],
'v2': [12, 8, np.nan, np.nan, 6, 12, 10, np.nan]
})
I am looking for a scalable solution, meaning one I can apply to several columns at once. Each np.nan should be filled with the mean of its group.
Expected output:
cat v1 v2
A 10 12
A 12 8
A 11 10
B 10 9
B 14 6
B 12 12
C 11 10
C 11 10
Other similar questions are limited to a single column; I am looking for a generalizable solution that imputes missing values (NAs) across several columns.
This will replace all of the np.nan values with the mean of each column (the overall column mean, not the per-group mean):
import pandas as pd
import numpy as np
df = pd.DataFrame({
'cat': ['A','A','A','B','B','B','C','C'],
'v1': [10, 12, np.nan, 10, 14, np.nan, 11, np.nan],
'v2': [12, 8, np.nan, np.nan, 6, 12, 10, np.nan]
})
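for x in df.columns.drop('cat'):
    # Overall mean of the column; NaN values are skipped by default
    mean_of_column = df[x].mean()
    # Assign the result back instead of calling inplace=True on a column selection
    df[x] = df[x].fillna(mean_of_column)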
df
Please note that the columns will be floats, since NaN already forces a float dtype and the mean is usually not a whole number. If you want to, you can round or cast the result afterwards to remove the decimals.
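As a rough sketch of the "remove the decimal" step mentioned above: once no NaN values remain, the columns can be rounded and cast to integers. The name `value_cols` is just an illustrative variable, and rounding to the nearest whole number is an assumption about what "remove the decimal" should mean here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'cat': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
    'v1': [10, 12, np.nan, 10, 14, np.nan, 11, np.nan],
    'v2': [12, 8, np.nan, np.nan, 6, 12, 10, np.nan],
})

# Every column except the grouping column (illustrative name)
value_cols = df.columns.drop('cat').tolist()

# Fill each column's NaN values with that column's overall mean
df[value_cols] = df[value_cols].fillna(df[value_cols].mean())

# Round to the nearest whole number, then cast to integers.
# The cast is only safe because no NaN values are left.
df[value_cols] = df[value_cols].round()
df = df.astype({col: 'int64' for col in value_cols})
```

Note that this changes the filled values slightly (the column means are no longer exact), so it is only appropriate when integer output matters more than precision.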
Try this:
df = df.fillna(df.groupby('cat').transform('mean'))
Output:
cat v1 v2
0 A 10.0 12.0
1 A 12.0 8.0
2 A 11.0 10.0
3 B 10.0 9.0
4 B 14.0 6.0
5 B 12.0 12.0
6 C 11.0 10.0
7 C 11.0 10.0
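Since the question asks about filling only selected columns, the same idea can be restricted to an explicit list. This is a minimal sketch, assuming the columns of interest are collected in a list called `cols_to_fill` (an illustrative name, not something from the question); any subset of the numeric columns could be listed there.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'cat': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
    'v1': [10, 12, np.nan, 10, 14, np.nan, 11, np.nan],
    'v2': [12, 8, np.nan, np.nan, 6, 12, 10, np.nan],
})

# Columns to impute (illustrative name; could be a subset of the numeric columns)
cols_to_fill = ['v1', 'v2']

# Per-group mean of each selected column, aligned row by row with df
group_means = df.groupby('cat')[cols_to_fill].transform('mean')

# Fill only the NaN cells; existing values are left unchanged
df[cols_to_fill] = df[cols_to_fill].fillna(group_means)
```

Selecting the columns before calling `transform` also avoids computing means for columns that are not being filled, such as non-numeric ones.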