There is a DataFrame with some NaN values:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 2, 2], 'B': [1, 1, np.nan, 2, 3, np.nan, 3, 4]})
   A    B
0  1  1.0
1  1  1.0
2  1  NaN  <-
3  1  2.0
4  2  3.0
5  2  NaN  <-
6  2  3.0
7  2  4.0
Set column 'A' as the index:
df.set_index(['A'], inplace=True)
Now there are two groups with the indices 1 and 2:
     B
A
1  1.0
1  1.0
1  NaN  <-
1  2.0
2  3.0
2  NaN  <-
2  3.0
2  4.0
What is the best way to do fillna() on the DataFrame with the most frequent value from each group?
So, I would like to do a call of something like this:
df.B.fillna(df.groupby('A').B...)
and get:
     B
A
1  1.0
1  1.0
1  1.0  <-
1  2.0
2  3.0
2  3.0  <-
2  3.0
2  4.0
I hope there's a way to do this that also works with a MultiIndex.
Group by A and apply fillna() to B within each group. Inside each group, drop the missing values, call value_counts(), and use idxmax() to pick the most frequent value. Assuming there are no groups where all values are missing:
df['B'] = df.groupby('A')['B'].transform(lambda x: x.fillna(x.dropna().value_counts().idxmax()))
df
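The one-liner above raises an error if some group contains only NaN, because idxmax() has nothing to pick from. Below is a minimal sketch of one way around that, using Series.mode() instead of value_counts().idxmax(). The helper name fill_with_group_mode and its level parameter are made up for illustration, not part of pandas. Note one behavioral difference: mode() returns tied values in sorted order, so ties resolve to the smallest value here, which may not match what idxmax() picks.

```python
import numpy as np
import pandas as pd


def fill_with_group_mode(series, level):
    """Fill NaN in `series` with the most frequent value of each group.

    Hypothetical helper: `level` is an index level name (or a list of
    names), so it works for a plain named index and for a MultiIndex.
    Groups that contain only NaN are left unchanged.
    """
    def _fill(x):
        modes = x.mode()  # ignores NaN by default; empty if x is all NaN
        if modes.empty:
            return x
        return x.fillna(modes.iloc[0])  # ties: smallest value wins

    return series.groupby(level=level).transform(_fill)


# The DataFrame from the question, with 'A' as the index.
df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 2, 2],
                   'B': [1, 1, np.nan, 2, 3, np.nan, 3, 4]}).set_index('A')
result = fill_with_group_mode(df['B'], level='A')

# A MultiIndex example: group on level 'A' only.
mi = pd.MultiIndex.from_arrays([[1, 1, 1, 2, 2, 2],
                                ['x', 'x', 'y', 'x', 'x', 'y']],
                               names=['A', 'C'])
s = pd.Series([1.0, np.nan, 1.0, 3.0, 3.0, np.nan], index=mi)
result_mi = fill_with_group_mode(s, level='A')

# A group with no observed values stays NaN instead of raising.
s_empty = pd.Series([1.0, np.nan, np.nan, np.nan],
                    index=pd.Index([1, 1, 2, 2], name='A'))
result_empty = fill_with_group_mode(s_empty, level='A')
```

To write the result back, assign it to the column: df['B'] = fill_with_group_mode(df['B'], level='A'). Passing level=['A', 'C'] should group on both MultiIndex levels at once.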