
pandas: What is the best way to do fillna() on a (multiindexed) DataFrame with the most frequent value from every group?

There is a DataFrame with some NaN values:

df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 2, 2], 'B': [1, 1, np.nan, 2, 3, np.nan, 3, 4]})

   A    B
0  1  1.0
1  1  1.0
2  1  NaN <-
3  1  2.0
4  2  3.0
5  2  NaN <-
6  2  3.0
7  2  4.0

Set column 'A' as the index:

df.set_index(['A'], inplace=True)

Now there are two groups, with index values 1 and 2:

     B
A     
1  1.0
1  1.0
1  NaN <-
1  2.0
2  3.0
2  NaN <-
2  3.0
2  4.0
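To confirm the two groups, the index level can be grouped on by name. A minimal sketch, assuming the sample frame above: `groupby('A')` still works because 'A' is now an index level name (and no longer a column), and `groupby(level='A')` says the same thing explicitly.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 2, 2],
                   'B': [1, 1, np.nan, 2, 3, np.nan, 3, 4]})
df = df.set_index('A')

# 'A' is an index level name here, not a column
by_name = df.groupby('A')['B'].size()
by_level = df.groupby(level='A')['B'].size()

# size() counts rows, including NaN: two groups of 4 rows each
print(by_level)
```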

What is the best way to call fillna() on the DataFrame so that each NaN is replaced by the most frequent value of its group?

So I would like to make a call along these lines:

df.B.fillna(df.groupby('A').B...)

and get:

     B
A     
1  1.0
1  1.0
1  1.0 <-
1  2.0
2  3.0
2  3.0 <-
2  3.0
2  4.0

Ideally, the approach should also work with a MultiIndex.
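The `...` in the call above can be filled with a per-group `transform` that broadcasts each group's most frequent value to every row of that group, and `fillna()` then uses it only where `B` is missing. A sketch, assuming the indexed sample frame and that every group has at least one non-null value; `group_mode` is an illustrative name. It relies on `transform` returning a result with exactly the same index as `df`, so the repeated labels line up row for row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 2, 2],
                   'B': [1, 1, np.nan, 2, 3, np.nan, 3, 4]}).set_index('A')

# One value per row: the most frequent non-null B of that row's group
# (value_counts() skips NaN; idxmax() returns the value with the highest count)
group_mode = df.groupby('A')['B'].transform(lambda x: x.value_counts().idxmax())

# Same shape as the call in the question: only missing entries are replaced
df['B'] = df['B'].fillna(group_mode)
print(df)
```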

  • group by A and apply fillna() to B within each group;
  • to find the fill value, drop missing values from the group, call value_counts(), and use idxmax() to pick the most frequent value.
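The second step can be seen in isolation on the values of group 1 from the sample frame. A small sketch; `group` is just an illustrative name. Note that `value_counts()` already excludes NaN by default, so the `dropna()` call is defensive rather than required.

```python
import numpy as np
import pandas as pd

# Values of group A == 1 from the sample frame
group = pd.Series([1, 1, np.nan, 2], name='B')

counts = group.dropna().value_counts()  # how often each non-null value occurs
most_frequent = counts.idxmax()          # the value with the highest count
filled = group.fillna(most_frequent)     # replace only the NaN entries

print(counts)
print(most_frequent)
print(filled.tolist())
```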

Assuming there are no groups where all values are missing:

# Within each group of A, fill B's NaNs with the group's most frequent non-null value
df['B'] = df.groupby('A')['B'].transform(lambda x: x.fillna(x.dropna().value_counts().idxmax()))
df
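The one-liner above raises on a group whose values are all NaN, because `value_counts()` is then empty and `idxmax()` has nothing to return. One way around that is a hypothetical helper, `fill_with_mode`, that leaves such groups unchanged. It uses `Series.mode()`, which also skips NaN but returns tied modes sorted, so `iloc[0]` picks the smallest of equally frequent values (the `value_counts().idxmax()` version may pick a different one on ties). The same call works on a MultiIndex by grouping on a level name; the extra level `C` and the third group are made up for illustration.

```python
import numpy as np
import pandas as pd


def fill_with_mode(s: pd.Series) -> pd.Series:
    """Hypothetical helper: fill NaN in s with its most frequent non-null value."""
    modes = s.mode()  # NaN is ignored (dropna=True by default)
    if modes.empty:   # all values missing: nothing to fill with
        return s
    return s.fillna(modes.iloc[0])


df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
                   'C': [10, 10, 20, 20, 10, 10, 20, 20, 10, 20],
                   'B': [1, 1, np.nan, 2, 3, np.nan, 3, 4, np.nan, np.nan]})

# Single index: group A == 3 has no non-null B values and stays NaN
single = df.set_index('A')
single['B'] = single.groupby(level='A')['B'].transform(fill_with_mode)

# MultiIndex (A, C): group on the 'A' level only
multi = df.set_index(['A', 'C'])
multi['B'] = multi.groupby(level='A')['B'].transform(fill_with_mode)

print(single)
print(multi)
```

To group on several levels at once, `level` also accepts a list of level names, e.g. `level=['A', 'C']`.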


The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM