
Filling missing values (NaN) with the mode is not working in pandas

All the values in the df are one-hot encoded, i.e. 0 / 1.

Tried:

fill_mode = lambda col: col.fillna(col.mode())
df = df.apply(fill_mode, axis=0)
df.isnull().sum()

Got:

id      0
1           0
2           2
3           0

I expect every null / NaN value to be filled with the column's mode.

col.mode() returns a Series, not a single number. So col.fillna(col.mode()) tries to align the index of col.mode() with the index of col, and most likely nothing gets updated. You probably want:

fill_mode = lambda col: col.fillna(col.mode()[0])
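
To see why the alignment usually leaves the column unchanged, here is a minimal sketch. The Series `col` and its index labels 10 to 12 are made up for illustration; the point is only that mode() returns a result indexed from 0, which need not overlap with the labels of the missing rows.

```python
import numpy as np
import pandas as pd

# Hypothetical column whose index labels (10, 11, 12) do not include 0.
col = pd.Series([1.0, np.nan, 1.0], index=[10, 11, 12])

mode = col.mode()                   # Series holding the mode(s), indexed from 0
aligned = col.fillna(mode)          # aligns on labels: col has no label 0, so nothing is filled
scalar = col.fillna(mode.iloc[0])   # a scalar is applied to every NaN

print(mode.index.tolist())          # labels of the mode Series
print(aligned.isna().sum())         # NaN count after the Series-based fill
print(scalar.isna().sum())          # NaN count after the scalar-based fill
```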

Adjust your fill_mode function:

fill_mode = lambda col: col.fillna(col.mode().iloc[0])
df.apply(fill_mode, axis=0)

mode() returns a Series, and fillna matches on the index when it receives a Series. In your case that index matching has to be bypassed, which is what taking a single value with .iloc[0] does.

Example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'1': [np.nan, 2, np.nan], '2': [1, 1, np.nan]})

fill_mode = lambda col: col.fillna(col.mode())
print(df.apply(fill_mode, axis=0))
     1    2
0  2.0  1.0  # only row 0 is filled: mode() returns a Series with index 0 and value 2.0
1  2.0  1.0
2  NaN  NaN 

df['1'].mode()
0    2.0
dtype: float64

In that case only the first row is filled, because index 0 is the only label that matches.

Adding .iloc[0] makes it return a single number, so fillna no longer aligns on the index:

df['1'].mode().iloc[0]
2.0
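
Putting the pieces together, here is a minimal end-to-end sketch on one-hot style data. The frame, its column names, and the helper name `fill_with_mode` are hypothetical. The guard for an all-NaN column is an addition not discussed above: mode() returns an empty Series in that case, and .iloc[0] would raise an IndexError.

```python
import numpy as np
import pandas as pd

# Hypothetical one-hot encoded frame with a few missing entries.
df = pd.DataFrame({
    "a": [1, 0, np.nan, 1],
    "b": [0, np.nan, 0, 1],
    "c": [np.nan, 1, 1, np.nan],
})

def fill_with_mode(col):
    """Fill NaN in one column with its most frequent value."""
    modes = col.mode()                 # NaN is ignored by default (dropna=True)
    if modes.empty:                    # all-NaN column: nothing to fill with
        return col
    return col.fillna(modes.iloc[0])   # scalar, so no index alignment takes place

filled = df.apply(fill_with_mode, axis=0)
print(filled.isnull().sum())

# Alternative without apply: df.mode() returns a DataFrame whose first row
# holds one mode per column, and fillna matches that row to columns by name.
filled_alt = df.fillna(df.mode().iloc[0])
```

When several values tie for the most frequent, mode() returns all of them in sorted order, so .iloc[0] picks the smallest one.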
