I cannot get fillna in Python to work when using mode to replace NaNs with the most frequent column string value
Strange problem.
I have a DataFrame column with dtype == object that holds string values and NaNs. It looks like this:
df
Response
0 Email
1 NaN
2 NaN
3 Call
4 Email
5 Email
I want to use fillna to fill the NaN values with the most frequently occurring value, which in this case is 'Email'.
My code looks like this:
import numpy as np
import pandas as pd
most_freq_cat = str(df['Response'].mode())
df['Response_imputed'] = df['Response']
df['Response_imputed'].fillna(most_freq_cat, inplace=True)
The results look like this:
df Response
0 Email
1 0 Email\ndtype: object
2 0 Email\ndtype: object
3 Call
4 Email
5 Email
The string "0 Email\ndtype: object" is different from "Email".
If I remove the str, the original NaNs are not replaced at all.
What am I doing wrong?
Don't use DataFrame.fillna with inplace=True. Actually, I would recommend forgetting that argument exists entirely. Since you only need this on one column, use Series.fillna instead and assign the result back.
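To see why both attempts in the question fail, here is a minimal sketch that rebuilds the sample column (the variable names are only illustrative). It relies on two pandas behaviours: mode returns a Series rather than a scalar, and fillna aligns a Series argument on index labels.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({"Response": ["Email", np.nan, np.nan, "Call", "Email", "Email"]})

modes = df["Response"].mode()   # a Series with a single row labelled 0
as_text = str(modes)            # the printed form of that Series, not the value itself

# Passing the Series: fillna matches on index labels. The only label in
# `modes` is 0, and row 0 is not missing, so nothing gets filled.
filled_with_series = df["Response"].fillna(modes)

# Passing the scalar taken out of the Series fills every NaN.
filled_with_scalar = df["Response"].fillna(modes.iat[0])

print(filled_with_series.isna().sum())
print(filled_with_scalar.tolist())
```

Wrapping the result in str is what produced the "0 Email\ndtype: object" cells, and dropping str passes the whole Series, which is why the NaNs were left untouched.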
Another thing to note is that mode can return multiple values if there is no single most frequent one. In that case it should suffice to select either the first one or one at random (an exercise for you).
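As a rough illustration of the tie case, the sketch below uses a made-up Series in which two values occur equally often. The random pick uses Series.sample, and the random_state value is an arbitrary choice made only so the run is repeatable.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "Email" and "Call" are tied for most frequent
s = pd.Series(["Email", "Call", "Email", "Call", np.nan])

modes = s.mode()          # one row per tied value
first = modes.iat[0]      # option 1: take the first mode

# Option 2: take one of the modes at random
random_pick = modes.sample(n=1, random_state=0).iat[0]

filled = s.fillna(first)  # either pick can be passed to fillna as a scalar
print(len(modes), filled.isna().sum())
```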
Here's my recommended syntax:
# call fillna on the column and assign it back
df['Response'] = df['Response'].fillna(df['Response'].mode().iat[0])
df
Response
0 Email
1 Email
2 Email
3 Call
4 Email
5 Email
You can also do a per-column fill if you have multiple columns to fill NaNs for. The procedure is similar: call mode on your columns, get the first mode for each column, and pass the result as the argument to DataFrame.fillna this time:
df.fillna(df.mode().iloc[0])
Response
0 Email
1 Email
2 Email
3 Call
4 Email
5 Email
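The per-column version is easier to follow with more than one column. The sketch below adds a hypothetical Region column, which is not part of the original question, to show that df.mode().iloc[0] is a Series indexed by column name, so each column is filled with its own mode.

```python
import numpy as np
import pandas as pd

# "Region" is an invented second column, used only for illustration
df = pd.DataFrame({
    "Response": ["Email", np.nan, "Call", "Email"],
    "Region": ["North", "South", np.nan, "South"],
})

# First row of df.mode(): one entry per column, indexed by column name
first_modes = df.mode().iloc[0]

# DataFrame.fillna matches the Series labels against the column names
filled = df.fillna(first_modes)
print(filled)
```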
import numpy as np
import pandas as pd
# use real missing values (np.nan) rather than the string 'NaN'
d = {'Response': ['Email', np.nan, np.nan, 'Call', 'Email', 'Email']}
df = pd.DataFrame(data=d)
df['Response'].mode()
Output:
0 Email
dtype: object
Take the first element:
df['Response'].mode()[0]
Output:
'Email'