Can't get fillna in Python to work when using mode to replace NaNs with the most frequent string value in a column

Question

Strange problem.

I have a DataFrame column of dtype object that contains string values and NaNs. It looks like this:

df   
     Response    
0    Email
1    NaN
2    NaN
3    Call
4    Email
5    Email

I want to use fillna to fill the NaN values with the most frequently occurring value, which in this case is 'Email'.

The code looks like this:

import numpy as np
import pandas as pd

most_frequent_cat = str(df['Response'].mode())
df['Response_imputed'] = df['Response']
df['Response_imputed'].fillna(most_frequent_cat, inplace = True)

The results look like this:

df
     Response
0    Email
1    0    Email\ndtype: object
2    0    Email\ndtype: object
3    Call
4    Email
5    Email

'0    Email\ndtype: object' is different from 'Email'.

If I remove the str(), the original NaNs are not replaced at all.

What am I doing wrong?

Answer 1

Don't use fillna with inplace=True. Actually, I would recommend forgetting that argument exists entirely. Use Series.fillna instead, since you only need this on one column, and assign the result back.

Another thing to note is that mode can return multiple values if there is no single mode. In that case it should suffice to select either the first one or one at random (an exercise for you).
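The two tie-breaking options can be sketched as follows. This is a minimal illustration on a hypothetical column in which 'Email' and 'Call' occur equally often; the random seed is arbitrary and only there to make the result reproducible. Series.mode returns the tied values in sorted order, so "take the first" means the alphabetically first value here.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'Call' and 'Email' both occur twice, so there is no single mode
s = pd.Series(['Email', np.nan, 'Call', 'Email', 'Call'])

modes = s.mode()          # a Series holding every tied value, in sorted order
print(modes.tolist())     # ['Call', 'Email']

# Option 1: take the first mode (deterministic)
first = modes.iat[0]

# Option 2: pick one of the tied modes at random (seeded, arbitrary seed)
rng = np.random.default_rng(seed=0)
chosen = rng.choice(modes.to_numpy())

filled_first = s.fillna(first)
filled_random = s.fillna(chosen)
```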

Here's my recommended syntax:

# call fillna on the column and assign it back
df['Response'] = df['Response'].fillna(df['Response'].mode().iat[0])
df
 
  Response
0    Email
1    Email
2    Email
3     Call
4    Email
5    Email
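Why the .iat[0] matters: mode() always returns a Series, even when there is only one mode. When Series.fillna receives a Series as its value, it matches values by index label, so the one-element mode Series (label 0) can only fill a NaN at index 0, and the NaNs at 1 and 2 stay missing, which is what happens when str() is removed. Wrapping the Series in str() instead fills the gaps with its printed text. A small sketch reproducing all three cases on the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Response': ['Email', np.nan, np.nan, 'Call', 'Email', 'Email']})
mode_series = df['Response'].mode()   # Series with a single element at index label 0

# 1) Passing the Series: values are matched by index label, so only label 0 could be filled
aligned = df['Response'].fillna(mode_series)

# 2) Passing str(Series): fills the gaps with the Series' text representation
as_text = df['Response'].fillna(str(mode_series))

# 3) Passing the scalar: every NaN gets 'Email'
as_scalar = df['Response'].fillna(mode_series.iat[0])

print(aligned.isna().sum())    # 2
print(repr(as_text[1]))
print(as_scalar.tolist())      # ['Email', 'Email', 'Email', 'Call', 'Email', 'Email']
```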

You can also do a per-column fill if you have multiple columns to fill NaNs in. The procedure is similar: call mode on the DataFrame, take the first mode for each column, and this time pass the result to DataFrame.fillna:

df.fillna(df.mode().iloc[0])

  Response
0    Email
1    Email
2    Email
3     Call
4    Email
5    Email
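To show that each column gets its own mode, here is a sketch on a hypothetical two-column frame (the 'Channel' column is invented for illustration). df.mode() returns a DataFrame with one column per input column; .iloc[0] takes the first mode of each, giving a Series indexed by column name, which DataFrame.fillna then matches to columns by name.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame; each column has a different most frequent value
df = pd.DataFrame({
    'Response': ['Email', np.nan, 'Call', 'Email'],
    'Channel': ['Web', 'Phone', np.nan, 'Phone'],
})

first_modes = df.mode().iloc[0]   # Series indexed by column name
filled = df.fillna(first_modes)   # each column is filled with its own mode

print(first_modes.to_dict())      # {'Response': 'Email', 'Channel': 'Phone'}
print(filled)
```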

Answer 2

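import numpy as np
import pandas as pd
# use np.nan for missing values; the string 'NaN' is an ordinary value, not a missing one
d = {'Response': ['Email', np.nan, np.nan, 'Call', 'Email', 'Email']}
df = pd.DataFrame(data=d)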

df['Response'].mode()

Output:

0    Email
dtype: object

Take the first value:

df['Response'].mode()[0]

Output:

'Email'
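The answer stops at extracting the scalar. A minimal sketch of the remaining fillna step, built with real missing values (np.nan) so that fillna has something to fill, and assigning the result to a new column as the question does:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Response': ['Email', np.nan, np.nan, 'Call', 'Email', 'Email']})

most_frequent = df['Response'].mode()[0]   # the scalar 'Email', not a Series
df['Response_imputed'] = df['Response'].fillna(most_frequent)

print(df['Response_imputed'].tolist())
# ['Email', 'Email', 'Email', 'Call', 'Email', 'Email']
```

The original 'Response' column is left unchanged, because fillna returns a new Series instead of modifying the column in place.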

2 answers

Answer 1: score 1, accepted, 2020-12-16 22:54:54

Answer 2: score 1, 2020-12-16 23:05:08
