
I cannot get Fillna in Python to Work when using Mode to Replace NaNs with Most Frequent Column String Value

Strange problem.

I have a dataframe column of dtype object that holds string values and NaNs. It looks like this:

df   
     Response    
0    Email
1    NaN
2    NaN
3    Call
4    Email
5    Email

I want to use fillna to fill the NaN values with the most frequently occurring value, which in this case is 'Email'.

The code looks like this:

import numpy as np
import pandas as pd

most_freq_cat = str(df['Response'].mode())
df['Response_imputed'] = df['Response']
df['Response_imputed'].fillna(most_freq_cat, inplace=True)

The results look like this:

df   Response    

0    Email
1    0    Email\ndtype: object
2    0    Email\ndtype: object
3    Call
4    Email
5    Email

The fill value '0    Email\ndtype: object' is not the same as 'Email'.

If I remove the str, the original NaNs are not replaced at all.

What am I doing wrong?

Don't use DataFrame.fillna with inplace=True. Actually, I would recommend forgetting that the argument exists entirely. Since you only need this on one column, use Series.fillna instead and assign the result back.

Another thing to note is that mode can return several values when there is a tie for the most frequent value. In that case it should suffice to select either the first one or one at random (an exercise for you).
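As a small illustration of the tie case, the sketch below uses made-up data (the `tied` Series is hypothetical, not the asker's column) in which two values are equally frequent. It shows one way to pick the first mode and one way to pick a random one; neither is the only reasonable choice.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a tie: 'Email' and 'Call' both appear twice.
tied = pd.Series(['Email', 'Call', np.nan, 'Call', 'Email'], name='Response')

# mode() ignores NaN by default and returns every tied value, sorted.
modes = tied.mode()
print(modes.tolist())

# Option 1: take the first mode in sorted order.
first_mode = modes.iat[0]

# Option 2: take one mode at random (random_state only makes the run repeatable).
random_mode = modes.sample(n=1, random_state=0).iat[0]

# Either scalar can be passed to fillna; here the first mode is used.
filled = tied.fillna(first_mode)
print(filled.tolist())
```

Because the modes come back sorted, taking the first one is deterministic, which is usually easier to reason about than a random pick.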

Here's my recommended syntax:

# call fillna on the column and assign it back
df['Response'] = df['Response'].fillna(df['Response'].mode().iat[0])
df
 
  Response
0    Email
1    Email
2    Email
3     Call
4    Email
5    Email
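For anyone wondering why both of the original attempts failed, the following sketch rebuilds the question's column (assuming real NaN values) and compares three fill values: the string form of the mode, the mode Series itself, and the scalar taken from it. The variable names are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Response': ['Email', np.nan, np.nan, 'Call', 'Email', 'Email']})

# mode() returns a one-element Series here, not a plain string.
mode_series = df['Response'].mode()

# str() of a Series is its printed form, index and dtype line included,
# so using it as a fill value writes that whole text into each NaN cell.
as_text = str(mode_series)
print(repr(as_text))

# Passing the Series itself makes fillna align on index labels. The mode
# Series only has label 0, and row 0 is not NaN, so nothing gets filled.
aligned = df['Response'].fillna(mode_series)
print(aligned.isna().sum())

# Extracting the scalar with .iat[0] fills every NaN.
scalar_filled = df['Response'].fillna(mode_series.iat[0])
print(scalar_filled.isna().sum())
```

The middle case explains the "no replacement" behaviour described in the question: fillna with a Series is an index-aligned operation, not a broadcast of a single value.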

You can also do a per-column fill if you have multiple columns with NaNs to fill. The procedure is similar: call mode on the DataFrame, take the first mode of each column, and pass the result to DataFrame.fillna this time:

df.fillna(df.mode().iloc[0])

  Response
0    Email
1    Email
2    Email
3     Call
4    Email
5    Email
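To make the per-column behaviour easier to see, here is a sketch with a second, hypothetical column (`Channel` and its values are invented for the example). It assumes every column has at least one non-missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame; 'Channel' is not part of the original question.
df2 = pd.DataFrame({
    'Response': ['Email', np.nan, 'Call', 'Email'],
    'Channel': ['Web', 'Web', np.nan, 'App'],
})

# df2.mode() has one column per original column; row 0 holds each column's first mode.
first_modes = df2.mode().iloc[0]
print(first_modes.to_dict())

# fillna with a Series matches its labels to column names,
# so each column is filled with its own mode.
filled2 = df2.fillna(first_modes)
print(filled2)
```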
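
import numpy as np
import pandas as pd

# use np.nan so the missing entries are real NaNs rather than the string 'NaN'
d = {'Response': ['Email', np.nan, np.nan, 'Call', 'Email', 'Email']}
df = pd.DataFrame(data=d)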

df['Response'].mode() 

Output:

0    Email
dtype: object

Take the first element:

df['Response'].mode()[0] 

Output:

'Email'
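One edge case worth guarding against: if a column contains only NaNs, mode() returns an empty Series and indexing its first element raises an error. The helper below is a hypothetical wrapper (the name `fill_with_mode` is not a pandas function) that sketches one way to handle this, assuming it is acceptable to leave such a column unchanged.

```python
import numpy as np
import pandas as pd

def fill_with_mode(series):
    """Return a copy of series with NaNs replaced by its first mode.

    If the series has no non-missing values, mode() is empty and the
    series is returned unchanged (an assumption made for this sketch).
    """
    modes = series.mode()
    if modes.empty:
        return series.copy()
    return series.fillna(modes.iat[0])

responses = pd.Series(['Email', np.nan, np.nan, 'Call', 'Email', 'Email'])
all_missing = pd.Series([np.nan, np.nan], dtype=object)

print(fill_with_mode(responses).tolist())
print(fill_with_mode(all_missing).isna().sum())
```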

