How to replace NA values with the mode of a DataFrame column in Python?

Question

I'm completely new to Python (and this website) and am currently trying to replace NA values in specific DataFrame columns with their mode. I've tried various methods, none of which work. Please help me spot what I'm doing incorrectly:

Note: All the columns I'm working with are of type float64. All my code runs, but when I check the null count in those columns with df[cols_mode].isnull().sum(), it stays the same.

Method 1:

cols_mode = ['race', 'goal', 'date', 'go_out', 'career_c']

df[cols_mode].apply(lambda x: x.fillna(x.mode, inplace=True))

I tried the Imputer method too but got the same result.

Method 2:

for column in df[['race', 'goal', 'date', 'go_out', 'career_c']]:
    mode = df[column].mode()
    df[column] = df[column].fillna(mode)

Method 3:

df['race'].fillna(df.race.mode(), inplace=True)
df['goal'].fillna(df.goal.mode(), inplace=True)
df['date'].fillna(df.date.mode(), inplace=True)
df['go_out'].fillna(df.go_out.mode(), inplace=True)
df['career_c'].fillna(df.career_c.mode(), inplace=True)
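A likely reason Methods 2 and 3 leave the null count unchanged: when fillna is given a Series instead of a scalar, it fills by matching index labels. The Series returned by mode() is indexed 0, 1 and so on, so only NaN values sitting at those row labels can be filled. A minimal sketch on a made-up column (the values are hypothetical) shows the difference between passing the whole mode Series and passing a single value:

```python
import numpy as np
import pandas as pd

# Hypothetical column: NaN at row labels 1 and 3, mode is 2.0
s = pd.Series([2.0, np.nan, 2.0, np.nan, 5.0])

mode_series = s.mode()      # a Series indexed 0..k-1, here containing only 2.0

# Series value: fills by index label, so rows 1 and 3 find no match and stay NaN
aligned = s.fillna(mode_series)

# Scalar value: every NaN is replaced by 2.0
scalar = s.fillna(mode_series[0])

print(aligned.isnull().sum())  # 2
print(scalar.isnull().sum())   # 0
```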

Method 4: My methods became more and more manual, and this last one finally works:

df['race'].fillna(2.0, inplace=True)
df['goal'].fillna(1.0, inplace=True)
df['date'].fillna(6.0, inplace=True)
df['go_out'].fillna(2.0, inplace=True)
df['career_c'].fillna(2.0, inplace=True)

Answer 1

mode() returns a Series (a column can have more than one mode), so you still need to select a single value, e.g. the first one with [0], before using it to replace the NaN values in your DataFrame.

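for column in ['race', 'goal', 'date', 'go_out', 'career_c']:
    # mode() returns a Series; [0] picks the first mode. Assign the result back
    # instead of using inplace=True on df[column], which is chained assignment.
    df[column] = df[column].fillna(df[column].mode()[0])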
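To see why the [0] is needed, here is a small sketch on hypothetical data: mode() always returns a Series, which can hold several values when there is a tie (the modes come back sorted), and the loop then fills each column with its first mode.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the question's columns
df = pd.DataFrame({
    "race": [2.0, np.nan, 2.0, 4.0],
    "goal": [1.0, 3.0, np.nan, 3.0],
})

# A tie produces more than one mode, returned as a sorted Series
ties = pd.Series([1.0, 3.0, 1.0, 3.0]).mode().tolist()
print(ties)  # [1.0, 3.0]

for column in ["race", "goal"]:
    # Take the first mode and assign the filled column back to df
    df[column] = df[column].fillna(df[column].mode()[0])

print(df.isnull().sum().sum())  # 0
```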

If you want to apply it to all the columns of the DataFrame, then:

for column in df.columns:
    # Fill each column with its own first mode and assign the result back
    df[column] = df[column].fillna(df[column].mode()[0])
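One caveat when looping over every column: a column that contains only NaN has an empty mode, so mode()[0] raises an error. A minimal sketch of a guarded version follows; the helper name fill_with_mode and the column names are hypothetical, and skipping empty columns is just one possible choice.

```python
import numpy as np
import pandas as pd


def fill_with_mode(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with NaN in each column replaced by that column's first mode.

    Columns with no non-null values have an empty mode and are left unchanged.
    """
    out = df.copy()
    for column in out.columns:
        modes = out[column].mode()
        if not modes.empty:
            out[column] = out[column].fillna(modes.iloc[0])
    return out


df = pd.DataFrame({
    "go_out": [2.0, np.nan, 2.0, 1.0],
    "empty": [np.nan, np.nan, np.nan, np.nan],  # hypothetical all-NaN column
})
filled = fill_with_mode(df)
```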

Answer 2

Alternatively, I used another DataFrame containing only the modes of the columns. However, you need to make sure that NaN is not the mode of any of the columns.

# Create a DataFrame holding the mode(s) of each column
df_mode = df.mode()
# Loop over the columns and fill NaN with the first mode of each one
for x in df.columns.values:
    df[x] = df[x].fillna(value=df_mode[x].iloc[0])

You can also use the inplace option. This was useful when working with large datasets: I simply created a DataFrame holding the mean, mode and median of every column.
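The same idea can be written without a loop, as a sketch on made-up data: df.mode() returns one row per mode (columns with fewer modes are padded with NaN), and passing its first row to DataFrame.fillna fills each column from the value under the matching column label. The stats table is a hypothetical illustration of the mean/median/mode frame described above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date": [6.0, 6.0, np.nan, 3.0],
    "career_c": [1.0, 2.0, np.nan, np.nan],  # 1.0 and 2.0 tie, so two modes
})

# Two rows here: "date" has one mode, so its second row is padded with NaN
df_mode = df.mode()

# Row 0 holds the first mode of every column; fillna matches it on column names
filled = df.fillna(df_mode.iloc[0])

# Hypothetical summary table: one row per column, one column per statistic
stats = pd.DataFrame({
    "mean": df.mean(),
    "median": df.median(),
    "mode": df_mode.iloc[0],
})
filled_median = df.fillna(stats["median"])
```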

Answer 3

Why not use a dictionary for your columns and pass that instead?

dic = {'race': 2.0, 'goal': 1.0, 'date': 6.0, 'go_out': 2.0, 'career_c': 2.0}
# fillna returns a new DataFrame, so assign it back
df = df.fillna(value=dic)
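Instead of typing the mode values by hand, the dictionary can be built from the data. A minimal sketch with hypothetical columns and values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "race": [2.0, np.nan, 2.0],
    "goal": [1.0, 1.0, np.nan],
})
cols_mode = ["race", "goal"]

# Build {column: first mode} rather than hard-coding the values
dic = {col: df[col].mode()[0] for col in cols_mode}

# fillna returns a new DataFrame, so assign it back
df = df.fillna(value=dic)
```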

Answer 4

For single-column imputation:

df['col'] = df['col'].fillna(df['col'].mode()[0])

If you want to apply the same to a list of columns, loop over it.
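As an alternative to an explicit loop, the single-column expression can be applied to a list of columns with DataFrame.apply. This is also one way to get the question's Method 1 working: call x.mode() rather than referencing x.mode, and assign the result back instead of relying on inplace=True. A sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "race": [2.0, np.nan, 2.0, 3.0],
    "goal": [1.0, 1.0, np.nan, 4.0],
    "other": [np.nan, 5.0, 5.0, 7.0],  # not in the list, so left untouched
})
cols_mode = ["race", "goal"]

# Fill each listed column with its own first mode and assign the result back
df[cols_mode] = df[cols_mode].apply(lambda x: x.fillna(x.mode()[0]))
```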


Answers (4)

Answer 1: score 22, posted 2016-11-15 23:07:45
Answer 2: score 0, posted 2018-12-28 08:17:15
Answer 3: score 0, posted 2020-04-24 00:25:09
Answer 4: score 0, posted 2023-01-17 07:03:55
