How to replace NA values with mode of a DataFrame column in python?
I'm completely new to Python (and this website) and am currently trying to replace NA values in specific DataFrame columns with their mode. I've tried various methods, none of which work. Please help me spot what I'm doing incorrectly:
Note: All the columns I'm working with are float64 types.
All my code runs, but when I check the null count in the columns with df[cols_mode].isnull().sum(), it remains the same.
Method 1:
cols_mode = ['race', 'goal', 'date', 'go_out', 'career_c']
df[cols_mode].apply(lambda x: x.fillna(x.mode, inplace=True))
I tried the Imputer method too but got the same result.
Method 2:
for column in df[['race', 'goal', 'date', 'go_out', 'career_c']]:
    mode = df[column].mode()
    df[column] = df[column].fillna(mode)
Method 3:
df['race'].fillna(df.race.mode(), inplace=True)
df['goal'].fillna(df.goal.mode(), inplace=True)
df['date'].fillna(df.date.mode(), inplace=True)
df['go_out'].fillna(df.go_out.mode(), inplace=True)
df['career_c'].fillna(df.career_c.mode(), inplace=True)
Method 4: My methods became more and more of a manual process, and finally this one works:
df['race'].fillna(2.0, inplace=True)
df['goal'].fillna(1.0, inplace=True)
df['date'].fillna(6.0, inplace=True)
df['go_out'].fillna(2.0, inplace=True)
df['career_c'].fillna(2.0, inplace=True)
mode returns a Series, so you still need to access the row you want before replacing NaN values in your DataFrame.
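for column in ['race', 'goal', 'date', 'go_out', 'career_c']:
    # mode() returns a Series; [0] picks its first value. Assign the result back,
    # since inplace=True on df[column] can act on a temporary copy in recent pandas.
    df[column] = df[column].fillna(df[column].mode()[0])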
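To see why passing the Series itself leaves the null count unchanged, here is a minimal sketch on a made-up column (the values are an assumption, not the asker's data). fillna aligns a Series argument on the index, and the mode Series only has index 0, so NaNs at other positions are never matched.

```python
import numpy as np
import pandas as pd

# Toy float64 column with NaNs at positions 1 and 4 (illustrative data only)
s = pd.Series([2.0, np.nan, 2.0, 1.0, np.nan])

m = s.mode()  # a Series with a single entry at index 0

# Passing the Series: fillna aligns on the index, so only a NaN at
# index 0 could be filled; the NaNs at 1 and 4 stay in place.
filled_with_series = s.fillna(m)

# Passing the scalar: every NaN is replaced.
filled_with_scalar = s.fillna(m[0])

print(filled_with_series.isnull().sum())
print(filled_with_scalar.isnull().sum())
```

This matches the symptom in Methods 2 and 3 of the question, where the code runs but isnull().sum() does not change.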
If you want to apply it to all the columns of the DataFrame, then:
for column in df.columns:
    df[column] = df[column].fillna(df[column].mode()[0])
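If the loop is not needed, the same idea can probably be written without it. The sketch below assumes a small made-up frame; it relies on DataFrame.mode() returning one row per mode and on DataFrame.fillna accepting a Series keyed by column name.

```python
import numpy as np
import pandas as pd

# Illustrative data only; column names borrowed from the question
df = pd.DataFrame({
    'race': [2.0, np.nan, 2.0, 4.0],
    'goal': [1.0, 1.0, np.nan, 3.0],
})

# First row of df.mode() holds the first mode of every column,
# as a Series indexed by column name.
modes = df.mode().iloc[0]

# fillna matches that Series against the column labels.
filled = df.fillna(modes)

print(filled)
```

When columns have several equally frequent values, iloc[0] takes the smallest of them, which is the same choice mode()[0] makes in the loop version.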
Alternatively, I used another data frame containing only the modes of the columns. However, you need to make sure that NaN is not the mode of any of the columns.
# Create the mode data frame
df_mode = df.mode()
# Loop over the columns and fill each one with its first mode
for x in df.columns.values:
    df[x] = df[x].fillna(value=df_mode[x].iloc[0])
You can also use the in-place method. This was useful when working with large data sets: I simply created one data frame holding the mean, mode, and median of every column.
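On the NaN caveat: as far as I know, mode() ignores NaN by default (dropna=True), so the case to guard against is a column that is entirely NaN, where mode() comes back empty and indexing its first element fails. A minimal sketch of such a guard follows; fill_with_mode is a hypothetical helper name, not a pandas function, and the data is made up.

```python
import numpy as np
import pandas as pd

def fill_with_mode(frame, columns):
    """Return a copy of frame with NaNs in the given columns replaced by each column's mode.

    Hypothetical helper for illustration; not part of pandas.
    """
    result = frame.copy()
    for col in columns:
        modes = result[col].mode()  # NaN is not counted with the default dropna=True
        if modes.empty:
            continue  # column is entirely NaN, nothing sensible to fill with
        result[col] = result[col].fillna(modes.iloc[0])
    return result

# Illustrative data: one normal column and one all-NaN column
df = pd.DataFrame({
    'race': [2.0, np.nan, 2.0],
    'empty': [np.nan, np.nan, np.nan],
})
out = fill_with_mode(df, ['race', 'empty'])
print(out.isnull().sum())
```

Returning a copy keeps the original frame untouched, which makes it easy to compare null counts before and after.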
Why not use a dictionary for your columns and pass that through instead?
dic = {'race': 2.0, 'goal': 1.0, 'date': 6.0, 'go_out': 2.0, 'career_c': 2.0}
# fillna returns a new DataFrame, so assign the result back
df = df.fillna(value=dic)
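The dictionary does not have to be typed by hand. As a rough sketch, assuming made-up data and that every listed column has at least one non-null value, it can be built from the modes themselves:

```python
import numpy as np
import pandas as pd

# Illustrative data only; 'other' stands in for a column that should be left alone
df = pd.DataFrame({
    'race': [2.0, np.nan, 2.0, 4.0],
    'goal': [np.nan, 1.0, 1.0, 3.0],
    'other': [np.nan, 5.0, 6.0, 7.0],
})
cols_mode = ['race', 'goal']

# Map each selected column to its first mode
dic = {col: df[col].mode()[0] for col in cols_mode}

# Only columns named in the dictionary are filled
df = df.fillna(value=dic)

print(df.isnull().sum())
```

Columns missing from the dictionary keep their NaN values, which is convenient when only some columns should be imputed with the mode.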
For a single-column imputation:
df['col'] = df['col'].fillna(df['col'].mode()[0])
If you want to apply the same to a list of columns, loop over the list.