
How to replace NA values with mode of a DataFrame column in python?

I'm completely new to Python (and this website) and am currently trying to replace NA values in specific DataFrame columns with their mode. I've tried various methods, none of which work. Please help me spot what I'm doing incorrectly:

Note: All the columns I'm working with are float64. All of my code runs, but when I check the null count in those columns with df[cols_mode].isnull().sum(), it remains the same.

Method 1:

cols_mode = ['race', 'goal', 'date', 'go_out', 'career_c']

df[cols_mode].apply(lambda x: x.fillna(x.mode, inplace=True))

I tried the Imputer method too but got the same result.

Method 2:

for column in df[['race', 'goal', 'date', 'go_out', 'career_c']]:
    mode = df[column].mode()
    df[column] = df[column].fillna(mode)

Method 3:

df['race'].fillna(df.race.mode(), inplace=True)
df['goal'].fillna(df.goal.mode(), inplace=True)
df['date'].fillna(df.date.mode(), inplace=True)
df['go_out'].fillna(df.go_out.mode(), inplace=True)
df['career_c'].fillna(df.career_c.mode(), inplace=True)

Method 4: My methods became more and more of a manual process, and finally this one works:

df['race'].fillna(2.0, inplace=True)
df['goal'].fillna(1.0, inplace=True)
df['date'].fillna(6.0, inplace=True)
df['go_out'].fillna(2.0, inplace=True)
df['career_c'].fillna(2.0, inplace=True) 

mode returns a Series, so you still need to select the value you want from it before replacing NaN values in your DataFrame.

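for column in ['race', 'goal', 'date', 'go_out', 'career_c']:
    # Assign the result back: in recent pandas versions, inplace=True on
    # df[column] may act on a temporary copy and leave df unchanged.
    df[column] = df[column].fillna(df[column].mode()[0])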
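To see why passing the whole mode Series leaves the null count unchanged, here is a minimal sketch on a made-up column (the values are an assumption, not the asker's data). When fillna receives a Series, it matches values by index label, and the mode Series usually has only label 0.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column, not the asker's data.
s = pd.Series([2.0, np.nan, 2.0, np.nan, 1.0])

# mode() returns a Series, even when there is a single mode.
mode_series = s.mode()
print(type(mode_series).__name__)

# Passing the Series: fillna aligns on index labels. The mode Series only
# has label 0, and s[0] is not NaN, so the NaNs at labels 1 and 3 remain.
aligned = s.fillna(mode_series)
print(aligned.isnull().sum())

# Passing the scalar: every NaN is replaced.
filled = s.fillna(mode_series.iloc[0])
print(filled.isnull().sum())
```

This matches the behaviour described for Methods 2 and 3: the code runs without error, but the missing values stay missing.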

If you want to apply it to all the columns of the DataFrame, then:

for column in df.columns:
    # Assign the result back rather than relying on inplace=True.
    df[column] = df[column].fillna(df[column].mode()[0])

Alternatively, I used another DataFrame containing only the modes of the columns. However, you need to make sure that NaN is not the mode of any of the columns.

# Create the mode DataFrame
df_mode = df.mode()
# Fill each column's NaNs with that column's first mode
for x in df.columns.values:
    df[x] = df[x].fillna(value=df_mode[x].iloc[0])

You can also use the in-place method. This was useful when working with large data sets: I simply created a DataFrame holding the mean, mode, and median of every column.
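The per-column loop over the mode DataFrame can likely be collapsed into a single call, since DataFrame.fillna accepts a Series whose index holds column names. The sketch below uses made-up data. It takes the first row of df.mode(), which holds the first mode of each column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; column names borrowed from the question.
df = pd.DataFrame({
    "race": [2.0, np.nan, 2.0, 4.0],
    "goal": [1.0, 1.0, np.nan, 3.0],
})

# First row of the mode DataFrame: a Series indexed by column name.
modes = df.mode().iloc[0]

# Each column's NaNs are filled with the value stored under its own name.
df = df.fillna(modes)
print(df)
```

By default mode() ignores NaN when counting, so a NaN only appears in that first row for a column with no non-missing values at all. Such a column is left unfilled.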

Why not use a dictionary for your columns and pass that through instead?

dic = {'race': 2.0, 'goal': 1.0, 'date': 6.0, 'go_out': 2.0, 'career_c': 2.0}
# fillna returns a new DataFrame, so assign the result back
df = df.fillna(value=dic)
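Rather than hard-coding the fill values, the dictionary could be built from the modes themselves. This is a sketch on invented data, chosen so the modes happen to match the values above. The names `cols_mode` and `fill_values` are only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
df = pd.DataFrame({
    "race":     [2.0, 2.0, np.nan, 1.0],
    "goal":     [1.0, np.nan, 1.0, 2.0],
    "date":     [6.0, 6.0, 4.0, np.nan],
    "go_out":   [np.nan, 2.0, 2.0, 3.0],
    "career_c": [2.0, 7.0, np.nan, 2.0],
})

cols_mode = ["race", "goal", "date", "go_out", "career_c"]

# Map each column name to its first mode, taken as a plain float.
fill_values = {col: float(df[col].mode().iloc[0]) for col in cols_mode}

# A dict passed to fillna is applied column by column.
df = df.fillna(value=fill_values)
print(df[cols_mode].isnull().sum())
```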

For single-column imputation:

df['col'] = df['col'].fillna(df['col'].mode()[0])

If you want to apply the same to a list of columns, loop over it.
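One way to write that loop is as a small helper. `fill_with_mode` is a hypothetical name, not a pandas function, and the data is invented. The sketch also skips columns that are entirely NaN, for which mode() returns an empty Series and indexing its first element would raise an error.

```python
import numpy as np
import pandas as pd

def fill_with_mode(frame, columns):
    """Return a copy of `frame` with NaNs in `columns` replaced by each column's mode."""
    result = frame.copy()
    for col in columns:
        modes = result[col].mode()
        if modes.empty:
            # Column has no non-missing values, so there is no mode to use.
            continue
        # If several values tie, mode() returns them sorted; take the first.
        result[col] = result[col].fillna(modes.iloc[0])
    return result

# Hypothetical toy frame, including an all-NaN column.
df = pd.DataFrame({
    "race":  [2.0, np.nan, 2.0, 1.0],
    "goal":  [np.nan, 1.0, 1.0, 3.0],
    "empty": [np.nan, np.nan, np.nan, np.nan],
})

filled = fill_with_mode(df, ["race", "goal", "empty"])
print(filled.isnull().sum())
```

Returning a copy keeps the original frame untouched, which makes it easy to compare null counts before and after.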


