
How to replace NA values with mode of a DataFrame column in python?

I'm completely new to Python (and to this website) and am trying to replace NA values in specific DataFrame columns with each column's mode. I've tried various methods, none of which work. Please help me spot what I'm doing wrong:

Note: all the columns I'm working with are of type float64. All of my code runs without errors, but when I check the number of nulls in the columns with df[cols_mode].isnull().sum(), it stays the same.

Method 1:

cols_mode = ['race', 'goal', 'date', 'go_out', 'career_c']

df[cols_mode].apply(lambda x: x.fillna(x.mode, inplace=True))

I tried the Imputer method too, with the same result.

Method 2:

for column in df[['race', 'goal', 'date', 'go_out', 'career_c']]:
    mode = df[column].mode()
    df[column] = df[column].fillna(mode)

Method 3:

df['race'].fillna(df.race.mode(), inplace=True)
df['goal'].fillna(df.goal.mode(), inplace=True)
df['date'].fillna(df.date.mode(), inplace=True)
df['go_out'].fillna(df.go_out.mode(), inplace=True)
df['career_c'].fillna(df.career_c.mode(), inplace=True)

Method 4: my methods became more and more manual, and this one finally works:

df['race'].fillna(2.0, inplace=True)
df['goal'].fillna(1.0, inplace=True)
df['date'].fillna(6.0, inplace=True)
df['go_out'].fillna(2.0, inplace=True)
df['career_c'].fillna(2.0, inplace=True) 

mode() returns a Series (a column can have more than one mode), so you still need to select a single value, e.g. the first one with [0], before using it to replace the NaN values in your DataFrame.

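for column in ['race', 'goal', 'date', 'go_out', 'career_c']:
    # Assign the result back; fillna(inplace=True) on df[column] is chained
    # assignment and may not modify df in newer pandas versions
    df[column] = df[column].fillna(df[column].mode()[0])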
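A quick way to see why passing the whole Series (as in Methods 2 and 3 of the question) leaves the NaN count unchanged: fillna aligns a Series argument with the column on index labels, and the result of mode() only has label 0, so at most the row labelled 0 can be filled. The sketch below uses a made-up column s purely for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative column with NaNs at index labels 1 and 3
s = pd.Series([2.0, np.nan, 2.0, np.nan, 1.0])

modes = s.mode()  # a Series holding 2.0 under label 0
print(type(modes).__name__)  # Series

# Passing the Series aligns on index labels: only label 0 could be filled,
# and label 0 is not NaN here, so both NaNs remain
aligned = s.fillna(modes)
print(aligned.isnull().sum())  # 2

# Selecting the first mode gives a scalar, which fills every NaN
filled = s.fillna(modes[0])
print(filled.isnull().sum())  # 0
```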

If you want to apply it to all the columns of the DataFrame, then:

for column in df.columns:
    # Raises KeyError for a column that is entirely NaN (its mode() is empty)
    df[column] = df[column].fillna(df[column].mode()[0])

Alternatively, I used another DataFrame containing only the modes of the columns. However, you need to make sure that NaN is not the mode of any of the columns (df.mode() reports NaN for a column that is entirely NaN).

# Create a DataFrame with the mode(s) of every column
df_mode = df.mode()
# Loop over the columns and fill each one with its first mode
for x in df.columns.values:
    df[x] = df[x].fillna(value=df_mode[x].iloc[0])
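The loop can also be written as a single call, sketched here on invented data: df.mode().iloc[0] is a Series indexed by column name, and DataFrame.fillna fills each column with the value stored under its name. The extra column empty (hypothetical, all NaN) illustrates the caveat above: it has no mode, so row 0 of df.mode() is NaN for it and that column stays unfilled.

```python
import numpy as np
import pandas as pd

# Illustrative data; 'empty' is an all-NaN column to show the caveat
df = pd.DataFrame({
    "race":  [2.0, np.nan, 2.0, 4.0],
    "goal":  [1.0, 1.0, np.nan, 3.0],
    "empty": [np.nan, np.nan, np.nan, np.nan],
})

# First row of df.mode(): one value per column (NaN for 'empty')
first_modes = df.mode().iloc[0]

# fillna with a Series fills each column with the value under its name
df_filled = df.fillna(first_modes)

# 'race' and 'goal' have no NaNs left; 'empty' still has all 4
print(df_filled.isnull().sum())
```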

You can also use the in-place method. This was useful when working with large data sets: I had simply created a DataFrame holding the mean, mode and median of all the columns.
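The answer does not include that code. A minimal sketch, assuming the summary frame has one row per statistic and one column per original column (the name stats and the data are hypothetical), might look like this; mean and median only make sense for numeric columns, which fits the float64 columns in the question.

```python
import numpy as np
import pandas as pd

# Illustrative numeric data (the question's columns are all float64)
df = pd.DataFrame({
    "go_out":   [1.0, 1.0, np.nan, 4.0, 6.0],
    "career_c": [np.nan, 7.0, 7.0, 1.0, 2.0],
})

# Hypothetical summary frame: one row per statistic, one column per column of df
stats = pd.DataFrame({
    "mean":   df.mean(),
    "median": df.median(),
    "mode":   df.mode().iloc[0],
}).T

# Pick the statistic to impute with; fillna matches values to columns by name
df_mode_filled = df.fillna(stats.loc["mode"])
df_median_filled = df.fillna(stats.loc["median"])
```

Computing the statistics once and reusing them avoids recomputing mode() for every column each time you try a different imputation.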

Why not use a dictionary for your columns and pass that through instead?

dic = {'race': 2.0, 'goal': 1.0, 'date': 6.0, 'go_out': 2.0, 'career_c': 2.0}
# fillna returns a new DataFrame, so assign the result back
df = df.fillna(value=dic)
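Instead of typing the values in by hand, as in Method 4 of the question, the dictionary can be built from the columns' modes. A sketch with three of the question's column names on invented data:

```python
import numpy as np
import pandas as pd

cols_mode = ["race", "goal", "date"]

# Illustrative data only
df = pd.DataFrame({
    "race": [2.0, np.nan, 2.0, 3.0],
    "goal": [1.0, 1.0, np.nan, 6.0],
    "date": [6.0, np.nan, 6.0, np.nan],
})

# Map each column name to its first mode
dic = {col: df[col].mode()[0] for col in cols_mode}

# fillna returns a new DataFrame; assign it back
df = df.fillna(value=dic)
print(df[cols_mode].isnull().sum())
```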

For imputing a single column:

df['col'] = df['col'].fillna(df['col'].mode()[0])

If you want to apply the same to a list of columns, loop over the list.
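One detail worth knowing about mode()[0]: when several values tie for most frequent, mode() returns all of them in sorted order, so [0] picks the smallest. A sketch of the loop over a list of columns, on invented data where go_out has a tie:

```python
import numpy as np
import pandas as pd

# Illustrative data: go_out has a tie between 1.0 and 3.0
df = pd.DataFrame({
    "go_out":   [3.0, 3.0, 1.0, 1.0, np.nan],
    "career_c": [2.0, 2.0, 5.0, np.nan, np.nan],
})

print(df["go_out"].mode().tolist())  # [1.0, 3.0]

for col in ["go_out", "career_c"]:
    # [0] takes the first (smallest) mode when there is a tie
    df[col] = df[col].fillna(df[col].mode()[0])
```

If the choice between tied values matters for your analysis, inspect mode() before picking one rather than relying on [0].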
