
I cannot get fillna in Python to work when using mode to replace NaNs with the most frequent string value in a column

Strange problem.

I have a dataframe column with dtype == object that holds string values and NaNs. It looks like this:

df   
     Response    
0    Email
1    NaN
2    NaN
3    Call
4    Email
5    Email

I want to use fillna to fill the NaN values with the most frequently occurring value, which in this case is 'Email'.

The code looks like this:

import numpy as np
import pandas as pd

most_frequent_cat = str(df['Response'].mode())
df['Response_imputed'] = df['Response']
df['Response_imputed'].fillna(most_frequent_cat, inplace=True)

The results look like this:

df
     Response
0    Email
1    0    Email\ndtype: object
2    0    Email\ndtype: object
3    Call
4    Email
5    Email

'0    Email\ndtype: object' is not the same as 'Email'.

If I remove the str, the original NaNs are not replaced at all.

What am I doing wrong?
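Both symptoms can be reproduced with a short sketch. It assumes the gaps are real np.nan values (not the string 'NaN'), and shows that mode() hands back a Series, that str() turns that Series into its full printed form, and that fillna() given a Series fills by matching index labels rather than broadcasting a single value:

```python
import numpy as np
import pandas as pd

# Sample column from the question, assuming the gaps are real np.nan values
s = pd.Series(['Email', np.nan, np.nan, 'Call', 'Email', 'Email'], name='Response')

modes = s.mode()             # a Series, not a scalar
print(type(modes).__name__)  # Series

# Symptom 1: str() renders the whole Series, index and dtype line included
as_text = str(modes)
print(repr(as_text))

# Symptom 2: fillna() with a Series aligns on index labels.
# The mode Series only has label 0, and row 0 is not missing, so nothing is filled.
filled = s.fillna(modes)
print(filled.isna().sum())   # 2
```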

Don't use DataFrame.fillna with inplace=True. Actually, I would recommend forgetting that argument exists entirely. Use Series.fillna instead, since you only need this on one column, and assign the result back.
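Applied to the question's goal of keeping the original column and writing the result into Response_imputed, the assign-back approach might look like the sketch below. It assumes real np.nan gaps and uses .iat[0] to pull the first mode out as a plain scalar:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Response': ['Email', np.nan, np.nan, 'Call', 'Email', 'Email']})

# Take the first mode as a plain scalar (a str here), not as a Series
most_frequent_cat = df['Response'].mode().iat[0]

# Build the imputed column from the fillna() result instead of mutating in place;
# the original 'Response' column keeps its NaNs
df['Response_imputed'] = df['Response'].fillna(most_frequent_cat)

print(df)
```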

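Another thing to note: mode always returns a Series, not a scalar, which is why str() gave you the whole printed Series. It can also return multiple values when there is a tie for the most frequent value. In that case it should suffice to either select the first one or pick one at random (an exercise for you).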
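Both tie-breaking options can be sketched as follows. The two-way tie in this column is made up for illustration, and the seeded numpy Generator is just one way to make the random pick repeatable:

```python
import numpy as np
import pandas as pd

# Hypothetical column with a tie: 'Call' and 'Email' both appear twice
s = pd.Series(['Email', 'Call', np.nan, 'Email', 'Call'])

modes = s.mode()
print(modes.tolist())  # ['Call', 'Email']  (mode() returns its values sorted)

# Option 1: deterministic, take the first mode
first = s.fillna(modes.iat[0])

# Option 2: pick one of the tied modes at random (seeded so runs are repeatable)
rng = np.random.default_rng(seed=0)
chosen = rng.choice(modes.to_numpy())
randomly_filled = s.fillna(chosen)
```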

Here's my recommended syntax:

# call fillna on the column and assign it back
df['Response'] = df['Response'].fillna(df['Response'].mode().iat[0])
df
 
  Response
0    Email
1    Email
2    Email
3     Call
4    Email
5    Email
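One edge case worth guarding against: mode() ignores NaNs by default, so a column that is entirely NaN has no mode and .iat[0] raises an IndexError. A hedged sketch of a guard; fill_with_mode and the all-NaN Notes column are hypothetical:

```python
import numpy as np
import pandas as pd


def fill_with_mode(col: pd.Series) -> pd.Series:
    """Fill NaNs in `col` with its first mode; return it unchanged if it has no mode.

    Hypothetical helper: mode() drops NaNs by default, so an all-NaN column
    yields an empty Series and `.iat[0]` would raise IndexError.
    """
    modes = col.mode()
    if modes.empty:
        return col
    return col.fillna(modes.iat[0])


df = pd.DataFrame({
    'Response': ['Email', np.nan, np.nan, 'Call', 'Email', 'Email'],
    'Notes': [np.nan] * 6,  # hypothetical all-NaN column
})

df['Response'] = fill_with_mode(df['Response'])
df['Notes'] = fill_with_mode(df['Notes'])  # no mode, so left as is
```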

You can also do a per-column fill if you have multiple columns to fill NaNs in. The procedure is similar: call mode on your DataFrame, take the first mode for each column, and pass that as the argument to DataFrame.fillna this time:

df.fillna(df.mode().iloc[0])

  Response
0    Email
1    Email
2    Email
3     Call
4    Email
5    Email
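When the columns do not all have a single mode, df.mode() returns one row per mode and pads the shorter columns with NaN, so row 0 is the one holding the first mode of each column. A sketch with a hypothetical Channel column that has a tie:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Response': ['Email', np.nan, np.nan, 'Call', 'Email', 'Email'],
    'Channel': ['Web', 'Phone', np.nan, 'Web', 'Phone', np.nan],  # hypothetical
})

modes = df.mode()
print(modes)
# 'Channel' has a tie, so mode() returns two rows; 'Response' has a single mode
# and its second row is padded with NaN. Row 0 holds the first mode of every column.

# fillna() with a Series maps column labels to per-column fill values
filled = df.fillna(modes.iloc[0])
```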

import numpy as np
import pandas as pd
d = {'Response': ['Email', np.nan, np.nan, 'Call', 'Email', 'Email']}
df = pd.DataFrame(data=d)

df['Response'].mode() 

output:

0    Email
dtype: object

Take the first value:

df['Response'].mode()[0] 

output:

'Email'
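If the missing entries are the literal string 'NaN' rather than real missing values, isna() and fillna() will not see them. A sketch, assuming that string placeholder, which converts them with replace() before filling with the first mode:

```python
import numpy as np
import pandas as pd

# Assumed input: gaps stored as the literal string 'NaN', not as real missing values
df = pd.DataFrame({'Response': ['Email', 'NaN', 'NaN', 'Call', 'Email', 'Email']})
print(df['Response'].isna().sum())  # 0: fillna() sees nothing to fill

# Convert the placeholder strings to real missing values first
df['Response'] = df['Response'].replace('NaN', np.nan)

# Now the first mode can be used as the fill value
df['Response'] = df['Response'].fillna(df['Response'].mode()[0])
```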
