
Why does pandas fillna() with inplace=True not work for multiple columns?

I am using the Titanic passengers data set. I am trying to fill in missing categorical data, but fillna() with the inplace option does not do anything:

import numpy as np
import pandas as pd

data = pd.read_csv('https://www.openml.org/data/get_csv/16826755/phpMYEkMl')

# replace question marks with np.nan
data = data.replace('?', np.nan)

var_categor = ['sex', 'cabin', 'embarked']

data.loc[:, var_categor].fillna("Missing", inplace=True)

I still get the same number of NaN values:

data[var_categor].isnull().sum()

I get no error messages and no warnings; it just doesn't do anything. Is this normal behavior? Shouldn't it give a warning?

Try chaining the operations and assigning the returned copy back, rather than modifying in place:

data[var_categor] = data.replace('?', np.nan)[var_categor].fillna('Missing')
>>> data[var_categor].isna().sum()
sex         0
cabin       0
embarked    0
dtype: int64

The likely cause is that data.loc[:, var_categor] returns a copy of the selected columns rather than a view, so fillna(..., inplace=True) fills that temporary copy and data itself is left unchanged.
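To illustrate the copy behavior, here is a minimal sketch on a small made-up frame (the values and the names df, cols and subset are assumptions for illustration, not the Titanic data). It keeps a reference to the selected columns so the effect of the in-place fill can be inspected:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data (values are made up).
df = pd.DataFrame({
    "sex": ["male", np.nan, "female"],
    "cabin": [np.nan, "C85", np.nan],
    "age": [22.0, 38.0, np.nan],
})
cols = ["sex", "cabin"]

# Selecting a list of columns builds a new object, so the in-place
# fill changes that object and not df.
subset = df.loc[:, cols]
subset.fillna("Missing", inplace=True)

nan_in_subset = int(subset.isna().sum().sum())      # NaNs left in the selection
nan_in_original = int(df[cols].isna().sum().sum())  # NaNs left in df
print(nan_in_subset, nan_in_original)
```

In the original one-liner the selection is never bound to a name, so the filled copy is simply discarded.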

The simplest fix is to skip inplace and assign the result back:

data[var_categor] = data[var_categor].fillna("Missing")

An alternative is to call .fillna directly on the DataFrame. To limit which columns are filled, pass a dictionary mapping column names to replacement values:

>>> data.fillna({var: 'Missing' for var in var_categor}, inplace=True)
>>> data[var_categor].isna().sum()
sex         0
cabin       0
embarked    0
dtype: int64

However, best practice in pandas is to avoid inplace; see the GitHub issue that discusses deprecating it for more detail.
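The dictionary form also works without inplace. A minimal sketch, again on a made-up toy frame (df, cols and filled are hypothetical names), which assigns the returned frame to a new variable:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data (values are made up).
df = pd.DataFrame({
    "sex": ["male", np.nan, "female"],
    "cabin": [np.nan, "C85", np.nan],
    "age": [22.0, 38.0, np.nan],
})
cols = ["sex", "cabin"]

# fillna returns a new DataFrame; only the columns named in the
# dictionary are filled, and df itself is not modified.
filled = df.fillna({col: "Missing" for col in cols})

print(filled[cols].isna().sum())
```

Columns that are not in the dictionary, such as age here, keep their missing values.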


 