Fillna() not imputing values with respect to groupby()

I'm trying to use fillna() and transform() to impute some missing values in a column with respect to the 'release_year' and 'brand_name' of the phone, but after running my code I still have the same missing value counts.

Here are my missing value counts & percentages prior to running the code:

The column I want to impute is 'main_camera_mp'.

Here is the code I ran to impute 'main_camera_mp' and the result (just an FYI that I copied the above dataframe into df2):

df2['main_camera_mp'] = df2['main_camera_mp'].fillna(value = df2.groupby(['release_year','brand_name'])['main_camera_mp'].transform('mean'))

Missing value counts and percentages after running the line above

Your imputation method is probably not suited to your data: wherever main_camera_mp is still missing, it is likely missing for every entry in that release_year / brand_name group. The mean of a group with no observed values is itself NaN, so the series derived from the groupby object that you pass as the fill value has missing values in exactly those rows, and fillna() has nothing to fill them with.

Here is a simple example of how this can happen:

import numpy as np
import pandas as pd

df2 = pd.DataFrame({'main_camera_mp': [1, 2, 3, np.nan, 5, 6, np.nan, np.nan],
                    'release_year': [2000, 2000, 2001, 2001, 2000, 2000, 2001, 2001],
                    'brand_name': ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'b']})

df2['main_camera_mp'] = df2['main_camera_mp'].fillna(value = 
    df2.groupby(['release_year', 'brand_name'])['main_camera_mp'].transform('mean'))
df2
    main_camera_mp  release_year    brand_name
0   1.0             2000            a
1   2.0             2000            b
2   3.0             2001            a
3   NaN             2001            b
4   5.0             2000            a
5   6.0             2000            b
6   3.0             2001            a
7   NaN             2001            b

Note that the value at index 6 was imputed as intended, but the other two missing values were not, because there is no non-missing value for their group.
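
If that is what is happening in your data, one possible workaround is to fall back to a coarser grouping when the fine-grained group has nothing to average. The sketch below is only an illustration on the toy frame above: the choice of brand_name alone as the fallback group, and the overall column mean as the last resort, are my assumptions and may not be sensible for your data. The names keys, empty_groups and filled are just illustrative.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'main_camera_mp': [1, 2, 3, np.nan, 5, 6, np.nan, np.nan],
                    'release_year': [2000, 2000, 2001, 2001, 2000, 2000, 2001, 2001],
                    'brand_name': ['a', 'b', 'a', 'b', 'a', 'b', 'a', 'b']})

keys = ['release_year', 'brand_name']

# count() ignores NaN, so a group whose count is 0 has no observed value to average.
observed = df2.groupby(keys)['main_camera_mp'].count()
empty_groups = observed[observed == 0].index.tolist()
print(empty_groups)

# Fill from the most specific mean first, then fall back to coarser ones.
# The fallback order (brand only, then overall mean) is an assumption.
col = df2['main_camera_mp']
filled = col.fillna(df2.groupby(keys)['main_camera_mp'].transform('mean'))
filled = filled.fillna(df2.groupby('brand_name')['main_camera_mp'].transform('mean'))
filled = filled.fillna(col.mean())
df2['main_camera_mp'] = filled
print(df2)
```

On the toy data the only empty group is (2001, 'b'), and its two rows end up with the mean of brand 'b' across all years. Checking empty_groups on your real dataframe first should tell you whether this is actually the cause before you pick a fallback.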
