Pandas Fillna of Multiple Columns with Mode of Each Column

Working with census data, I want to replace NaNs in two columns ("workclass" and "native-country") with the respective modes of those two columns. I can get the modes easily:

mode = df.filter(["workclass", "native-country"]).mode()

which returns a dataframe:

  workclass native-country
0   Private  United-States

However,

df.filter(["workclass", "native-country"]).fillna(mode)

does not replace the NaNs in each column with anything, let alone the mode corresponding to that column. Is there a smooth way to do this?

If you want to impute missing values with the mode in some columns of a DataFrame df, you can call fillna with a Series, created by selecting the first row of the mode result by position with iloc:

cols = ["workclass", "native-country"]
df[cols]=df[cols].fillna(df.mode().iloc[0])

Or:

df[cols]=df[cols].fillna(mode.iloc[0])

Your solution:

df[cols]=df.filter(cols).fillna(mode.iloc[0])

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({'workclass':['Private','Private',np.nan, 'another', np.nan],
                   'native-country':['United-States',np.nan,'Canada',np.nan,'United-States'],
                   'col':[2,3,7,8,9]})

print (df)
   col native-country workclass
0    2  United-States   Private
1    3            NaN   Private
2    7         Canada       NaN
3    8            NaN   another
4    9  United-States       NaN

mode = df.filter(["workclass", "native-country"]).mode()
print (mode)
  workclass native-country
0   Private  United-States

cols = ["workclass", "native-country"]
df[cols]=df[cols].fillna(df.mode().iloc[0])
print (df)
   col native-country workclass
0    2  United-States   Private
1    3  United-States   Private
2    7         Canada   Private
3    8  United-States   another
4    9  United-States   Private
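
As a side note on why the original `fillna(mode)` call seemed to do nothing: when `fillna` receives a DataFrame, pandas aligns it on both row index and column labels, and the mode frame only has row 0. The sketch below rebuilds the sample data to show the difference; the names `filled_with_frame` and `filled_with_series` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "workclass": ["Private", "Private", np.nan, "another", np.nan],
    "native-country": ["United-States", np.nan, "Canada", np.nan, "United-States"],
    "col": [2, 3, 7, 8, 9],
})
cols = ["workclass", "native-country"]

# One-row DataFrame whose only index label is 0
mode = df[cols].mode()

# DataFrame argument: aligned on index AND columns, so only row 0 could be filled
filled_with_frame = df[cols].fillna(mode)

# Series argument: its index is matched to column names, so every row is filled
filled_with_series = df[cols].fillna(mode.iloc[0])

print(filled_with_frame.isna().sum().sum())   # NaNs left after the DataFrame fill
print(filled_with_series.isna().sum().sum())  # NaNs left after the Series fill
```

Row 0 has no missing values in this sample, so the DataFrame version leaves every NaN in place, which matches the behaviour described in the question.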

You can do it like this:

df[["workclass", "native-country"]]=df[["workclass", "native-country"]].fillna(value=mode.iloc[0])

For example,

import pandas as pd
d={
    'key3': [1,4,4,4,5],
    'key2': [6,6,4],
    'key1': [6,4,4],
}

df=pd.DataFrame.from_dict(d,orient='index').transpose()

Then df is

  key3  key2    key1
0   1   6       6
1   4   6       4
2   4   4       4
3   4   NaN     NaN
4   5   NaN     NaN

Then by doing:

l=df.filter(["key1", "key2"]).mode()
df[["key1", "key2"]]=df[["key1", "key2"]].fillna(value=l.iloc[0])

we get that df is

  key3  key2    key1
0   1   6        6
1   4   6        4
2   4   4        4
3   4   6        4
4   5   6        4
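
One thing worth keeping in mind with `.iloc[0]`: `mode()` can return more than one row when a column has tied modes, and shorter columns are padded with NaN. Taking the first row then picks the first mode of each column in sorted order. A small sketch with made-up data (the columns `a` and `b` are illustrative only):

```python
import pandas as pd

# Column "a" has two tied modes ("x" and "y"); column "b" has a single mode ("p")
df = pd.DataFrame({"a": ["x", "y", "y", "x"],
                   "b": ["p", "p", "q", "r"]})

modes = df.mode()
print(modes)  # two rows; column "b" is NaN in the second row

# The first row holds one mode per column, which is what fillna needs
first = modes.iloc[0]
print(first)
```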

I think it's cleanest to use a dict as the fillna parameter 'value'

ref: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html

create a toy df from @miriam-farber's response

import pandas as pd
d={
    'key3': [1,4,4,4,5],
    'key2': [6,6,4],
    'key1': [6,4,4],
}

d_df=pd.DataFrame.from_dict(d,orient='index').transpose()

create a dict

mode_dict = d_df.loc[:,['key2','key1']].mode().to_dict('records')[0]

use this dict in fillna method

d_df.fillna(mode_dict, inplace=True)
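
The same dict could also be built with a comprehension, which some readers may find easier to follow than `to_dict('records')[0]`. This is only a sketch on an equivalent toy frame written out with explicit NaNs; `cols` and `filled` are illustrative names, and the result is assigned rather than modified in place.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "key3": [1, 4, 4, 4, 5],
    "key2": [6, 6, 4, np.nan, np.nan],
    "key1": [6, 4, 4, np.nan, np.nan],
})
cols = ["key2", "key1"]

# Map each column name to its first mode
mode_dict = {col: df[col].mode().iloc[0] for col in cols}

# Columns that are not keys in the dict (here key3) are left untouched
filled = df.fillna(value=mode_dict)
print(filled)
```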

This code imputes the mean into the numeric columns and the mode into the object columns. It builds a list of each type of column and fills the missing values according to which list the column belongs to.

category_columns=df.select_dtypes(include=['object']).columns.tolist()
numeric_columns=df.select_dtypes(include=['int64','float64']).columns.tolist()

for column in df:
    if df[column].isnull().any():
        if(column in category_columns):
            # object column: fill with the most frequent value
            df[column]=df[column].fillna(df[column].mode()[0])
        else:
            # any other column: fill with the mean (mean must be called, not passed as a method)
            df[column]=df[column].fillna(df[column].mean())
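
The loop above could also be wrapped in a small function that checks the dtype directly and returns a copy instead of modifying `df`. This is a rough sketch rather than a definitive implementation; the function name `impute_mean_or_mode` and the toy frame are hypothetical, and `is_numeric_dtype` comes from `pandas.api.types`.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

def impute_mean_or_mode(frame):
    """Return a copy with NaNs filled: mean for numeric columns, mode otherwise."""
    out = frame.copy()
    for column in out.columns:
        if not out[column].isna().any():
            continue  # nothing to fill in this column
        if is_numeric_dtype(out[column]):
            out[column] = out[column].fillna(out[column].mean())
        else:
            modes = out[column].mode()
            if not modes.empty:  # an all-NaN column has no mode
                out[column] = out[column].fillna(modes.iloc[0])
    return out

toy = pd.DataFrame({
    "age": [20.0, np.nan, 40.0],
    "workclass": ["Private", np.nan, "Private"],
})
result = impute_mean_or_mode(toy)
print(result)
```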
