Can I replace NaNs with the mode of a column in a grouped data frame?

Question

I have some data that looks like this:

Year      Make   Model  Trim
2007     Acura  TL      Base
2010     Dodge  Avenger SXT
2009     Dodge  Caliber SXT
2008     Dodge  Caliber SXT
2008     Dodge  Avenger SXT

Trim has some missing values. What I would like to do is something like the following:

Group by Year, Make, and Model
Impute Trim if there are missing values for that group

So for instance, I would look at all the 2007 Acura TL rows. That might look like:

Year      Make   Model  Trim
2007     Acura  TL      Base
2007     Acura  TL      XLR
2007     Acura  TL      NaN
2007     Acura  TL      Base

Then, I would impute the NaN with Base (since Base is the mode). It is important to remember here that I want to do this for every group of Year, Make, and Model.

Answer 1

Use mode:

In [215]: df
Out[215]:
   Year   Make    Model  Trim
0  2007  Acura       TL  Base
1  2010  Dodge  Avenger   SXT
2  2009  Dodge  Caliber   NaN
3  2008  Dodge  Caliber   SXT
4  2008  Dodge  Avenger   SXT

In [216]: df.Trim.fillna(df.Trim.mode()[0])
Out[216]:
0    Base
1     SXT
2     SXT
3     SXT
4     SXT
Name: Trim, dtype: object

Use inplace=True to actually set the values (in recent pandas versions, assigning the result back with df['Trim'] = ... is the more reliable form):

In [217]: df.Trim.fillna(df.Trim.mode()[0], inplace=True)

In [218]: df
Out[218]:
   Year   Make    Model  Trim
0  2007  Acura       TL  Base
1  2010  Dodge  Avenger   SXT
2  2009  Dodge  Caliber   SXT
3  2008  Dodge  Caliber   SXT
4  2008  Dodge  Avenger   SXT

If you're working with groups:

In [227]: df
Out[227]:
   Year   Make Model  Trim
0  2007  Acura    TL  Base
1  2007  Acura    TL   XLR
2  2007  Acura    TL   NaN
3  2007  Acura    TL  Base

In [228]: (df.groupby(['Year', 'Make', 'Model'])['Trim']
             .apply(lambda x: x.fillna(x.mode()[0])))
     ...:
Out[228]:
0    Base
1     XLR
2    Base
3    Base
Name: Trim, dtype: object
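
One caveat with the grouped version: if every Trim in a group is missing, x.mode() is empty and the [0] lookup raises an error. Below is a minimal sketch of a more defensive variant, not part of the original answer. It assumes that leaving such groups as NaN is acceptable, and fill_with_group_mode is a hypothetical helper name. It uses transform, which returns a result aligned to the original index.

```python
import numpy as np
import pandas as pd

# Illustrative data: the 2009 Dodge Caliber group has no known Trim at all.
df = pd.DataFrame({
    "Year": [2007, 2007, 2007, 2007, 2009, 2009],
    "Make": ["Acura"] * 4 + ["Dodge"] * 2,
    "Model": ["TL"] * 4 + ["Caliber"] * 2,
    "Trim": ["Base", "XLR", np.nan, "Base", np.nan, np.nan],
})

def fill_with_group_mode(s):
    """Fill NaN in one group's values with that group's mode (hypothetical helper)."""
    modes = s.mode()  # NaN is ignored by default (dropna=True)
    if modes.empty:   # every value in the group is missing: nothing to impute from
        return s
    return s.fillna(modes.iloc[0])

# transform keeps the original row index, so the result lines up with df.
filled = df.groupby(["Year", "Make", "Model"])["Trim"].transform(fill_with_group_mode)
result = df.assign(Trim=filled)
print(result)
```

The all-NaN group stays NaN here; a global fallback such as the overall mode could be applied afterwards if that suits the data better.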

Answer 2

Use groupby, then mode. Note that mode returns a Series (there may be more than one mode), so you want to grab its first element. @John Galt deserves credit for this and gets my upvote.
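
To illustrate why the first element has to be picked out, here is a small sketch with made-up values (not from the question's data). When two values tie, mode returns both, in sorted order, so taking the first one is a deterministic but somewhat arbitrary choice.

```python
import pandas as pd

# "XLR" and "Base" both appear twice; the missing value is ignored.
trims = pd.Series(["XLR", "Base", "XLR", "Base", None])

modes = trims.mode()    # a Series holding every tied mode, sorted
first = modes.iloc[0]   # the value that fillna would end up using

print(modes.tolist())
print(first)
```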

I use assign to create a copy of df with an overwritten version of the Trim column.

df.assign(
    Trim=df.groupby(
        ['Year', 'Make', 'Model']
    ).Trim.apply(
        lambda x: x.fillna(x.mode()[0])
    )
)

   Year   Make Model  Trim
0  2007  Acura    TL  Base
1  2007  Acura    TL   XLR
2  2007  Acura    TL  Base
3  2007  Acura    TL  Base

You can overwrite the column directly with:

df['Trim'] = df.groupby(
    ['Year', 'Make', 'Model']
).Trim.apply(
    lambda x: x.fillna(x.mode()[0])
)
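
A note for newer pandas versions: as far as I'm aware, from pandas 2.0 on, groupby(...).apply adds the group keys to the result's index by default, so the result may no longer align with df on assignment. A minimal sketch of one workaround, assuming group_keys=False is acceptable for your use case (the data here is illustrative, with two interleaved groups):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Year": [2007, 2008, 2007, 2008, 2007],
    "Make": ["Acura", "Dodge", "Acura", "Dodge", "Acura"],
    "Model": ["TL", "Caliber", "TL", "Caliber", "TL"],
    "Trim": ["Base", "SXT", np.nan, np.nan, "Base"],
})

# group_keys=False keeps the original row labels on the result,
# so the assignment aligns row by row with df.
df["Trim"] = (
    df.groupby(["Year", "Make", "Model"], group_keys=False)["Trim"]
      .apply(lambda x: x.fillna(x.mode()[0]))
)
print(df)
```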

Answer 1
3 2017-08-17 17:52:45

Answer 2
1 Accepted 2017-08-17 18:05:03
