Can I replace NaNs with the mode of a column in a grouped data frame?
I have some data that looks like this:
Year Make Model Trim
2007 Acura TL Base
2010 Dodge Avenger SXT
2009 Dodge Caliber SXT
2008 Dodge Caliber SXT
2008 Dodge Avenger SXT
Trim has some missing values. What I would like to do is something like the following.
For instance, I would look at all the 2007 Acura TL rows, which might look like this:
Year Make Model Trim
2007 Acura TL Base
2007 Acura TL XLR
2007 Acura TL NaN
2007 Acura TL Base
Then I would impute the NaN with Base (since Base is the mode). It is important to remember that I want to do this for every group of Year, Make, and Model.
Use mode:
In [215]: df
Out[215]:
Year Make Model Trim
0 2007 Acura TL Base
1 2010 Dodge Avenger SXT
2 2009 Dodge Caliber NaN
3 2008 Dodge Caliber SXT
4 2008 Dodge Avenger SXT
In [216]: df.Trim.fillna(df.Trim.mode()[0])
Out[216]:
0 Base
1 SXT
2 SXT
3 SXT
4 SXT
Name: Trim, dtype: object
Use inplace=True to actually set the values on the column:
In [217]: df.Trim.fillna(df.Trim.mode()[0], inplace=True)
In [218]: df
Out[218]:
Year Make Model Trim
0 2007 Acura TL Base
1 2010 Dodge Avenger SXT
2 2009 Dodge Caliber SXT
3 2008 Dodge Caliber SXT
4 2008 Dodge Avenger SXT
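As a side note, newer pandas releases tend to warn about calling an inplace method on a column pulled out of a frame (df.Trim.fillna(..., inplace=True)), and the change may not reach df. A minimal sketch of the same fill with plain assignment, rebuilding the example frame by hand so it runs on its own (the name most_common is just illustrative):

```python
import numpy as np
import pandas as pd

# Same rows as the example above, rebuilt by hand
df = pd.DataFrame({
    "Year": [2007, 2010, 2009, 2008, 2008],
    "Make": ["Acura", "Dodge", "Dodge", "Dodge", "Dodge"],
    "Model": ["TL", "Avenger", "Caliber", "Caliber", "Avenger"],
    "Trim": ["Base", "SXT", np.nan, "SXT", "SXT"],
})

# Most frequent non-missing value in the whole column
most_common = df["Trim"].mode()[0]

# Assign the filled column back instead of relying on inplace=True
df["Trim"] = df["Trim"].fillna(most_common)
print(df)
```

This fills from the mode of the whole column, not per group, exactly like the inplace version.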
If you're working on groups:
In [227]: df
Out[227]:
Year Make Model Trim
0 2007 Acura TL Base
1 2007 Acura TL XLR
2 2007 Acura TL NaN
3 2007 Acura TL Base
In [228]: (df.groupby(['Year', 'Make', 'Model'])['Trim']
     ...:    .apply(lambda x: x.fillna(x.mode()[0])))
Out[228]:
0 Base
1 XLR
2 Base
3 Base
Name: Trim, dtype: object
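Two things can bite with the apply version: x.mode()[0] fails for a group where every Trim is missing, and on recent pandas versions apply may, as far as I can tell, add the group keys to the result's index. A hedged sketch using transform instead, which keeps the original row index; the helper fill_with_group_mode and the extra all-NaN Dodge row are my own additions for illustration, not part of the original answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Year": [2007, 2007, 2007, 2007, 2009],
    "Make": ["Acura", "Acura", "Acura", "Acura", "Dodge"],
    "Model": ["TL", "TL", "TL", "TL", "Caliber"],
    # The last group has no known Trim at all (hypothetical edge case)
    "Trim": ["Base", "XLR", np.nan, "Base", np.nan],
})

def fill_with_group_mode(s):
    """Fill missing values in s with its mode; return s unchanged if no mode exists."""
    modes = s.mode()  # NaN is dropped by default
    if modes.empty:
        return s      # nothing to impute from, leave the NaN alone
    return s.fillna(modes.iloc[0])

# transform returns a Series aligned to df's own index
filled = (df.groupby(["Year", "Make", "Model"])["Trim"]
            .transform(fill_with_group_mode))
print(filled)
```

Because the result lines up with df row for row, it can be assigned straight back with df["Trim"] = filled.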
Use groupby, then mode. Note that mode returns a Series rather than a single value, so you want to grab its first element. @John Galt deserves credit for this and gets my upvote.
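To see why the [0] is needed, here is a small sketch of what mode gives back when there is a tie and when there is nothing to count. The two series are made-up inputs, not data from the question:

```python
import numpy as np
import pandas as pd

# Two values tie for most frequent; the NaN is ignored
tied = pd.Series(["XLR", "Base", np.nan])
tied_modes = tied.mode()
print(tied_modes.tolist())   # every tied value comes back, not just one

# Nothing but missing values: there is no mode at all
all_missing = pd.Series([np.nan, np.nan], dtype=object)
no_modes = all_missing.mode()
print(no_modes.empty)        # an empty Series, so [0] would raise here
```

So [0] silently picks one of the tied values, and it needs a guard if a group can be entirely NaN.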
I use assign to create a copy of df with an overwritten version of the Trim column.
df.assign(
    Trim=df.groupby(
        ['Year', 'Make', 'Model']
    ).Trim.apply(
        lambda x: x.fillna(x.mode()[0])
    )
)
Year Make Model Trim
0 2007 Acura TL Base
1 2007 Acura TL XLR
2 2007 Acura TL Base
3 2007 Acura TL Base
You can overwrite the column directly with:
df['Trim'] = df.groupby(
    ['Year', 'Make', 'Model']
).Trim.apply(
    lambda x: x.fillna(x.mode()[0])
)