Can I replace NaNs with the mode of a column in a grouped data frame?

I have some data that looks like this:

Year      Make   Model  Trim
2007     Acura  TL      Base
2010     Dodge  Avenger SXT
2009     Dodge  Caliber SXT
2008     Dodge  Caliber SXT
2008     Dodge  Avenger SXT

Trim has some missing values. What I would like to do is something like the following:

  • Group by Year, Make, and Model
  • Impute Trim if there are missing values for that group

So, for instance, I would look at all the 2007 Acura TL rows. That might look like:

Year      Make   Model  Trim
2007     Acura  TL      Base
2007     Acura  TL      XLR
2007     Acura  TL      NaN
2007     Acura  TL      Base

Then, I would impute the NaN with Base (since Base is the mode). It is important to remember here that I want to do this for every group of Year, Make, and Model.

Use mode:

In [215]: df
Out[215]:
   Year   Make    Model  Trim
0  2007  Acura       TL  Base
1  2010  Dodge  Avenger   SXT
2  2009  Dodge  Caliber   NaN
3  2008  Dodge  Caliber   SXT
4  2008  Dodge  Avenger   SXT

In [216]: df.Trim.fillna(df.Trim.mode()[0])
Out[216]:
0    Base
1     SXT
2     SXT
3     SXT
4     SXT
Name: Trim, dtype: object

Use inplace=True to actually set the values:

In [217]: df.Trim.fillna(df.Trim.mode()[0], inplace=True)

In [218]: df
Out[218]:
   Year   Make    Model  Trim
0  2007  Acura       TL  Base
1  2010  Dodge  Avenger   SXT
2  2009  Dodge  Caliber   SXT
3  2008  Dodge  Caliber   SXT
4  2008  Dodge  Avenger   SXT
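
In recent pandas versions, calling fillna(..., inplace=True) on a column pulled out with df.Trim can emit a chained-assignment warning and may leave df unchanged, so assigning the result back is the safer spelling. A minimal sketch, assuming the same five-row frame as in the session above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Year": [2007, 2010, 2009, 2008, 2008],
    "Make": ["Acura", "Dodge", "Dodge", "Dodge", "Dodge"],
    "Model": ["TL", "Avenger", "Caliber", "Caliber", "Avenger"],
    "Trim": ["Base", "SXT", np.nan, "SXT", "SXT"],
})

# Most frequent non-missing value in the whole column
overall_mode = df["Trim"].mode()[0]

# Assign the filled column back instead of relying on inplace=True
df["Trim"] = df["Trim"].fillna(overall_mode)
```

This fills every NaN with the mode of the whole column, ignoring groups, just like the inplace version it replaces.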

If you're working on groups:

In [227]: df
Out[227]:
   Year   Make Model  Trim
0  2007  Acura    TL  Base
1  2007  Acura    TL   XLR
2  2007  Acura    TL   NaN
3  2007  Acura    TL  Base

In [228]: (df.groupby(['Year', 'Make', 'Model'])['Trim']
             .apply(lambda x: x.fillna(x.mode()[0])))
     ...:
Out[228]:
0    Base
1     XLR
2    Base
3    Base
Name: Trim, dtype: object
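
On newer pandas versions, groupby(...).apply may return a result indexed by the group keys as well as the original row labels, so its output may not line up with df the way the session above shows. A sketch of the same idea using transform instead of apply (a swap on my part, not what the answer used), which returns a result aligned to the original index:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Year": [2007, 2007, 2007, 2007],
    "Make": ["Acura"] * 4,
    "Model": ["TL"] * 4,
    "Trim": ["Base", "XLR", np.nan, "Base"],
})

# transform keeps the original row index, so the result can be compared
# with df or assigned back to it directly
filled = (
    df.groupby(["Year", "Make", "Model"])["Trim"]
      .transform(lambda s: s.fillna(s.mode()[0]))
)
```

Here filled has the same index as df, so df['Trim'] = filled works without any realignment.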

Use groupby, then mode. Note that mode returns a Series rather than a single value (several values can tie for most frequent), so you want to grab its first element. @John Galt deserves credit for this and gets my upvote.
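
To make the point about mode concrete: in pandas it returns a Series, because several values can tie for most frequent, and the result is empty when every value is missing. A small illustration with made-up values (not taken from the question's data):

```python
import numpy as np
import pandas as pd

# Tie: both values appear twice, so mode() returns both of them
tied = pd.Series(["XLR", "Base", "Base", "XLR", np.nan]).mode()

# All missing: mode() has nothing to return, so indexing with [0] would fail
all_missing = pd.Series([np.nan, np.nan], dtype="object").mode()

print(list(tied))         # the two tied values
print(all_missing.empty)  # True
```

Taking the first element therefore picks one of the tied values, and it needs a guard if a group might contain only NaN.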

I use assign to create a copy of df with an overwritten version of the Trim column.

df.assign(
    Trim=df.groupby(
        ['Year', 'Make', 'Model']
    ).Trim.apply(
        lambda x: x.fillna(x.mode()[0])
    )
)

   Year   Make Model  Trim
0  2007  Acura    TL  Base
1  2007  Acura    TL   XLR
2  2007  Acura    TL  Base
3  2007  Acura    TL  Base

You can overwrite the column directly with:

df['Trim'] = df.groupby(
    ['Year', 'Make', 'Model']
).Trim.apply(
    lambda x: x.fillna(x.mode()[0])
)
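
One caveat: x.mode()[0] raises when a group has no non-missing Trim at all, since there is no mode to pick. A hedged sketch of one way to guard against that. fill_with_group_mode is a hypothetical helper name, leaving such groups untouched is an assumption about the desired behavior, and transform is used in place of apply so the result stays aligned to df's index:

```python
import numpy as np
import pandas as pd

def fill_with_group_mode(s: pd.Series) -> pd.Series:
    """Fill missing values with the group's mode; leave all-missing groups as-is."""
    modes = s.mode()
    if modes.empty:  # no non-missing values in this group
        return s
    return s.fillna(modes.iloc[0])

df = pd.DataFrame({
    "Year": [2007, 2007, 2007, 2009, 2009],
    "Make": ["Acura", "Acura", "Acura", "Dodge", "Dodge"],
    "Model": ["TL", "TL", "TL", "Caliber", "Caliber"],
    "Trim": ["Base", np.nan, "Base", np.nan, np.nan],
})

# Overwrite Trim group by group; the all-NaN Dodge Caliber group stays NaN
df["Trim"] = (
    df.groupby(["Year", "Make", "Model"])["Trim"]
      .transform(fill_with_group_mode)
)
```

Groups left as NaN could afterwards be filled with a coarser fallback, such as the mode per Make and Model, if that suits the data.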
