
pandas: add columns conditionally with groupby and based on another column's values

I have a pandas.DataFrame called companysubset like the one below; the actual data is much longer.

        conm                       fyear    dvpayout    industry    firmycount  ipodate
46078   CAESARS ENTERTAINMENT CORP  2003    0.226813    Services    22  19891213.0
46079   CAESARS ENTERTAINMENT CORP  2004    0.226813    Services    22  19891213.0
46080   CAESARS ENTERTAINMENT CORP  2005    0.226813    Services    22  19891213.0
46091   CAESARS ENTERTAINMENT CORP  2016    0.226813    Services    22  19891213.0
114620  CAESARSTONE LTD 2010    0.487543    Manufacturing   10  20120322.0
114621  CAESARSTONE LTD 2011    0.487543    Manufacturing   10  20120322.0
114622  CAESARSTONE LTD 2012    0.487543    Manufacturing   10  20120322.0
114623  CAESARSTONE LTD 2013    0.487543    Manufacturing   10  20120322.0
114624  CAESARSTONE LTD 2014    0.487543    Manufacturing   10  20120322.0
114625  CAESARSTONE LTD 2015    0.487543    Manufacturing   10  20120322.0
114626  CAESARSTONE LTD 2016    0.487543    Manufacturing   10  20120322.0
132524  CAFEPRESS INC   2010    0.000000    Retail Trade    7   20120329.0
132525  CAFEPRESS INC   2011    0.000000    Retail Trade    7   20120329.0
132526  CAFEPRESS INC   2012    -0.000000   Retail Trade    7   20120329.0
132527  CAFEPRESS INC   2013    -0.000000   Retail Trade    7   20120329.0
132528  CAFEPRESS INC   2014    -0.000000   Retail Trade    7   20120329.0
132529  CAFEPRESS INC   2015    -0.000000   Retail Trade    7   20120329.0
132530  CAFEPRESS INC   2016    -0.000000   Retail Trade    7   20120329.0
120049  CAI INTERNATIONAL INC   2005    0.000000    Services    12  20070516.0
120050  CAI INTERNATIONAL INC   2006    0.000000    Services    12  20070516.0
3896    CALAMP CORP 1999    -0.000000   Manufacturing   23  NaN
3897    CALAMP CORP 2000    0.000000    Manufacturing   23  NaN
3898    CALAMP CORP 2001    0.000000    Manufacturing   23  NaN
3899    CALAMP CORP 2002    0.000000    Manufacturing   23  NaN
21120   CALATLANTIC GROUP INC   1995    -0.133648   Construction    22  NaN
21121   CALATLANTIC GROUP INC   1996    -0.133648   Construction    22  NaN
21122   CALATLANTIC GROUP INC   1997    -0.133648   Construction    22  NaN
21123   CALATLANTIC GROUP INC   1998    -0.133648   Construction    22  NaN
21124   CALATLANTIC GROUP INC   1999    -0.133648   Construction    22  NaN
21125   CALATLANTIC GROUP INC   2000    -0.133648   Construction    22  NaN
21126   CALATLANTIC GROUP INC   2001    -0.133648   Construction    22  NaN
21127   CALATLANTIC GROUP INC   2002    -0.133648   Construction    22  NaN
21128   CALATLANTIC GROUP INC   2003    -0.133648   Construction    22  NaN

1) I want to calculate the quartile of each company's dvpayout within its industry and add a column called dv that indicates whether the company is in Q1, Q2, Q3 or Q4.

I came up with this code, but it does not work.

pd.cut(companysubset['dvpayout'].mean(), bins=[0,25,75,100], labels=False)

2) I want to add a column called age if there is an ipodate. The value will be the largest fyear minus the year of ipodate (e.g. 2016 - 1989 for CAESARS ENTERTAINMENT CORP).

The resulting data frame I want to see is like the one below.

        conm            fyear    dvpayout   industry    firmycount  ipodate  dv   age
46078   CAESARS ...     2003    0.226813    Services    22  19891213.0   Q2  27
46079   CAESARS ...     2004    0.226813    Services    22  19891213.0   Q2  27
46080   CAESARS ...     2005    0.226813    Services    22  19891213.0   Q2  27
46091   CAESARS ...     2016    0.226813    Services    22  19891213.0   Q2  27
114620  CAESARSTONE LTD 2010    0.487543    Manufacturing   10  20120322.0  Q3  4
114621  CAESARSTONE LTD 2011    0.487543    Manufacturing   10  20120322.0  Q3  4
114622  CAESARSTONE LTD 2012    0.487543    Manufacturing   10  20120322.0  Q3  4
114623  CAESARSTONE LTD 2013    0.487543    Manufacturing   10  20120322.0  Q3  4
114624  CAESARSTONE LTD 2014    0.487543    Manufacturing   10  20120322.0  Q3  4
114625  CAESARSTONE LTD 2015    0.487543    Manufacturing   10  20120322.0  Q3  4
114626  CAESARSTONE LTD 2016    0.487543    Manufacturing   10  20120322.0  Q3  4
132524  CAFEPRESS INC   2010    0.000000    Retail Trade    7   20120329.0  Q1  4
132525  CAFEPRESS INC   2011    0.000000    Retail Trade    7   20120329.0  Q1  4
132526  CAFEPRESS INC   2012    -0.000000   Retail Trade    7   20120329.0  Q1  4
132527  CAFEPRESS INC   2013    -0.000000   Retail Trade    7   20120329.0  Q1  4
132528  CAFEPRESS INC   2014    -0.000000   Retail Trade    7   20120329.0  Q1  4
132529  CAFEPRESS INC   2015    -0.000000   Retail Trade    7   20120329.0  Q1  4
132530  CAFEPRESS INC   2016    -0.000000   Retail Trade    7   20120329.0  Q1  4
120049  CAI INTERNATIONAL INC   2006    0.000000    Services    12  20070516.0 Q1  0
120050  CAI INTERNATIONAL INC   2007    0.000000    Services    12  20070516.0 Q1  0
3896    CALAMP CORP 1999    -0.000000   Manufacturing   23  NaN   Q1  NaN
3897    CALAMP CORP 2000    0.000000    Manufacturing   23  NaN   Q1  NaN
3898    CALAMP CORP 2001    0.000000    Manufacturing   23  NaN   Q1  NaN
3899    CALAMP CORP 2002    0.000000    Manufacturing   23  NaN   Q1  NaN
21120   CALATLANTIC GROUP INC   1995    -0.133648   Construction    22  NaN   Q1  NaN
21121   CALATLANTIC GROUP INC   1996    -0.133648   Construction    22  NaN   Q1  NaN
21122   CALATLANTIC GROUP INC   1997    -0.133648   Construction    22  NaN   Q1  NaN
21123   CALATLANTIC GROUP INC   1998    -0.133648   Construction    22  NaN   Q1  NaN
21124   CALATLANTIC GROUP INC   1999    -0.133648   Construction    22  NaN   Q1  NaN
21125   CALATLANTIC GROUP INC   2000    -0.133648   Construction    22  NaN   Q1  NaN
21126   CALATLANTIC GROUP INC   2001    -0.133648   Construction    22  NaN   Q1  NaN
21127   CALATLANTIC GROUP INC   2002    -0.133648   Construction    22  NaN   Q1  NaN
21128   CALATLANTIC GROUP INC   2003    -0.133648   Construction    22  NaN   Q1  NaN

Thanks in advance!

The age column can be generated with:

Code:

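# Index by company name so the per-company result aligns with every row.
df.set_index(['conm'], inplace=True)
# Age = latest fiscal year minus the IPO year (YYYYMMDD float reduced to YYYY).
df['age'] = df.groupby(level=0).apply(
    lambda x: max(x.fyear) - round(x.ipodate.iloc[0]/10000-0.5))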
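
The same idea can also be sketched without changing the index, using groupby(...).transform to broadcast each company's latest fyear back to its rows. This is only an illustrative variant, not the answerer's code: the small sample frame is made up, and it assumes ipodate is a YYYYMMDD float that is constant within a company.

```python
import numpy as np
import pandas as pd

# Made-up sample using the column names from the question.
df = pd.DataFrame({
    "conm": ["A CORP", "A CORP", "B INC", "B INC"],
    "fyear": [2003, 2016, 1999, 2002],
    "ipodate": [19891213.0, 19891213.0, np.nan, np.nan],
})

# Year part of the YYYYMMDD float; NaN stays NaN.
ipo_year = df["ipodate"] // 10000

# Latest fiscal year per company, repeated on every row of that company.
last_fyear = df.groupby("conm")["fyear"].transform("max")

# Rows without an ipodate get NaN automatically.
df["age"] = last_fyear - ipo_year
print(df)
```

Because transform returns a result aligned with the original rows, no set_index step is needed and the original integer index is preserved.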

Test Code:

import pandas as pd
from io import StringIO

df = pd.read_fwf(StringIO(
    u"""
        ID      conm                  fyear   ipodate
        46078   CAESARS ENTERTAINMENT 2003    19891213.0
        46079   CAESARS ENTERTAINMENT 2004    19891213.0
        46080   CAESARS ENTERTAINMENT 2005    19891213.0
        46091   CAESARS ENTERTAINMENT 2016    19891213.0
        114620  CAESARSTONE LTD       2010    20120322.0
        114621  CAESARSTONE LTD       2011    20120322.0
        114622  CAESARSTONE LTD       2012    20120322.0
        114623  CAESARSTONE LTD       2013    20120322.0
        114624  CAESARSTONE LTD       2014    20120322.0
        114625  CAESARSTONE LTD       2015    20120322.0
        114626  CAESARSTONE LTD       2016    20120322.0
        132524  CAFEPRESS INC         2010    20120329.0
        132525  CAFEPRESS INC         2011    20120329.0
        132526  CAFEPRESS INC         2012    20120329.0
        132527  CAFEPRESS INC         2013    20120329.0
        132528  CAFEPRESS INC         2014    20120329.0
        132529  CAFEPRESS INC         2015    20120329.0
        132530  CAFEPRESS INC         2016    20120329.0
        120049  CAI INTERNATIONAL INC 2005    20070516.0
        120050  CAI INTERNATIONAL INC 2006    20070516.0
        3897    CALAMP CORP           2000    NaN
        3898    CALAMP CORP           2001    NaN
        3896    CALAMP CORP           1999    NaN
        3899    CALAMP CORP           2002    NaN
        21120   CALATLANTIC GROUP INC 1995    NaN
        21121   CALATLANTIC GROUP INC 1996    NaN
        21122   CALATLANTIC GROUP INC 1997    NaN
        21123   CALATLANTIC GROUP INC 1998    NaN
        21124   CALATLANTIC GROUP INC 1999    NaN
        21125   CALATLANTIC GROUP INC 2000    NaN
        21126   CALATLANTIC GROUP INC 2001    NaN
        21127   CALATLANTIC GROUP INC 2002    NaN
        21128   CALATLANTIC GROUP INC 2003    NaN"""),
    header=1)

df.set_index(['conm'], inplace=True)
df['age'] = df.groupby(level=0).apply(
    lambda x: max(x.fyear) - round(x.ipodate.iloc[0]/10000-0.5))
print(df)

Results:

                           ID  fyear     ipodate   age
conm                                                  
CAESARS ENTERTAINMENT   46078   2003  19891213.0  27.0
CAESARS ENTERTAINMENT   46079   2004  19891213.0  27.0
CAESARS ENTERTAINMENT   46080   2005  19891213.0  27.0
CAESARS ENTERTAINMENT   46091   2016  19891213.0  27.0
CAESARSTONE LTD        114620   2010  20120322.0   4.0
CAESARSTONE LTD        114621   2011  20120322.0   4.0
CAESARSTONE LTD        114622   2012  20120322.0   4.0
CAESARSTONE LTD        114623   2013  20120322.0   4.0
CAESARSTONE LTD        114624   2014  20120322.0   4.0
CAESARSTONE LTD        114625   2015  20120322.0   4.0
CAESARSTONE LTD        114626   2016  20120322.0   4.0
CAFEPRESS INC          132524   2010  20120329.0   4.0
CAFEPRESS INC          132525   2011  20120329.0   4.0
CAFEPRESS INC          132526   2012  20120329.0   4.0
CAFEPRESS INC          132527   2013  20120329.0   4.0
CAFEPRESS INC          132528   2014  20120329.0   4.0
CAFEPRESS INC          132529   2015  20120329.0   4.0
CAFEPRESS INC          132530   2016  20120329.0   4.0
CAI INTERNATIONAL INC  120049   2005  20070516.0  -1.0
CAI INTERNATIONAL INC  120050   2006  20070516.0  -1.0
CALAMP CORP              3897   2000         NaN   NaN
CALAMP CORP              3898   2001         NaN   NaN
CALAMP CORP              3896   1999         NaN   NaN
CALAMP CORP              3899   2002         NaN   NaN
CALATLANTIC GROUP INC   21120   1995         NaN   NaN
CALATLANTIC GROUP INC   21121   1996         NaN   NaN
CALATLANTIC GROUP INC   21122   1997         NaN   NaN
CALATLANTIC GROUP INC   21123   1998         NaN   NaN
CALATLANTIC GROUP INC   21124   1999         NaN   NaN
CALATLANTIC GROUP INC   21125   2000         NaN   NaN
CALATLANTIC GROUP INC   21126   2001         NaN   NaN
CALATLANTIC GROUP INC   21127   2002         NaN   NaN
CALATLANTIC GROUP INC   21128   2003         NaN   NaN
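
The answer above covers only the age column. For the dv column, the attempt in the question cannot work as written, because .mean() collapses dvpayout to a single number while pd.cut expects an array-like. One possible approach, offered as a rough sketch rather than a definitive solution, is to take one dvpayout value per company, compute its percentile rank within the industry, and bin that rank into four equal bands. The sample data is made up, and treating "quartile" as a within-industry percentile rank (ties share an average rank) is an assumption about what the asker wants.

```python
import pandas as pd

# Made-up sample: company A has two yearly rows, the others one each.
df = pd.DataFrame({
    "conm": ["A", "A", "B", "C", "D", "E", "F"],
    "industry": ["Services"] * 5 + ["Retail Trade"] * 2,
    "dvpayout": [0.1, 0.1, 0.2, 0.3, 0.4, 0.0, 0.5],
})

# One row per company, so repeated yearly rows do not skew the ranking.
firms = df.drop_duplicates("conm").set_index("conm")

# Percentile rank of each company within its industry, in (0, 1].
pct = firms.groupby("industry")["dvpayout"].rank(pct=True)

# Bin the percentile ranks into four equal-width bands.
dv_by_firm = pd.cut(
    pct,
    bins=[0, 0.25, 0.5, 0.75, 1.0],
    labels=["Q1", "Q2", "Q3", "Q4"],
).astype(object)

# Map the per-company label back onto every row of the original frame.
df["dv"] = df["conm"].map(dv_by_firm)
print(df)
```

pd.qcut per industry would be a more direct alternative, but it raises an error when many identical values produce duplicate bin edges, which seems likely with the repeated zeros in the question's data; ranking first avoids that.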

