Pandas conditional sum rows into new column when number of rows varies

I have data in the following format:

[Image: sample of the data]

I am trying to do the following in a pandas DataFrame using Python 3.x:

  1. Group rows by Ticker and Year and sum the figures from the DPS column into a new column called Net_DPS.
  2. Group rows by Ticker and Year and sum the figures from the EPS column into a new column called Net_EPS.

The number of rows can vary from 1 to 4 when grouping by Ticker and Year. For example, for 1AL there is one row for 2014 but two for 2015.

Ultimately, I would like the result to be one row for each ticker for each year, with Net_EPS and Net_DPS showing the sum of EPS and DPS in that year, respectively.
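The varying group sizes can be inspected directly with groupby(...).size(). A minimal sketch, assuming a small hypothetical frame that mirrors the 1AL example (one row for 2014, two for 2015); the EPS/DPS numbers are made up:

```python
import pandas as pd

# Hypothetical sample mirroring the question: 1AL has one row for 2014 and two for 2015
df = pd.DataFrame({
    'Ticker': ['1AL', '1AL', '1AL'],
    'Year': [2014, 2015, 2015],
    'EPS': [0.10, 0.20, 0.05],
    'DPS': [0.04, 0.08, 0.02],
})

# Number of rows in each (Ticker, Year) group; sizes may differ from group to group
sizes = df.groupby(['Ticker', 'Year']).size()
print(sizes)
```

groupby aggregates each group regardless of how many rows it contains, so a 1 to 4 row variation needs no special handling.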

I've tried a number of the solutions suggested here, but I'm getting stuck because of the varying number of rows and the indexing.

The EPS and DPS columns have dtype float64.

I would really appreciate any help.

Since you want to group by ticker and year, pass both columns to groupby in that order:

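import numpy as np
import pandas as pd

# Random sample data: your EPS/DPS values will differ from the output shown below
df = pd.DataFrame({'Ticker': ['1AL']*6 + ['3PL']*7,
                   'Year': [2014, 2015, 2015, 2016, 2016, 2017, 2014, 2014, 2015, 2015, 2016, 2017, 2018],
                   'EPS': np.random.rand(13),
                   'DPS': np.random.rand(13)})
df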

    Ticker  Year    EPS           DPS
0   1AL     2014    0.033661    0.912861
1   1AL     2015    0.865936    0.326705
2   1AL     2015    0.398157    0.404424
3   1AL     2016    0.060185    0.482212
4   1AL     2016    0.348479    0.043894
5   1AL     2017    0.745728    0.900050
6   3PL     2014    0.581675    0.701467
7   3PL     2014    0.407660    0.371662
8   3PL     2015    0.984192    0.908538
9   3PL     2015    0.702109    0.064220
10  3PL     2016    0.376621    0.004566
11  3PL     2017    0.290292    0.171509
12  3PL     2018    0.631235    0.666724

df.groupby(['Ticker', 'Year']).sum().rename(columns = {'EPS': 'Net_EPS', 'DPS':'Net_DPS'})


                 Net_EPS    Net_DPS
Ticker  Year        
1AL     2014    0.033661    0.912861
        2015    1.264093    0.731129
        2016    0.408664    0.526106
        2017    0.745728    0.900050
3PL     2014    0.989335    1.073130
        2015    1.686301    0.972758
        2016    0.376621    0.004566
        2017    0.290292    0.171509
        2018    0.631235    0.666724

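If you don't want the index levels and would rather keep every original row, use transform, which writes each group's sum back onto every row of that group:

df.join(df.groupby(['Ticker', 'Year'])[['EPS', 'DPS']].transform('sum').rename(columns = {'EPS': 'Net_EPS', 'DPS':'Net_DPS'}))

    Ticker  Year    EPS         DPS         Net_EPS     Net_DPS
0   1AL     2014    0.033661    0.912861    0.033661    0.912861
1   1AL     2015    0.865936    0.326705    1.264093    0.731129
2   1AL     2015    0.398157    0.404424    1.264093    0.731129
3   1AL     2016    0.060185    0.482212    0.408664    0.526106
4   1AL     2016    0.348479    0.043894    0.408664    0.526106
5   1AL     2017    0.745728    0.900050    0.745728    0.900050
6   3PL     2014    0.581675    0.701467    0.989335    1.073130
7   3PL     2014    0.407660    0.371662    0.989335    1.073130
8   3PL     2015    0.984192    0.908538    1.686301    0.972758
9   3PL     2015    0.702109    0.064220    1.686301    0.972758
10  3PL     2016    0.376621    0.004566    0.376621    0.004566
11  3PL     2017    0.290292    0.171509    0.290292    0.171509
12  3PL     2018    0.631235    0.666724    0.631235    0.666724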

Edit: I think this is what you need: set as_index to False in groupby:

df.groupby(['Ticker', 'Year'], as_index = False).sum().rename(columns = {'EPS': 'Net_EPS', 'DPS':'Net_DPS'})

    Ticker  Year    Net_EPS     Net_DPS
0   1AL     2014    0.033661    0.912861
1   1AL     2015    1.264093    0.731129
2   1AL     2016    0.408664    0.526106
3   1AL     2017    0.745728    0.900050
4   3PL     2014    0.989335    1.073130
5   3PL     2015    1.686301    0.972758
6   3PL     2016    0.376621    0.004566
7   3PL     2017    0.290292    0.171509
8   3PL     2018    0.631235    0.666724

To keep the result, assign it back to df:

df = df.groupby(['Ticker', 'Year'], as_index = False).sum().rename(columns = {'EPS': 'Net_EPS', 'DPS': 'Net_DPS'})
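A small self-check of the as_index=False approach, assuming a hypothetical frame with made-up values and 1 to 3 rows per group. It confirms that each (Ticker, Year) pair appears exactly once and that the result matches the default grouped result after reset_index():

```python
import pandas as pd

# Hypothetical data with 1 to 3 rows per (Ticker, Year) group
df = pd.DataFrame({
    'Ticker': ['1AL', '1AL', '1AL', '3PL', '3PL', '3PL'],
    'Year': [2014, 2015, 2015, 2014, 2014, 2014],
    'EPS': [1.0, 2.0, 3.0, 0.5, 0.5, 1.0],
    'DPS': [0.25, 0.5, 0.5, 1.0, 1.0, 2.0],
})

names = {'EPS': 'Net_EPS', 'DPS': 'Net_DPS'}

# Keys stay as ordinary columns because of as_index=False
net = df.groupby(['Ticker', 'Year'], as_index=False).sum().rename(columns=names)

# Same frame via the default MultiIndex result followed by reset_index()
via_reset = df.groupby(['Ticker', 'Year']).sum().reset_index().rename(columns=names)
pd.testing.assert_frame_equal(net, via_reset)  # raises AssertionError if they differ

# Each (Ticker, Year) pair appears exactly once
assert not net.duplicated(['Ticker', 'Year']).any()
print(net)
```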
