Pandas conditional sum rows into new column when the number of rows varies
I have data in the following format:
I am trying to do the following in a pandas DataFrame using Python 3.x:
When grouping by Ticker and Year, the number of rows per group varies from 1 to 4. For example, 1AL has one row for 2014 but two for 2015.
Ultimately, I would like the result to be one row per ticker per year, with Net_EPS and Net_DPS showing the sum of EPS and DPS for that year, respectively.
I've tried many of the solutions suggested here, but I keep getting stuck because of the varying number of rows and the indexing.
The EPS and DPS columns are float64.
I would really appreciate any help.
As you want to group by ticker and year, pass both columns to groupby in that order:
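import numpy as np
import pandas as pd

# EPS and DPS are random floats, so your numbers will differ from the output below
df = pd.DataFrame({'Ticker': ['1AL']*6 + ['3PL']*7,
                   'Year': [2014, 2015, 2015, 2016, 2016, 2017, 2014, 2014, 2015, 2015, 2016, 2017, 2018],
                   'EPS': np.random.rand(13),
                   'DPS': np.random.rand(13)})
df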
    Ticker  Year       EPS       DPS
0      1AL  2014  0.033661  0.912861
1      1AL  2015  0.865936  0.326705
2      1AL  2015  0.398157  0.404424
3      1AL  2016  0.060185  0.482212
4      1AL  2016  0.348479  0.043894
5      1AL  2017  0.745728  0.900050
6      3PL  2014  0.581675  0.701467
7      3PL  2014  0.407660  0.371662
8      3PL  2015  0.984192  0.908538
9      3PL  2015  0.702109  0.064220
10     3PL  2016  0.376621  0.004566
11     3PL  2017  0.290292  0.171509
12     3PL  2018  0.631235  0.666724
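Because np.random.rand is not seeded, every run of the snippet above prints different numbers. A minimal sketch, assuming NumPy 1.17 or later for np.random.default_rng (the seed 0 is an arbitrary choice), that makes the example data reproducible and shows the group sizes groupby has to handle:

```python
import numpy as np
import pandas as pd

# A seeded generator returns the same "random" values on every run;
# the seed 0 is arbitrary.
rng = np.random.default_rng(0)

df = pd.DataFrame({
    'Ticker': ['1AL'] * 6 + ['3PL'] * 7,
    'Year': [2014, 2015, 2015, 2016, 2016, 2017, 2014, 2014, 2015, 2015, 2016, 2017, 2018],
    'EPS': rng.random(13),  # drawn first
    'DPS': rng.random(13),  # drawn second
})

# Rows per (Ticker, Year): 1 or 2 here; groupby copes with any group size.
print(df.groupby(['Ticker', 'Year']).size())
```

The varying row count is not a problem for groupby: each (Ticker, Year) group is reduced on its own, whether it holds one row or four.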
df.groupby(['Ticker', 'Year']).sum().rename(columns = {'EPS': 'Net_EPS', 'DPS':'Net_DPS'})
              Net_EPS   Net_DPS
Ticker Year
1AL    2014  0.033661  0.912861
       2015  1.264093  0.731129
       2016  0.408664  0.526106
       2017  0.745728  0.900050
3PL    2014  0.989335  1.073130
       2015  1.686301  0.972758
       2016  0.376621  0.004566
       2017  0.290292  0.171509
       2018  0.631235  0.666724
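If you are on pandas 0.25 or later, named aggregation can produce the Net_ columns directly, without a separate rename step. A small sketch on hypothetical fixed values (not the data above), chosen so the totals are easy to check by hand:

```python
import pandas as pd

# Hypothetical fixed data with 1 or 2 rows per (Ticker, Year)
df = pd.DataFrame({
    'Ticker': ['1AL', '1AL', '1AL', '3PL'],
    'Year': [2014, 2015, 2015, 2014],
    'EPS': [0.5, 0.25, 0.75, 1.0],
    'DPS': [0.1, 0.2, 0.3, 0.4],
})

# Named aggregation: each keyword becomes an output column and the tuple
# gives (input column, aggregation function).
net = df.groupby(['Ticker', 'Year']).agg(
    Net_EPS=('EPS', 'sum'),
    Net_DPS=('DPS', 'sum'),
)
print(net)
```

The result still has the (Ticker, Year) MultiIndex, like the sum().rename(...) version.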
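If you don't want the levels, you might reach for transform, but be careful with the call below: passing level = 0 together with the keys makes pandas group by the row index (a plain RangeIndex here) instead of by Ticker and Year. Every row then forms its own group, so transform('sum') returns the original values unchanged, which is what the output shows (the two 1AL 2015 rows are not added together):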
df.groupby(['Ticker', 'Year'], level = 0).transform('sum').rename(columns = {'EPS': 'Net_EPS', 'DPS':'Net_DPS'})
    Ticker  Year    Net_EPS     Net_DPS
0      1AL  2014   0.033661    0.912861
1      1AL  2015   0.865936    0.326705
2      1AL  2015   0.398157    0.404424
3      1AL  2016  0.0601846    0.482212
4      1AL  2016   0.348479   0.0438939
5      1AL  2017   0.745728     0.90005
6      3PL  2014   0.581675    0.701467
7      3PL  2014    0.40766    0.371662
8      3PL  2015   0.984192    0.908538
9      3PL  2015   0.702109   0.0642203
10     3PL  2016   0.376621  0.00456638
11     3PL  2017   0.290292    0.171509
12     3PL  2018   0.631235    0.666724
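If the goal is instead to keep every original row and put the yearly totals next to it, transform can be called on the grouped columns without level. A sketch on hypothetical fixed values:

```python
import pandas as pd

# Hypothetical fixed data with 1 or 2 rows per (Ticker, Year)
df = pd.DataFrame({
    'Ticker': ['1AL', '1AL', '1AL', '3PL'],
    'Year': [2014, 2015, 2015, 2014],
    'EPS': [0.5, 0.25, 0.75, 1.0],
    'DPS': [0.1, 0.2, 0.3, 0.4],
})

# transform('sum') returns one value per original row: each row receives
# the total of its (Ticker, Year) group, aligned on the original index.
sums = df.groupby(['Ticker', 'Year'])[['EPS', 'DPS']].transform('sum')

# join matches on the index, so the totals line up with Ticker and Year.
out = df[['Ticker', 'Year']].join(sums.rename(columns={'EPS': 'Net_EPS', 'DPS': 'Net_DPS'}))
print(out)
```

This keeps one row per original row (the two 1AL 2015 rows both show 1.0); for one row per Ticker and Year, the groupby(...).sum() versions are the direct route.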
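Edit: I think you need this: set as_index to False in groupby. (The numbers in this output differ from the tables above, presumably because the random data was regenerated; for example, 1AL 2014 has only one row yet shows a different total.)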
df.groupby(['Ticker', 'Year'], as_index=False).sum().rename(columns={'EPS': 'Net_EPS', 'DPS': 'Net_DPS'})
   Ticker  Year   Net_EPS   Net_DPS
0     1AL  2014  0.916628  0.964412
1     1AL  2015  0.461967  1.380665
2     1AL  2016  1.024019  0.521853
3     1AL  2017  0.664347  0.763935
4     3PL  2014  0.550123  0.554489
5     3PL  2015  0.844655  1.636665
6     3PL  2016  0.924291  0.270274
7     3PL  2017  0.225108  0.860416
8     3PL  2018  0.446283  0.180444
To keep the result, assign it back to df:
df = df.groupby(['Ticker', 'Year'], as_index=False).sum().rename(columns={'EPS': 'Net_EPS', 'DPS': 'Net_DPS'})
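The same flat result can also be reached by summing with the default MultiIndex and calling reset_index afterwards. A sketch on hypothetical fixed values that checks both routes give the same rows:

```python
import pandas as pd

# Hypothetical fixed data with 1 or 2 rows per (Ticker, Year)
df = pd.DataFrame({
    'Ticker': ['1AL', '1AL', '1AL', '3PL'],
    'Year': [2014, 2015, 2015, 2014],
    'EPS': [0.5, 0.25, 0.75, 1.0],
    'DPS': [0.1, 0.2, 0.3, 0.4],
})
renames = {'EPS': 'Net_EPS', 'DPS': 'Net_DPS'}

# as_index=False keeps Ticker and Year as ordinary columns.
flat = df.groupby(['Ticker', 'Year'], as_index=False).sum().rename(columns=renames)

# Alternative: aggregate with the MultiIndex, then move the levels back to columns.
via_reset = df.groupby(['Ticker', 'Year']).sum().reset_index().rename(columns=renames)

print(flat)
```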