I have data in the following format:
I am trying to do the following in a pandas data frame using python 3.x:
The number of rows can vary from 1 to 4 when grouping by Ticker and Year. For example, for 1AL there is one row for 2014 but two for 2015.
Ultimately, I would like the result to be one row for each ticker for each year, with Net_EPS and Net_DPS showing the sum of the EPS and DPS in that year respectively.
I've tried a bunch of the suggested solutions here but I'm getting stuck due to the different number of rows and indexing.
Data format for the EPS and DPS columns is float64.
I would really appreciate any help.
As you want to group by ticker and year, try groupby with the keys in that same order:
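import numpy as np
import pandas as pd

# Sample data: random EPS/DPS values, 1-2 rows per (Ticker, Year) pair
df = pd.DataFrame({'Ticker': ['1AL']*6 + ['3PL']*7,
                   'Year': [2014, 2015, 2015, 2016, 2016, 2017, 2014, 2014, 2015, 2015, 2016, 2017, 2018],
                   'EPS': np.random.rand(13),
                   'DPS': np.random.rand(13)})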
df
Ticker Year EPS DPS
0 1AL 2014 0.033661 0.912861
1 1AL 2015 0.865936 0.326705
2 1AL 2015 0.398157 0.404424
3 1AL 2016 0.060185 0.482212
4 1AL 2016 0.348479 0.043894
5 1AL 2017 0.745728 0.900050
6 3PL 2014 0.581675 0.701467
7 3PL 2014 0.407660 0.371662
8 3PL 2015 0.984192 0.908538
9 3PL 2015 0.702109 0.064220
10 3PL 2016 0.376621 0.004566
11 3PL 2017 0.290292 0.171509
12 3PL 2018 0.631235 0.666724
df.groupby(['Ticker', 'Year']).sum().rename(columns = {'EPS': 'Net_EPS', 'DPS':'Net_DPS'})
Net_EPS Net_DPS
Ticker Year
1AL 2014 0.033661 0.912861
2015 1.264093 0.731129
2016 0.408664 0.526106
2017 0.745728 0.900050
3PL 2014 0.989335 1.073130
2015 1.686301 0.972758
2016 0.376621 0.004566
2017 0.290292 0.171509
2018 0.631235 0.666724
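As a variation on the sum-then-rename approach above, here is a minimal sketch using named aggregation, which names the output columns in the same step. The small frame and its values are made up for illustration (they are not the asker's data), chosen so the sums are easy to check by hand.

```python
import pandas as pd

# Hypothetical hand-made data, not the asker's: 1AL has two rows for 2015.
df = pd.DataFrame({
    'Ticker': ['1AL', '1AL', '1AL', '3PL'],
    'Year':   [2014, 2015, 2015, 2014],
    'EPS':    [1.0, 2.0, 3.0, 4.0],
    'DPS':    [0.5, 0.25, 0.25, 1.0],
})

# Named aggregation: output column = (input column, aggregation function).
result = df.groupby(['Ticker', 'Year']).agg(
    Net_EPS=('EPS', 'sum'),
    Net_DPS=('DPS', 'sum'),
)
print(result)
```

The result is still indexed by (Ticker, Year), so `result.loc[('1AL', 2015)]` picks out the summed row for that pair.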
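If you don't want the levels you could try the following, though note that the output still has all 13 original rows with nothing summed, so it is not what was asked for: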
df.groupby(['Ticker', 'Year'], level = 0).transform('sum').rename(columns = {'EPS': 'Net_EPS', 'DPS':'Net_DPS'})
Ticker Year Net_EPS Net_DPS
0 1AL 2014 0.033661 0.912861
1 1AL 2015 0.865936 0.326705
2 1AL 2015 0.398157 0.404424
3 1AL 2016 0.0601846 0.482212
4 1AL 2016 0.348479 0.0438939
5 1AL 2017 0.745728 0.90005
6 3PL 2014 0.581675 0.701467
7 3PL 2014 0.40766 0.371662
8 3PL 2015 0.984192 0.908538
9 3PL 2015 0.702109 0.0642203
10 3PL 2016 0.376621 0.00456638
11 3PL 2017 0.290292 0.171509
12 3PL 2018 0.631235 0.666724
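For comparison, here is a rough sketch of what transform('sum') gives when grouping on the two key columns alone, without the level argument. The frame is a made-up toy example, not the data above. transform keeps one row per original row and repeats the group total on each, so it is useful for adding a total column but does not collapse the rows.

```python
import pandas as pd

# Hypothetical hand-made data: 1AL has two rows for 2015.
df = pd.DataFrame({
    'Ticker': ['1AL', '1AL', '1AL', '3PL'],
    'Year':   [2014, 2015, 2015, 2014],
    'EPS':    [1.0, 2.0, 3.0, 4.0],
    'DPS':    [0.5, 0.25, 0.25, 1.0],
})

# One value per ORIGINAL row: the (Ticker, Year) group total, repeated.
per_row = df.groupby(['Ticker', 'Year'])[['EPS', 'DPS']].transform('sum')

# Attach the totals next to the original columns.
with_totals = df.assign(Net_EPS=per_row['EPS'], Net_DPS=per_row['DPS'])
print(with_totals)
# Net_EPS is [1.0, 5.0, 5.0, 4.0]: both 2015 rows of 1AL carry the total 5.0
```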
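Edit: I think you need this; set as_index to False in groupby (the numbers below differ from the output above, presumably because the random data was regenerated):
df.groupby(['Ticker', 'Year'], as_index = False).sum().rename(columns = {'EPS': 'Net_EPS', 'DPS':'Net_DPS'})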
Ticker Year Net_EPS Net_DPS
0 1AL 2014 0.916628 0.964412
1 1AL 2015 0.461967 1.380665
2 1AL 2016 1.024019 0.521853
3 1AL 2017 0.664347 0.763935
4 3PL 2014 0.550123 0.554489
5 3PL 2015 0.844655 1.636665
6 3PL 2016 0.924291 0.270274
7 3PL 2017 0.225108 0.860416
8 3PL 2018 0.446283 0.180444
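A short sketch of two ways to end up with Ticker and Year as ordinary columns, again on a made-up toy frame rather than the data above. As far as I know both should produce the same flat table; as_index=False just saves the extra reset_index() call.

```python
import pandas as pd

# Hypothetical hand-made data: 1AL has two rows for 2015.
df = pd.DataFrame({
    'Ticker': ['1AL', '1AL', '1AL', '3PL'],
    'Year':   [2014, 2015, 2015, 2014],
    'EPS':    [1.0, 2.0, 3.0, 4.0],
    'DPS':    [0.5, 0.25, 0.25, 1.0],
})

keys = ['Ticker', 'Year']
names = {'EPS': 'Net_EPS', 'DPS': 'Net_DPS'}

# Option 1: keep the keys as ordinary columns from the start.
flat_a = df.groupby(keys, as_index=False).sum().rename(columns=names)

# Option 2: aggregate into a (Ticker, Year) index, then move the levels back to columns.
flat_b = df.groupby(keys).sum().rename(columns=names).reset_index()

print(flat_a)
print(flat_b)
```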
df = df.groupby(['Ticker', 'Year'], as_index = False).sum().rename(columns = {'EPS': 'Net_EPS', 'DPS': 'Net_DPS'})