
Pandas conditional sum rows into new column when number of rows vary

I have data in the following format:

[Image: sample data, not reproduced here]

I am trying to do the following in a pandas data frame using python 3.x:

  1. Group rows by Ticker and Year and sum the figures from the DPS column into a new column called Net_DPS.
  2. Group rows by Ticker and Year and sum the figures from the EPS column into a new column called Net_EPS.

The number of rows can vary from 1-4 when grouping by Ticker and Year. For example you will see for 1AL there is one row for 2014, but two for 2015.
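Grouping on both keys already handles groups of different sizes, so no special indexing is needed. One quick way to see this is to count the rows in each (Ticker, Year) group. This is a minimal sketch with made-up values; only the 1AL row counts follow the question.

```python
import pandas as pd

# Made-up values: 1AL has one row for 2014 and two rows for 2015, as in the question.
df = pd.DataFrame({
    'Ticker': ['1AL', '1AL', '1AL'],
    'Year': [2014, 2015, 2015],
    'EPS': [0.25, 0.5, 0.25],
    'DPS': [0.125, 0.25, 0.5],
})

# Number of rows in each (Ticker, Year) group; the groups may differ in size.
sizes = df.groupby(['Ticker', 'Year']).size()

# Summing works the same way regardless of how many rows each group has.
totals = df.groupby(['Ticker', 'Year'])[['EPS', 'DPS']].sum()
print(sizes)
print(totals)
```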

Ultimately, I would like the result to be one row for each ticker for each year, with Net_EPS and Net_DPS showing the sum of the EPS and DPS in that year, respectively.

I've tried a bunch of the suggested solutions here but I'm getting stuck due to the different number of rows and indexing.

Data format for the EPS and DPS columns is float64.

I would really appreciate any help.

Since you want to group by ticker and year, pass both columns to groupby in that order:

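import numpy as np
import pandas as pd

# Example data; EPS and DPS are random, so your numbers will differ from the output below
df = pd.DataFrame({'Ticker': ['1AL']*6 + ['3PL']*7,
                  'Year':[2014, 2015, 2015, 2016, 2016, 2017, 2014, 2014, 2015, 2015, 2016, 2017, 2018],
                  'EPS': np.random.rand(13),
                  'DPS':np.random.rand(13)})
df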

    Ticker  Year    EPS           DPS
0   1AL     2014    0.033661    0.912861
1   1AL     2015    0.865936    0.326705
2   1AL     2015    0.398157    0.404424
3   1AL     2016    0.060185    0.482212
4   1AL     2016    0.348479    0.043894
5   1AL     2017    0.745728    0.900050
6   3PL     2014    0.581675    0.701467
7   3PL     2014    0.407660    0.371662
8   3PL     2015    0.984192    0.908538
9   3PL     2015    0.702109    0.064220
10  3PL     2016    0.376621    0.004566
11  3PL     2017    0.290292    0.171509
12  3PL     2018    0.631235    0.666724

df.groupby(['Ticker', 'Year']).sum().rename(columns = {'EPS': 'Net_EPS', 'DPS':'Net_DPS'})


                 Net_EPS    Net_DPS
Ticker  Year        
1AL     2014    0.033661    0.912861
        2015    1.264093    0.731129
        2016    0.408664    0.526106
        2017    0.745728    0.900050
3PL     2014    0.989335    1.073130
        2015    1.686301    0.972758
        2016    0.376621    0.004566
        2017    0.290292    0.171509
        2018    0.631235    0.666724
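As an alternative to sum() followed by rename(), named aggregation (available since pandas 0.25) can create the Net_EPS and Net_DPS columns in one step. This is a sketch on small, made-up values chosen so that the sums are exact in floating point.

```python
import pandas as pd

# Made-up values in the question's layout.
df = pd.DataFrame({
    'Ticker': ['1AL', '1AL', '1AL', '3PL', '3PL'],
    'Year': [2014, 2015, 2015, 2014, 2014],
    'EPS': [0.25, 0.5, 0.25, 1.0, 2.0],
    'DPS': [0.125, 0.25, 0.5, 0.5, 0.5],
})

# Named aggregation: new_column=(source_column, function).
# reset_index() turns the Ticker/Year index levels back into ordinary columns.
net = (
    df.groupby(['Ticker', 'Year'])
      .agg(Net_EPS=('EPS', 'sum'), Net_DPS=('DPS', 'sum'))
      .reset_index()
)
print(net)
```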

If you don't want the grouping levels in the index but do want to keep every original row, transform('sum') broadcasts each group's total back onto each row. Note that this keeps all 13 rows rather than one row per ticker and year:

net = df.groupby(['Ticker', 'Year'])[['EPS', 'DPS']].transform('sum').rename(columns = {'EPS': 'Net_EPS', 'DPS':'Net_DPS'})
df[['Ticker', 'Year']].join(net)

    Ticker  Year    Net_EPS     Net_DPS
0   1AL     2014    0.033661    0.912861
1   1AL     2015    1.264093    0.731129
2   1AL     2015    1.264093    0.731129
3   1AL     2016    0.408664    0.526106
4   1AL     2016    0.408664    0.526106
5   1AL     2017    0.745728    0.900050
6   3PL     2014    0.989335    1.073130
7   3PL     2014    0.989335    1.073130
8   3PL     2015    1.686301    0.972758
9   3PL     2015    1.686301    0.972758
10  3PL     2016    0.376621    0.004566
11  3PL     2017    0.290292    0.171509
12  3PL     2018    0.631235    0.666724
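A self-contained sketch of the broadcast approach, using made-up values. It adds the group totals as new columns and then, if one row per ticker and year is what you are after, drops the duplicate rows.

```python
import pandas as pd

# Made-up values in the question's layout.
df = pd.DataFrame({
    'Ticker': ['1AL', '1AL', '1AL', '3PL'],
    'Year': [2014, 2015, 2015, 2014],
    'EPS': [0.25, 0.5, 0.25, 1.0],
    'DPS': [0.125, 0.25, 0.5, 0.5],
})

# transform('sum') returns a frame aligned with df's original index,
# where each row holds the total of its (Ticker, Year) group.
sums = df.groupby(['Ticker', 'Year'])[['EPS', 'DPS']].transform('sum')
df['Net_EPS'] = sums['EPS']
df['Net_DPS'] = sums['DPS']

# Keep only the first row of each group to get one row per ticker and year.
one_per_group = df.drop_duplicates(subset=['Ticker', 'Year'])[['Ticker', 'Year', 'Net_EPS', 'Net_DPS']]
print(one_per_group)
```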

Edit: I think this is what you need. Set as_index=False in groupby so that Ticker and Year stay as ordinary columns:

df.groupby(['Ticker', 'Year'], as_index = False).sum().rename(columns = {'EPS': 'Net_EPS', 'DPS':'Net_DPS'})

    Ticker  Year    Net_EPS     Net_DPS
0   1AL     2014    0.033661    0.912861
1   1AL     2015    1.264093    0.731129
2   1AL     2016    0.408664    0.526106
3   1AL     2017    0.745728    0.900050
4   3PL     2014    0.989335    1.073130
5   3PL     2015    1.686301    0.972758
6   3PL     2016    0.376621    0.004566
7   3PL     2017    0.290292    0.171509
8   3PL     2018    0.631235    0.666724
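For reference, as_index=False should give the same flat frame as grouping normally and then calling reset_index(). This is a small check on made-up values.

```python
import pandas as pd

# Made-up values in the question's layout.
df = pd.DataFrame({
    'Ticker': ['1AL', '1AL', '1AL', '3PL', '3PL'],
    'Year': [2014, 2015, 2015, 2014, 2014],
    'EPS': [0.25, 0.5, 0.25, 1.0, 2.0],
    'DPS': [0.125, 0.25, 0.5, 0.5, 0.5],
})
names = {'EPS': 'Net_EPS', 'DPS': 'Net_DPS'}

# Flat result directly from groupby.
flat = df.groupby(['Ticker', 'Year'], as_index=False).sum().rename(columns=names)

# Same result: group with a MultiIndex, then move the index levels back into columns.
via_reset = df.groupby(['Ticker', 'Year']).sum().reset_index().rename(columns=names)

pd.testing.assert_frame_equal(flat, via_reset)
print(flat)
```

groupby does not modify df in place, so assign the result (for example df = df.groupby(...).sum()...) if you want to keep it.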

