
Adding rows to a pandas DataFrame to capture the residual value while retaining the top 2 for each group

I have a pandas dataframe as follows:

import pandas as pd

df = pd.DataFrame({
'State':['am','am','am','am','am','am','am','am','am','fg','fg','fg','fg','fg','fg','fg'],
'PC':['A','A','A','A','B','B','B','B','B','C','C','C','D','D','D','D'],
'Party':['alpha','beta','delta','yellow','alpha','beta','blue','pink','gamma','alpha','beta','kappa','alpha','gamma','kappa','lambda'],
'Votes':[10,15,50,5,11,2,5,4,60,3,1,70,12,34,52,43]
})

I want to add a Total column containing the sum of the votes for each PC. Note that the same PC name can occur in different states (for example, a PC 'A' in both 'am' and 'fg'). Those are different PCs, so their votes must be summed separately. I do this as follows:

df['Total'] = df.groupby(['State','PC']).Votes.transform('sum')
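To illustrate why transform is used here rather than agg, a minimal sketch that rebuilds the sample data from the question: agg('sum') returns one value per (State, PC) group, whereas transform('sum') returns a Series aligned with the original rows, which is what allows assigning it directly as a new column. The name per_group is only for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'State': ['am'] * 9 + ['fg'] * 7,
    'PC': ['A'] * 4 + ['B'] * 5 + ['C'] * 3 + ['D'] * 4,
    'Party': ['alpha', 'beta', 'delta', 'yellow', 'alpha', 'beta', 'blue', 'pink',
              'gamma', 'alpha', 'beta', 'kappa', 'alpha', 'gamma', 'kappa', 'lambda'],
    'Votes': [10, 15, 50, 5, 11, 2, 5, 4, 60, 3, 1, 70, 12, 34, 52, 43],
})

# agg-style reduction: one row per (State, PC) group
per_group = df.groupby(['State', 'PC'])['Votes'].sum()

# transform broadcasts each group's total back onto every row of that group,
# so the result has the same index as df and can be stored as a column
df['Total'] = df.groupby(['State', 'PC'])['Votes'].transform('sum')

print(per_group)
print(df[['State', 'PC', 'Votes', 'Total']])
```

Grouping by both State and PC is what keeps identically named PCs in different states apart.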

After that, I want to keep only the top two parties by Votes for each combination of State and PC. If those top two do not include 'beta', I want a third row for 'beta' as well. Finally, I want to capture any remaining votes in a new row with Party set to 'REST', where needed.

In short, I want the following output:

df_out = pd.DataFrame({
'State':['am','am','am','am','am','am','am','fg','fg','fg','fg','fg','fg'],
'PC':['A','A','A','B','B','B','B','C','C','C','D','D','D'],
'Party':['delta','beta','REST','gamma','alpha','REST','beta','kappa','alpha','beta','kappa','lambda','REST'],
'Votes':[50,15,15,60,11,9,2,70,3,1,52,43,46],
'Total':[80,80,80,82,82,82,82,74,74,74,141,141,141]
})
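The REST values in the expected output are each group's Total minus the votes of the rows that are kept. Worked out per group (the last group is PC 'D' in the input data):

```latex
\begin{aligned}
\text{am, A: } & 80 - (50 + 15) = 15 \\
\text{am, B: } & 82 - (60 + 11 + 2) = 9 \\
\text{fg, C: } & 74 - (70 + 3 + 1) = 0 \quad (\text{no REST row}) \\
\text{fg, D: } & 141 - (52 + 43) = 46
\end{aligned}
```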

How do I do this?

Here is one way: sort by Votes and use groupby + tail to keep the top two rows per group, add any 'beta' row that did not make the top two back into s1, combine the remaining rows into a 'REST' row per group with groupby + agg, then concat everything back together.

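# Top two rows by Votes in each (PC, State) group: sort ascending, keep the last two
s1=df.sort_values('Votes').groupby(['PC','State']).tail(2)
# All remaining rows
s2=df[~df.index.isin(s1.index)]
# Add back any 'beta' row that did not make the top two
s1=pd.concat([s1,s2.loc[s2.Party=='beta']])
# Sum what is left into one 'REST' row per group (Total is constant within a group)
s2=s2[~s2.index.isin(s1.index)].groupby(['PC','State']).agg({'Votes':'sum','Total':'first'}).assign(Party='REST')
# Stack kept rows and REST rows; sort=True orders the columns alphabetically
yourdf=pd.concat([s1,s2.reset_index()],sort=True).sort_values(['PC','State'])
yourdf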
Out[517]: 
   PC   Party State  Total  Votes
1   A    beta    am     80     15
2   A   delta    am     80     50
0   A    REST    am     80     15
4   B   alpha    am     82     11
8   B   gamma    am     82     60
5   B    beta    am     82      2
1   B    REST    am     82      9
9   C   alpha    fg     74      3
11  C   kappa    fg     74     70
10  C    beta    fg     74      1
15  D  lambda    fg    141     43
14  D   kappa    fg    141     52
2   D    REST    fg    141     46
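An alternative sketch, not the method used above: rank the rows within each (State, PC) group with groupby().cumcount() after a descending sort, build one boolean mask for "top two or beta", and fold everything else into REST. It also reproduces the row order of df_out in the question (top two, then REST, then an extra beta row), using PC 'D' for the last group as in the input data. The names keys, ranked, rank, keep_mask and the helper column _order are arbitrary choices; how ties at the top-two boundary would be broken is not addressed (there are none in this data).

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'State': ['am'] * 9 + ['fg'] * 7,
    'PC': ['A'] * 4 + ['B'] * 5 + ['C'] * 3 + ['D'] * 4,
    'Party': ['alpha', 'beta', 'delta', 'yellow', 'alpha', 'beta', 'blue', 'pink',
              'gamma', 'alpha', 'beta', 'kappa', 'alpha', 'gamma', 'kappa', 'lambda'],
    'Votes': [10, 15, 50, 5, 11, 2, 5, 4, 60, 3, 1, 70, 12, 34, 52, 43],
})
keys = ['State', 'PC']
df['Total'] = df.groupby(keys)['Votes'].transform('sum')

# Sort each group by Votes (largest first) and number its rows: 0 = most votes
ranked = df.sort_values(keys + ['Votes'], ascending=[True, True, False])
rank = ranked.groupby(keys).cumcount()

# Keep the top two rows of each group, plus any 'beta' row outside the top two
keep_mask = (rank < 2) | (ranked['Party'] == 'beta')

# Order inside a group: top two keep ranks 0 and 1, an extra 'beta' row gets 3
kept_rank = rank[keep_mask]
kept = ranked[keep_mask].assign(_order=kept_rank.where(kept_rank < 2, 3))

# Sum the leftover rows into one 'REST' row per group, placed at order 2;
# groups with no leftover rows get no REST row
rest = (ranked[~keep_mask]
        .groupby(keys)
        .agg(Votes=('Votes', 'sum'), Total=('Total', 'first'))
        .reset_index()
        .assign(Party='REST', _order=2))

out = (pd.concat([kept, rest], ignore_index=True)
       .sort_values(keys + ['_order'])
       .drop(columns='_order')
       .reset_index(drop=True)[['State', 'PC', 'Party', 'Votes', 'Total']])
print(out)
```

Keeping the selection rule in a single mask means a change to it (for example, always keeping another party) only touches keep_mask.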
