Adding additional rows to pandas dataframe to capture residual value while retaining the top 2 for each
I have a pandas DataFrame as follows:
import pandas as pd

df = pd.DataFrame({
'State':['am','am','am','am','am','am','am','am','am','fg','fg','fg','fg','fg','fg','fg'],
'PC':['A','A','A','A','B','B','B','B','B','C','C','C','D','D','D','D'],
'Party':['alpha','beta','delta','yellow','alpha','beta','blue','pink','gamma','alpha','beta','kappa','alpha','gamma','kappa','lambda'],
'Votes':[10,15,50,5,11,2,5,4,60,3,1,70,12,34,52,43]
})
I want to add a Total column containing the sum of the votes for each PC. Note that PCs in different states can share a name (e.g. a PC named 'A' could appear in both 'am' and 'fg'); those are different PCs, so their votes must be summed separately. I do this as follows:
df['Total'] = df.groupby(['State','PC']).Votes.transform('sum')
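As a quick check of the line above, here is a minimal sketch on a small made-up frame (the `toy` data is hypothetical, not from the question). It shows that `transform('sum')` writes each group's total back onto every row of that group, and that grouping by both 'State' and 'PC' keeps same-named PCs in different states apart.

```python
import pandas as pd

# Hypothetical toy data: a PC named 'A' exists in two different states.
toy = pd.DataFrame({
    'State': ['am', 'am', 'fg', 'fg'],
    'PC':    ['A', 'A', 'A', 'A'],
    'Votes': [10, 15, 3, 1],
})

# transform('sum') returns one value per original row, aligned on the index,
# so it can be assigned straight back as a new column.
toy['Total'] = toy.groupby(['State', 'PC'])['Votes'].transform('sum')

totals = toy['Total'].tolist()
print(totals)  # [25, 25, 4, 4]
```

Grouping by 'PC' alone would instead give 29 on every row here, which is the mistake the note above is warning about.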
After that, I want to retain only the top two 'Party' rows by 'Votes' for each combination of 'State' and 'PC'. If the top two do not include 'beta', I also want a third row for 'beta'. Finally, I want to capture any remaining 'Votes' in a new row with 'Party' set to 'REST', where needed.
In sum, I want the output as follows:
df_out = pd.DataFrame({
'State':['am','am','am','am','am','am','am','fg','fg','fg','fg','fg','fg'],
'PC':['A','A','A','B','B','B','B','C','C','C','D','D','D'],
'Party':['delta','beta','REST','gamma','alpha','REST','beta','kappa','alpha','beta','kappa','lambda','REST'],
'Votes':[50,15,15,60,11,9,2,70,3,1,52,43,46],
'Total':[80,80,80,82,82,82,82,74,74,74,141,141,141]
})
How do I do this?
Here is one way: sort by Votes and use groupby + tail to take the top two rows of each group, combine the remaining rows with groupby + agg, then concat the pieces back together. If the top two do not include beta, I add that row back into s1.
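# Top two rows by Votes in each (PC, State) group
s1=df.sort_values('Votes').groupby(['PC','State']).tail(2)
# Everything outside the top two
s2=df[~df.index.isin(s1.index)]
# Add back any 'beta' row that missed the top two
s1=pd.concat([s1,s2.loc[s2.Party=='beta']])
# Pool whatever is still left into one 'REST' row per group
s2=s2[~s2.index.isin(s1.index)].groupby(['PC','State']).agg({'Votes':'sum','Total':'first'}).assign(Party='REST')
yourdf=pd.concat([s1,s2.reset_index()],sort=True).sort_values(['PC','State'])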
yourdf
Out[517]:
    PC   Party  State  Total  Votes
1    A    beta     am     80     15
2    A   delta     am     80     50
0    A    REST     am     80     15
4    B   alpha     am     82     11
8    B   gamma     am     82     60
5    B    beta     am     82      2
1    B    REST     am     82      9
9    C   alpha     fg     74      3
11   C   kappa     fg     74     70
10   C    beta     fg     74      1
15   D  lambda     fg    141     43
14   D   kappa     fg    141     52
2    D    REST     fg    141     46
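The same steps can also be wrapped in a function. The sketch below is not the code from the answer above: it swaps sort_values + tail for a per-group rank, and the function name `top_n_with_rest` and its parameters are hypothetical. It assumes pandas 0.25 or later (for named aggregation) and that ties in 'Votes' may be broken by row order.

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['am'] * 9 + ['fg'] * 7,
    'PC': ['A'] * 4 + ['B'] * 5 + ['C'] * 3 + ['D'] * 4,
    'Party': ['alpha', 'beta', 'delta', 'yellow', 'alpha', 'beta', 'blue', 'pink',
              'gamma', 'alpha', 'beta', 'kappa', 'alpha', 'gamma', 'kappa', 'lambda'],
    'Votes': [10, 15, 50, 5, 11, 2, 5, 4, 60, 3, 1, 70, 12, 34, 52, 43],
})


def top_n_with_rest(frame, keys=('State', 'PC'), n=2, always_keep='beta'):
    """Keep the top-n parties per group plus `always_keep`; pool the rest as 'REST'."""
    keys = list(keys)
    frame = frame.copy()
    frame['Total'] = frame.groupby(keys)['Votes'].transform('sum')

    # Rank votes within each group, highest first; method='first' breaks
    # ties by row order so exactly n rows get a rank <= n.
    rank = frame.groupby(keys)['Votes'].rank(method='first', ascending=False)
    keep = (rank <= n) | (frame['Party'] == always_keep)

    # Collapse every row that was not kept into one 'REST' row per group.
    rest = (frame[~keep]
            .groupby(keys, as_index=False)
            .agg(Votes=('Votes', 'sum'), Total=('Total', 'first'))
            .assign(Party='REST'))

    out = pd.concat([frame[keep], rest], ignore_index=True)
    return out.sort_values(keys).reset_index(drop=True)


out = top_n_with_rest(df)
print(out)
```

A simple sanity check is that the 'Votes' in each group of the result still add up to that group's 'Total', since every original row is either kept or folded into 'REST'.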