Adding an additional row to a pandas DataFrame to capture the remaining values while retaining the top 2 of each group

Question

I have a pandas DataFrame as follows:

import pandas as pd

df = pd.DataFrame({
'State':['am','am','am','am','am','am','am','am','am','fg','fg','fg','fg','fg','fg','fg'],
'PC':['A','A','A','A','B','B','B','B','B','C','C','C','D','D','D','D'],
'Party':['alpha','beta','delta','yellow','alpha','beta','blue','pink','gamma','alpha','beta','kappa','alpha','gamma','kappa','lambda'],
'Votes':[10,15,50,5,11,2,5,4,60,3,1,70,12,34,52,43]
})

I want to add a Total column containing the sum of the votes for each PC. Note that the same PC name can occur in more than one state (e.g. 'A' could appear in both 'am' and 'fg'); those must be summed separately, since they are different PCs. I do this as follows:

df['Total'] = df.groupby(['State','PC']).Votes.transform('sum')
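To illustrate why both keys are needed, here is a minimal sketch on made-up data (the `toy` frame and the `Total_by_PC` column are hypothetical and not part of the question) in which PC 'A' occurs in two states:

```python
import pandas as pd

# Hypothetical toy data: the PC name 'A' appears in both states.
toy = pd.DataFrame({
    'State': ['am', 'am', 'fg', 'fg'],
    'PC':    ['A', 'A', 'A', 'A'],
    'Votes': [10, 15, 12, 34],
})

# Grouping by PC alone merges the two different 'A' PCs into one total.
toy['Total_by_PC'] = toy.groupby('PC')['Votes'].transform('sum')

# Grouping by State and PC keeps them separate.
toy['Total'] = toy.groupby(['State', 'PC'])['Votes'].transform('sum')

print(toy)
```

Because `transform` returns a result aligned with the original index, the totals can be assigned straight back as a column.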

After that, I want to retain only the top two 'Party' rows by 'Votes' for each combination of 'State' and 'PC', except when the top two do not include 'beta'. In that case I want a third row for 'beta'. Then I want to capture any remaining 'Votes' in a new row with 'Party' set to 'REST', where needed.

In sum, I want the output as follows:

df_out = pd.DataFrame({
'State':['am','am','am','am','am','am','am','fg','fg','fg','fg','fg','fg'],
'PC':['A','A','A','B','B','B','B','C','C','C','D','D','D'],
'Party':['delta','beta','REST','gamma','alpha','REST','beta','kappa','alpha','beta','kappa','lambda','REST'],
'Votes':[50,15,15,60,11,9,2,70,3,1,52,43,46],
'Total':[80,80,80,82,82,82,82,74,74,74,141,141,141]
})

How do I do this?

Answer 1

Here is one way: take the top two rows per group with sort_values + groupby + tail, add the 'beta' row back into s1 when the top two do not include it, combine the remaining rows with groupby + agg, and then concat everything back together.

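# Top two rows per (PC, State) group by Votes (df already has the Total column)
s1 = df.sort_values('Votes').groupby(['PC', 'State']).tail(2)
# Everything outside the top two
s2 = df[~df.index.isin(s1.index)]
# Add back any 'beta' row that missed the top two
s1 = pd.concat([s1, s2.loc[s2.Party == 'beta']])
# Collapse what is left into a single 'REST' row per group
s2 = s2[~s2.index.isin(s1.index)].groupby(['PC', 'State']).agg({'Votes': 'sum', 'Total': 'first'}).assign(Party='REST')
yourdf = pd.concat([s1, s2.reset_index()], sort=True).sort_values(['PC', 'State'])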
yourdf
Out[517]: 
   PC   Party State  Total  Votes
1   A    beta    am     80     15
2   A   delta    am     80     50
0   A    REST    am     80     15
4   B   alpha    am     82     11
8   B   gamma    am     82     60
5   B    beta    am     82      2
1   B    REST    am     82      9
9   C   alpha    fg     74      3
11  C   kappa    fg     74     70
10  C    beta    fg     74      1
15  D  lambda    fg    141     43
14  D   kappa    fg    141     52
2   D    REST    fg    141     46
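For reuse, the same steps could be wrapped in a function. The following is only a sketch under stated assumptions: the name `top_n_with_rest` and its parameters `keys`, `n`, and `keep_party` are invented here for illustration, and it sorts descending and uses `head` rather than `tail`, so the row order differs from the output above.

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['am'] * 9 + ['fg'] * 7,
    'PC': ['A'] * 4 + ['B'] * 5 + ['C'] * 3 + ['D'] * 4,
    'Party': ['alpha', 'beta', 'delta', 'yellow',
              'alpha', 'beta', 'blue', 'pink', 'gamma',
              'alpha', 'beta', 'kappa',
              'alpha', 'gamma', 'kappa', 'lambda'],
    'Votes': [10, 15, 50, 5, 11, 2, 5, 4, 60, 3, 1, 70, 12, 34, 52, 43],
})


def top_n_with_rest(df, keys=('State', 'PC'), n=2, keep_party='beta'):
    """Hypothetical helper: keep the top n parties per group, always keep
    keep_party, and fold every other row into one 'REST' row per group."""
    keys = list(keys)
    df = df.copy()
    df['Total'] = df.groupby(keys)['Votes'].transform('sum')

    # Top n rows per group, largest Votes first.
    top = df.sort_values('Votes', ascending=False).groupby(keys).head(n)
    rest = df.drop(top.index)

    # Keep the chosen party even when it falls outside the top n.
    kept = pd.concat([top, rest[rest['Party'] == keep_party]])
    rest = rest[rest['Party'] != keep_party]

    # One 'REST' row per group that still has leftover rows.
    rest_rows = (rest.groupby(keys)
                     .agg(Votes=('Votes', 'sum'), Total=('Total', 'first'))
                     .reset_index()
                     .assign(Party='REST'))

    out = pd.concat([kept, rest_rows], ignore_index=True)
    # A stable sort keeps the top rows, then keep_party, then REST in each group.
    return out.sort_values(keys, kind='stable').reset_index(drop=True)


result = top_n_with_rest(df)
print(result)
```

A quick sanity check on any result of this kind is that the votes in each (State, PC) group still add up to that group's Total.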

Answer 1: score 3, accepted, 2019-04-01 01:55:08
