Python Pandas: Sort and group by, then sum two consecutive rows of 2nd column for a specific value of a 3rd column
I have this dataframe:
Group Turn Name
0 G1 1 Maria
1 G1 2 Sam
2 G1 2 Sara
3 G1 3 Maria
4 G1 4 Mark
5 G1 5 Maria
6 G2 2 Maria
7 G2 1 Ahmad
8 G3 1 Maria
9 G3 2 David
I would like to group my data by the value of the column "Group" and sort each group by "Turn", so that the turns are in order within each group.
Then, within each group, I would like to sum the value of "Turn" for every row where the name is "Maria" together with the one row that follows it. If Maria is the last turn in the group, the sum is only Maria's turn.
So the result looks like this:
Group Name Sum
G1 Maria 3
G1 Maria 7
G1 Maria 5
G2 Maria 2
G3 Maria 3
I tried groupby, apply and shift, but none of them gives me the final result I am looking for.
df = df.groupby('Group').apply(lambda x: x.sort_values('Turn'))
Can somebody help me?
Use:
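# DataFrame.sum(level=...) was removed in pandas 2.0; group on the index levels instead.
df.set_index(['Group','Name',(df['Name'] == 'Maria').cumsum().rename('Occurance')])\
.groupby(level=[0,2], sort=False).sum()\
.reset_index()\
.assign(name='Maria')\
.drop('Occurance', axis=1)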
Output:
Group Turn name
0 G1 3 Maria
1 G1 7 Maria
2 G1 5 Maria
3 G2 3 Maria
4 G3 3 Maria
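To see what the grouping key in the snippet above does, here is a small sketch that rebuilds the question's dataframe and prints the running "Maria" counter next to each row. The variable names `occurrence` and `labelled` are only illustrative. Note that each label covers every row up to the next "Maria", not just the one row after, and that the labels depend on the current row order, so the frame may need to be sorted by Group and Turn first.

```python
import pandas as pd

# Dataframe from the question.
df = pd.DataFrame({
    "Group": ["G1"] * 6 + ["G2"] * 2 + ["G3"] * 2,
    "Turn": [1, 2, 2, 3, 4, 5, 2, 1, 1, 2],
    "Name": ["Maria", "Sam", "Sara", "Maria", "Mark", "Maria",
             "Maria", "Ahmad", "Maria", "David"],
})

# Running count of "Maria" rows, following the frame's current row order.
# Each "Maria" row starts a new label; later rows inherit it until the next "Maria".
occurrence = (df["Name"] == "Maria").cumsum().rename("Occurance")

# Show the label beside the original columns.
labelled = df.assign(Occurance=occurrence)
print(labelled)
```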
You can use ffill with limit:
df=df.sort_values(['Group','Turn'])
# Series.sum(level=...) was removed in pandas 2.0; use groupby(level=0).sum() instead.
df[df.Name.where(df.Name=='Maria').groupby(df['Group']).ffill(limit=1).eq('Maria')].set_index('Group').Turn.groupby(level=0).sum()
Out[272]:
Group
G1 5
G2 3
G3 3
Name: Turn, dtype: int64
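If one row per "Maria" is wanted, as in the expected result of the question, a shift within each group is another option. The following is a minimal sketch, not taken from either answer; it assumes the dataframe shown in the question, and the names `ordered`, `next_turn` and `result` are only illustrative.

```python
import pandas as pd

# Dataframe from the question.
df = pd.DataFrame({
    "Group": ["G1"] * 6 + ["G2"] * 2 + ["G3"] * 2,
    "Turn": [1, 2, 2, 3, 4, 5, 2, 1, 1, 2],
    "Name": ["Maria", "Sam", "Sara", "Maria", "Mark", "Maria",
             "Maria", "Ahmad", "Maria", "David"],
})

# Sort the turns within each group and renumber the rows.
ordered = df.sort_values(["Group", "Turn"]).reset_index(drop=True)

# Turn of the following row in the same group; 0 when there is no following row.
next_turn = ordered.groupby("Group")["Turn"].shift(-1).fillna(0).astype(int)

# Add the current and following turn, then keep only the "Maria" rows.
result = (
    ordered.assign(Sum=ordered["Turn"] + next_turn)
    .loc[ordered["Name"] == "Maria", ["Group", "Name", "Sum"]]
    .reset_index(drop=True)
)
print(result)
```

Because the shift is done per group, a "Maria" in the last turn of a group never picks up the first turn of the next group.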