Keep maximum row in consecutive rows with specific value in column based on another column value
I have a DataFrame:
import pandas as pd
df = pd.DataFrame({'id_c':[1] * 4 + [2] * 3 + [3] * 4,
'Run':[7,8,5,4,3,2,1,2,3,4,5],
'Date_diff':[4,12,0,0,2,2,10,1,1,3,3]})
id_c Run Date_diff
1 7 4
1 8 12
1 5 0
1 4 0
2 3 2
2 2 2
2 1 10
3 2 1
3 3 1
3 4 3
3 5 3
For each unique value of id_c, if two consecutive rows have Date_diff equal to 0, 1, or 2, I want to keep only the one with the maximum value in Run.
I tried:
df.loc[df.groupby(['id_c', 'Date_diff'])['Run'].idxmax()]
But it also selects the maximum for Date_diff values other than 0, 1, 2.
The desired output would be:
id_c Run Date_diff
1 7 4
1 8 12
1 5 0
2 3 2
2 1 10
3 3 1
3 4 3
3 5 3
Thanks!
IIUC, you can compute a custom grouping key, get the index of the maximum Run in each group, then slice:
# get values not in 0/1/2
mask = ~df['Date_diff'].isin([0,1,2])
# group the consecutive 0/1/2 and get id of max Run
idx = df.groupby(['id_c', (mask|mask.shift()).cumsum()])['Run'].idxmax().values
# slice output with max ids
out = df.loc[idx]
output:
id_c Run Date_diff
0 1 7 4
1 1 8 12
2 1 5 0
4 2 3 2
6 2 1 10
8 3 3 1
9 3 4 3
10 3 5 3
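To see why the grouping key works, it can help to print the intermediate columns. The following is a minimal sketch of the same approach, not a different method. The names `starts` and `group_key` are my own, and `shift(fill_value=False)` is an assumption that should behave like the plain `shift()` above, where the leading NaN is treated as False.

```python
import pandas as pd

df = pd.DataFrame({'id_c': [1] * 4 + [2] * 3 + [3] * 4,
                   'Run': [7, 8, 5, 4, 3, 2, 1, 2, 3, 4, 5],
                   'Date_diff': [4, 12, 0, 0, 2, 2, 10, 1, 1, 3, 3]})

# True where Date_diff is NOT one of 0/1/2 ("outside" rows)
mask = ~df['Date_diff'].isin([0, 1, 2])

# A new group starts on every outside row and on the row right after one,
# so each outside row is alone in its group and each run of 0/1/2 rows
# shares one group id.
starts = mask | mask.shift(fill_value=False)
group_key = starts.cumsum().rename('group')

# Inspect the helper columns next to the data
print(df.assign(outside=mask, starts=starts, group=group_key))

# Index label of the max Run per (id_c, group), then select those rows
idx = df.groupby(['id_c', group_key])['Run'].idxmax()
out = df.loc[idx.to_numpy()]
print(out)
```

Note that the key groups any run of consecutive 0/1/2 values within the same id_c, whether or not the values are equal to each other. If a run such as 0, 0, 2, 2 should be split into two groups, adding 'Date_diff' to the groupby keys may be what you want, but that goes beyond what the question specifies.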