Keep the row with the maximum value among consecutive rows that have specific values in another column

Question

I have a df.

import pandas as pd 
df = pd.DataFrame({'id_c':[1] * 4 + [2] * 3 + [3] * 4,
        'Run':[7,8,5,4,3,2,1,2,3,4,5], 
      'Date_diff':[4,12,0,0,2,2,10,1,1,3,3]})

id_c   Run   Date_diff

1    7      4
1    8      12
1    5      0
1    4      0
2    3      2
2    2      2
2    1      10
3    2      1
3    3      1
3    4      3
3    5      3

For each unique value of id_c, if Date_diff equals 0, 1, or 2 in two consecutive rows, I want to keep only the row with the maximum value in Run.

I tried:

df.loc[df.groupby(['id_c', 'Date_diff'])['Run'].idxmax()]

But it also keeps only the maximum for Date_diff values other than 0, 1, 2.
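
To make the problem concrete, here is a small sketch of what that grouping keeps on the sample data. The variable name `attempt` is only illustrative, and the index is sorted just to make the result easier to compare with the input.

```python
import pandas as pd

df = pd.DataFrame({'id_c': [1] * 4 + [2] * 3 + [3] * 4,
                   'Run': [7, 8, 5, 4, 3, 2, 1, 2, 3, 4, 5],
                   'Date_diff': [4, 12, 0, 0, 2, 2, 10, 1, 1, 3, 3]})

# Grouping on Date_diff itself merges every pair of equal values,
# including the two rows with Date_diff == 3 for id_c == 3.
attempt = df.loc[df.groupby(['id_c', 'Date_diff'])['Run'].idxmax()].sort_index()
print(attempt.index.tolist())  # [0, 1, 2, 4, 6, 8, 10]
```

Row 9 (id_c 3, Run 4, Date_diff 3) is dropped here, although it should be kept.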

The desired output would be:

id_c Run   Date_diff
1    7      4
1    8      12
1    5      0
2    3      2
2    1      10
3    3      1
3    4      3
3    5      3

Thanks!

Answer 1

IIUC, compute a custom group, get the index of the maximum Run per group, then slice:

# get values not in 0/1/2
mask = ~df['Date_diff'].isin([0,1,2])
# group the consecutive 0/1/2 and get id of max Run
idx = df.groupby(['id_c', (mask|mask.shift()).cumsum()])['Run'].idxmax().values

# slice output with max ids
out = df.loc[idx]

Output:

    id_c  Run  Date_diff
0      1    7          4
1      1    8         12
2      1    5          0
4      2    3          2
6      2    1         10
8      3    3          1
9      3    4          3
10     3    5          3
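
To see why the grouping key works, the following sketch spells out the intermediate steps on the same data. The names `targets`, `starts` and `group_key` are illustrative and not part of the answer above. It also assumes that passing `fill_value=False` to `shift` is acceptable, which keeps the shifted Series boolean instead of introducing a NaN in the first row.

```python
import pandas as pd

df = pd.DataFrame({'id_c': [1] * 4 + [2] * 3 + [3] * 4,
                   'Run': [7, 8, 5, 4, 3, 2, 1, 2, 3, 4, 5],
                   'Date_diff': [4, 12, 0, 0, 2, 2, 10, 1, 1, 3, 3]})

targets = [0, 1, 2]  # Date_diff values whose consecutive runs get collapsed

# True for rows that must always be kept on their own
mask = ~df['Date_diff'].isin(targets)

# A new group starts at every kept row and at the row right after one.
# fill_value=False avoids a NaN in the first position of the shifted Series.
starts = mask | mask.shift(fill_value=False)

# Running count of group starts: rows in the same run share one number
group_key = starts.cumsum()
print(group_key.tolist())  # [1, 2, 3, 3, 3, 3, 4, 5, 5, 6, 7]

# Combining with id_c splits runs that cross an id_c boundary (rows 2-3 vs 4-5)
idx = df.groupby(['id_c', group_key])['Run'].idxmax()
out = df.loc[idx.to_numpy()]
print(out.index.tolist())  # [0, 1, 2, 4, 6, 8, 9, 10]
```

Note that this key treats any unbroken run of 0/1/2 values within one id_c as a single group, so a run such as 0, 1, 2 would also collapse to one row. If only equal consecutive values should be merged, the key would need to compare each Date_diff with the previous one as well.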
