
Pandas: How to delete all the subsequent rows of a group after a condition is met the first time?

I am trying to delete the subsequent rows of each group in my data frame after the column date becomes 4 for the first time.

import pandas as pd

df = pd.DataFrame({"date": [1,2,3,3,4,1,2,3,3,4,1,1,1,4,4,4,1,1,1,2,2,3,3,3,4,4],
               "variable": ["A", "A", "A","A","A","A", "A", "A","A","A", "B", "B", "B","B","B","B" ,"C", "C", "C","C", "D","D","D","D","D","D"],
               "no": [1, 2.2, 3.5, 1.5, 1.5,1, 2.2, 3.5, 1.5, 1.5, 1.2, 1.3, 1.1, 2, 3,1, 2.2, 3.5, 1.5, 1.5, 1.2, 1.3, 1.1, 2, 3,9],
               "value": [0.469112, -0.282863, -1.509059, -1.135632, 1.212112,0.469112, -0.282863, -1.509059, -1.135632, 1.212112, -0.173215,
                         0.119209, -1.044236, -0.861849, None,0.469112, -0.282863, -1.509059, -1.135632, 1.212112, -0.173215,
                         0.119209, -1.044236, -0.861849, None,0.87]})

date    variable    no  value
0   1   A   1.0 0.469112
1   2   A   2.2 -0.282863
2   3   A   3.5 -1.509059
3   3   A   1.5 -1.135632
4   4   A   1.5 1.212112
5   1   A   1.0 0.469112
6   2   A   2.2 -0.282863
7   3   A   3.5 -1.509059
8   3   A   1.5 -1.135632
9   4   A   1.5 1.212112
10  1   B   1.2 -0.173215
11  1   B   1.3 0.119209
12  1   B   1.1 -1.044236
13  4   B   2.0 -0.861849
14  4   B   3.0 NaN
15  4   B   1.0 0.469112
16  1   C   2.2 -0.282863
17  1   C   3.5 -1.509059
18  1   C   1.5 -1.135632
19  2   C   1.5 1.212112
20  2   D   1.2 -0.173215
21  3   D   1.3 0.119209
22  3   D   1.1 -1.044236
23  3   D   2.0 -0.861849
24  4   D   3.0 NaN
25  4   D   9.0 0.870000

I've tried these methods so far:

def tail_test(group):
    group = group[~(group.date.eq(4) | group.date.shift().eq(4))]
    return group

df_sub = df.groupby('variable').apply(tail_test).reset_index(drop=True)

which outputs:

date    variable    no  value
0   1   A   1.0 0.469112
1   2   A   2.2 -0.282863
2   3   A   3.5 -1.509059
3   3   A   1.5 -1.135632
4   2   A   2.2 -0.282863
5   3   A   3.5 -1.509059
6   3   A   1.5 -1.135632
7   1   B   1.2 -0.173215
8   1   B   1.3 0.119209
9   1   B   1.1 -1.044236
10  1   C   2.2 -0.282863
11  1   C   3.5 -1.509059
12  1   C   1.5 -1.135632
13  2   C   1.5 1.212112
14  2   D   1.2 -0.173215
15  3   D   1.3 0.119209
16  3   D   1.1 -1.044236
17  3   D   2.0 -0.861849

Basically, it drops every 4 and the row immediately after each 4, but not all of the subsequent rows.
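
To see why, it can help to apply the same mask to a small series. The sketch below uses a hypothetical toy series `s` (not the original data) in place of one group's date column. Since `shift()` only looks one row back, the mask flags the 4 itself and the single row right after it, and nothing further.

```python
import pandas as pd

# Hypothetical toy series standing in for one group's `date` column.
s = pd.Series([1, 4, 2, 3])

# Same mask as in tail_test: True for a 4, or for the row right after a 4.
drop = s.eq(4) | s.shift().eq(4)

# Only the 4 and the following 2 are removed; the trailing 3 survives.
kept = s[~drop]
print(kept.tolist())  # [1, 3]
```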

The next method I tried is:

def f(df):
    mask = (df.date == 4).cumsum() <= 1
    return df[mask]

df_sub = df.groupby("variable").apply(f)

The output is:

    date    variable    no  value
0   1   A   1.0 0.469112
1   2   A   2.2 -0.282863
2   3   A   3.5 -1.509059
3   3   A   1.5 -1.135632
4   4   A   1.5 1.212112
5   1   A   1.0 0.469112
6   2   A   2.2 -0.282863
7   3   A   3.5 -1.509059
8   3   A   1.5 -1.135632
9   1   B   1.2 -0.173215
10  1   B   1.3 0.119209
11  1   B   1.1 -1.044236
12  4   B   2.0 -0.861849
13  1   C   2.2 -0.282863
14  1   C   3.5 -1.509059
15  1   C   1.5 -1.135632
16  2   C   1.5 1.212112
17  2   D   1.2 -0.173215
18  3   D   1.3 0.119209
19  3   D   1.1 -1.044236
20  3   D   2.0 -0.861849
21  4   D   3.0 NaN
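
The cumulative sum counts how many 4s have been seen so far, including the current row, so `<= 1` keeps the first 4 and every row up to just before the second 4. A minimal sketch on a hypothetical toy series (not the original data) compares that mask with two alternatives. Which one is wanted depends on whether the first 4 itself should stay.

```python
import pandas as pd

# Hypothetical toy series standing in for one group's `date` column.
s = pd.Series([1, 4, 2, 4, 3])

# Number of 4s seen up to and including each row: 0, 1, 1, 2, 2
seen = s.eq(4).cumsum()

# Mask from the attempt above: keeps rows until just before the second 4.
up_to_second_four = s[seen <= 1]

# Keeps only the rows strictly before the first 4.
before_first_four = s[seen == 0]

# Number of 4s strictly before each row: 0, 0, 1, 1, 2
seen_before = seen.shift(fill_value=0)

# Keeps the first 4 and drops everything after it.
through_first_four = s[seen_before == 0]
```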

I might be making some stupid mistake that I can't figure out. Please help!

IIUC, you can use groupby with cumprod to detect where to start filtering:

df[df.groupby('variable').date.transform(lambda s: s.ne(4).cumprod().astype(bool))]

   date variable   no     value
0      1        A  1.0  0.469112
1      2        A  2.2 -0.282863
2      3        A  3.5 -1.509059
3      3        A  1.5 -1.135632
10     1        B  1.2 -0.173215
11     1        B  1.3  0.119209
12     1        B  1.1 -1.044236
16     1        C  2.2 -0.282863
17     1        C  3.5 -1.509059
18     1        C  1.5 -1.135632
19     2        C  1.5  1.212112
20     2        D  1.2 -0.173215
21     3        D  1.3  0.119209
22     3        D  1.1 -1.044236
23     3        D  2.0 -0.861849
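
A similar mask can probably be built without a Python-level lambda, by grouping an integer flag by the variable column and taking a per-group cumulative sum. This is a sketch of an alternative, not the code from the answer above. The frame `df_small` and the names `is_four` and `fours_so_far` are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical small frame standing in for the original df.
df_small = pd.DataFrame({
    "variable": ["A", "A", "A", "A", "B", "B", "B"],
    "date":     [1,   4,   2,   4,   3,   4,   1],
})

# 1 where date is 4, else 0.
is_four = df_small["date"].eq(4).astype(int)

# Per-group running count of 4s, including the current row.
fours_so_far = is_four.groupby(df_small["variable"]).cumsum()

# Rows strictly before the first 4 of each group (the result shown above).
before_first = df_small[fours_so_far.eq(0)]

# Variant that also keeps the first 4 of each group:
# subtracting is_four gives the count of 4s strictly before each row.
through_first = df_small[(fours_so_far - is_four).eq(0)]
```

The first variant should select the same rows as the cumprod mask; the second is only relevant if the first 4 of each group is meant to be retained.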
