How to delete a row based on a condition in pandas?

Question

I have the following dataframe:

Index	Description
0	Tab tab_1 of type yyy opened by User A
1	some_value
2	Tab tab_1 of type xxx opened by User B
3	Tab tab_4 of type yyy opened by User A
4	some_value
5	Tab tab_1 of type yyy closed by User A
6	some_value
7	Tab tab_1 of type xxx closed by User B
8	Tab tab_2 of type yyy closed by User A
9	some_value
10	Tab tab_3 of type zzz closed by User C
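For anyone who wants to reproduce the problem, the table above can be built roughly like this. This is only a sketch: the variable names `descriptions` and `df` are my own, and it assumes "Index" is the dataframe's index rather than a regular column.

```python
import pandas as pd

# Sample data copied from the table in the question.
descriptions = [
    "Tab tab_1 of type yyy opened by User A",
    "some_value",
    "Tab tab_1 of type xxx opened by User B",
    "Tab tab_4 of type yyy opened by User A",
    "some_value",
    "Tab tab_1 of type yyy closed by User A",
    "some_value",
    "Tab tab_1 of type xxx closed by User B",
    "Tab tab_2 of type yyy closed by User A",
    "some_value",
    "Tab tab_3 of type zzz closed by User C",
]

# Assumption: "Index" is the (default integer) index, not a data column.
df = pd.DataFrame({"Description": descriptions})
df.index.name = "Index"
print(df)
```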

I would like to remove rows where the cell in the "Description" column does not have a pair. By pairs I mean, for example, rows 0 and 5, and rows 2 and 7. Rows 3, 8 and 10 do not have pairs: a certain tab is opened by a certain user and is not closed, or is closed but was never opened.

Expected output:

Index	Description
0	Tab tab_1 of type yyy opened by User A
1	some_value
2	Tab tab_1 of type xxx opened by User B
4	some_value
5	Tab tab_1 of type yyy closed by User A
6	some_value
7	Tab tab_1 of type xxx closed by User B
9	some_value

Is there a way to do this?

Answer 1

You can try the duplicated function: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.duplicated.html

For instance:

# Note: duplicated() returns a boolean Series (True where Description
# repeats an earlier row), not a filtered dataframe.
df_new = df.duplicated(subset=['Description'])
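On the sample data the "opened" and "closed" rows of a pair are not identical strings, so calling duplicated on the raw column only flags the repeated some_value rows. One possible way to adapt the idea is to build a pairing key first. The sketch below assumes the key is the description with the word "opened" or "closed" removed (the same normalisation the other answers use); `key`, `paired` and `result` are illustrative names. Two caveats: two "opened" rows with no "closed" row would also count as a pair here, and a non-tab value that appears only once would be dropped.

```python
import pandas as pd

df = pd.DataFrame({"Description": [
    "Tab tab_1 of type yyy opened by User A",
    "some_value",
    "Tab tab_1 of type xxx opened by User B",
    "Tab tab_4 of type yyy opened by User A",
    "some_value",
    "Tab tab_1 of type yyy closed by User A",
    "some_value",
    "Tab tab_1 of type xxx closed by User B",
    "Tab tab_2 of type yyy closed by User A",
    "some_value",
    "Tab tab_3 of type zzz closed by User C",
]})

# Assumed pairing key: the description without the action word.
key = df["Description"].str.replace("opened|closed", "", regex=True)

# keep=False marks every row whose key occurs more than once,
# not just the second and later occurrences.
paired = key.duplicated(keep=False)

result = df[paired]
print(result.index.tolist())  # [0, 1, 2, 4, 5, 6, 7, 9]
```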

Answer 2

Honestly, I'm not sure this is what you need, but you can try this:

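# Group rows by the description with "opened"/"closed" removed, then keep
# only the groups that contain both an "opened" and a "closed" row.
mask = (df.groupby(df['Description'].str.replace('opened|closed','',regex=True))['Description'].
        transform(lambda x: (x.str.contains('opened').any())&(x.str.contains('closed').any())))

res = df.loc[mask]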

>>> res
'''
Index                             Description
0      Tab tab_1 of type yyy opened by User A
2      Tab tab_1 of type xxx opened by User B
5      Tab tab_1 of type yyy closed by User A
7      Tab tab_1 of type xxx closed by User B
'''
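Note that the output above no longer contains the some_value rows, which the expected output in the question keeps. A possible variation, offered only as a sketch: build the same "has both opened and closed" mask with transform("any") instead of a lambda, then also keep every row that mentions neither word. The names `is_open`, `is_close` and `keep` are my own.

```python
import pandas as pd

df = pd.DataFrame({"Description": [
    "Tab tab_1 of type yyy opened by User A",
    "some_value",
    "Tab tab_1 of type xxx opened by User B",
    "Tab tab_4 of type yyy opened by User A",
    "some_value",
    "Tab tab_1 of type yyy closed by User A",
    "some_value",
    "Tab tab_1 of type xxx closed by User B",
    "Tab tab_2 of type yyy closed by User A",
    "some_value",
    "Tab tab_3 of type zzz closed by User C",
]})

desc = df["Description"]
key = desc.str.replace("opened|closed", "", regex=True)

is_open = desc.str.contains("opened")
is_close = desc.str.contains("closed")

# For each row: does its group contain at least one "opened" / "closed" row?
has_open = is_open.groupby(key).transform("any")
has_close = is_close.groupby(key).transform("any")

# Keep complete pairs, plus rows that are not open/close events at all.
keep = (has_open & has_close) | ~(is_open | is_close)

res = df.loc[keep]
print(res.index.tolist())  # [0, 1, 2, 4, 5, 6, 7, 9]
```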

Answer 3

Replace the text "opened" and "closed" with an empty string, group by the result, use filter (a DataFrameGroupBy method) to select the groups that occur only once, and then drop those rows:

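# Rows whose key (description without "opened"/"closed") occurs once have no partner
key = data['Description'].str.replace('opened|closed', '', regex=True)
unpaired = data.groupby(key).filter(lambda x: x['Description'].count() == 1).index
data.drop(unpaired)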

Index   Description
    0   Tab tab_1 of type yyy opened by User A
    1   some_value
    2   Tab tab_1 of type xxx opened by User B
    4   some_value
    5   Tab tab_1 of type yyy closed by User A
    6   some_value
    7   Tab tab_1 of type xxx closed by User B
    9   some_value
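The approach above relies on every non-tab value (here some_value) appearing more than once; a unique non-tab row would be dropped as well. If the descriptions always follow the pattern shown in the question, a more explicit alternative might be to parse the fields and count the distinct actions per tab, type and user. This is a sketch under that assumption: the regular expression and the names `pattern`, `parts` and `unpaired` are mine and are not taken from any of the answers.

```python
import pandas as pd

data = pd.DataFrame({"Description": [
    "Tab tab_1 of type yyy opened by User A",
    "some_value",
    "Tab tab_1 of type xxx opened by User B",
    "Tab tab_4 of type yyy opened by User A",
    "some_value",
    "Tab tab_1 of type yyy closed by User A",
    "some_value",
    "Tab tab_1 of type xxx closed by User B",
    "Tab tab_2 of type yyy closed by User A",
    "some_value",
    "Tab tab_3 of type zzz closed by User C",
]})

# Assumed format: "Tab <tab> of type <type> <opened|closed> by User <user>".
pattern = (
    r"^Tab (?P<tab>\S+) of type (?P<type>\S+) "
    r"(?P<action>opened|closed) by User (?P<user>\S+)$"
)
parts = data["Description"].str.extract(pattern)

# Rows that did not match the pattern get NaN in every extracted column.
is_tab = parts["action"].notna()

# Number of distinct actions (opened/closed) seen for each tab/type/user.
n_actions = (
    parts[is_tab]
    .groupby(["tab", "type", "user"])["action"]
    .transform("nunique")
)

# Tab rows with only one kind of action have no partner.
unpaired = n_actions[n_actions < 2].index
result = data.drop(unpaired)
print(result.index.tolist())  # [0, 1, 2, 4, 5, 6, 7, 9]
```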

Answer 4

pandas DataFrames have a duplicated method, which does exactly what you need: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.duplicated.html

df.drop_duplicates('Description')

4 answers

Answer 1
1 2022-04-26 14:36:48

Answer 2
0 2022-04-26 15:34:54

Answer 3
0 2022-04-26 16:43:56

Answer 4
-1 2022-04-26 14:35:38
