How to remove a row based on a condition in pandas?
I have the following dataframe:
Index | Description |
---|---|
0 | Tab tab_1 of type yyy opened by User A |
1 | some_value |
2 | Tab tab_1 of type xxx opened by User B |
3 | Tab tab_4 of type yyy opened by User A |
4 | some_value |
5 | Tab tab_1 of type yyy closed by User A |
6 | some_value |
7 | Tab tab_1 of type xxx closed by User B |
8 | Tab tab_2 of type yyy closed by User A |
9 | some_value |
10 | Tab tab_3 of type zzz closed by User C |
I would like to remove rows where the cell in the "Description" column does not have a pair. By pairs I mean, for example, rows 0 and 5, and rows 2 and 7. Rows 3, 8 and 10 do not have pairs: a certain tab is opened by a certain user but never closed, or is closed but was never opened.
Expected output:
Index | Description |
---|---|
0 | Tab tab_1 of type yyy opened by User A |
1 | some_value |
2 | Tab tab_1 of type xxx opened by User B |
4 | some_value |
5 | Tab tab_1 of type yyy closed by User A |
6 | some_value |
7 | Tab tab_1 of type xxx closed by User B |
9 | some_value |
Is there a way to do this?
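A minimal sketch of one way to get the expected output, assuming every tab row follows exactly the pattern "Tab <name> of type <type> <opened|closed> by <user>" and that a pair means the same tab, type and user appear with both actions. The regex, the extracted column names (tab, type, action, user) and the variable names are illustrative assumptions, not part of the question.

```python
import pandas as pd

# Sample data from the question; the default RangeIndex 0..10 plays the role of "Index"
df = pd.DataFrame({"Description": [
    "Tab tab_1 of type yyy opened by User A",
    "some_value",
    "Tab tab_1 of type xxx opened by User B",
    "Tab tab_4 of type yyy opened by User A",
    "some_value",
    "Tab tab_1 of type yyy closed by User A",
    "some_value",
    "Tab tab_1 of type xxx closed by User B",
    "Tab tab_2 of type yyy closed by User A",
    "some_value",
    "Tab tab_3 of type zzz closed by User C",
]})

# Assumed row format; rows that do not match (e.g. "some_value") become all-NaN
pattern = r"^Tab (?P<tab>\S+) of type (?P<type>\S+) (?P<action>opened|closed) by (?P<user>.+)$"
parts = df["Description"].str.extract(pattern)

# Only the open/close events take part in the pairing check
events = parts.dropna(subset=["action"])

# A (tab, type, user) key counts as paired when both 'opened' and 'closed' occur for it
paired = events.groupby(["tab", "type", "user"])["action"].transform("nunique") == 2

# Non-event rows are always kept; event rows are kept only when paired
keep = paired.reindex(df.index, fill_value=True)
result = df[keep]
print(result)
```

Checking nunique == 2 only confirms that both actions were seen for a key; a tab opened twice and closed once would still count as paired, so a stricter count comparison may be needed depending on the real data.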
You can try the DataFrame method duplicated: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.duplicated.html
For instance:
dup_mask = df.duplicated(subset=['Description'])  # boolean Series, True for repeated rows
df_new = df[~dup_mask]
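On the question's data, duplicated on the raw Description column only flags the repeated some_value rows, because an opened row and its closed partner differ by one word. A sketch of a variant, assuming the two halves of a pair differ only in "opened" vs "closed": strip that word first, then call Series.duplicated with keep=False, which marks every member of a repeated group. The name key is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Description": [
    "Tab tab_1 of type yyy opened by User A",
    "some_value",
    "Tab tab_1 of type xxx opened by User B",
    "Tab tab_4 of type yyy opened by User A",
    "some_value",
    "Tab tab_1 of type yyy closed by User A",
    "some_value",
    "Tab tab_1 of type xxx closed by User B",
    "Tab tab_2 of type yyy closed by User A",
    "some_value",
    "Tab tab_3 of type zzz closed by User C",
]})

# Raw duplicated(): every opened/closed row is distinct, so only 'some_value' repeats
exact_dups = df.duplicated(subset=["Description"])
print(exact_dups.sum())  # 3 (rows 4, 6 and 9)

# Remove the word that distinguishes the two halves of a pair
key = df["Description"].str.replace("opened|closed", "", regex=True)

# keep=False flags every occurrence of a repeated key, so only singletons are False
result = df[key.duplicated(keep=False)]
print(result.index.tolist())  # [0, 1, 2, 4, 5, 6, 7, 9]
```

This treats any repeated key as a pair, so two "opened" rows for the same tab with no "closed" row would also be kept.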
Honestly, I'm not sure this is what you need, but you can try this:
# group rows whose Description is identical once 'opened'/'closed' is removed
key = df['Description'].str.replace('opened|closed', '', regex=True)
# True for every row whose group has both an 'opened' and a 'closed' entry
mask = df.groupby(key)['Description'].transform(
    lambda x: x.str.contains('opened').any() & x.str.contains('closed').any())
res = df.loc[mask]
>>> res
Index Description
0 Tab tab_1 of type yyy opened by User A
2 Tab tab_1 of type xxx opened by User B
5 Tab tab_1 of type yyy closed by User A
7 Tab tab_1 of type xxx closed by User B
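The output above leaves out the some_value rows, which the question's expected output keeps. A sketch of a variant, assuming rows without "opened" or "closed" should always be kept: it swaps the per-group lambda for two vectorized transform("any") calls and ORs in the non-event rows. The names is_event, has_open and has_close are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Description": [
    "Tab tab_1 of type yyy opened by User A",
    "some_value",
    "Tab tab_1 of type xxx opened by User B",
    "Tab tab_4 of type yyy opened by User A",
    "some_value",
    "Tab tab_1 of type yyy closed by User A",
    "some_value",
    "Tab tab_1 of type xxx closed by User B",
    "Tab tab_2 of type yyy closed by User A",
    "some_value",
    "Tab tab_3 of type zzz closed by User C",
]})

desc = df["Description"]
# Same grouping key as in the answer: Description with 'opened'/'closed' removed
key = desc.str.replace("opened|closed", "", regex=True)
is_event = desc.str.contains("opened|closed", regex=True)

# Per row: does its group contain an 'opened' row, and does it contain a 'closed' row?
has_open = desc.str.contains("opened").groupby(key).transform("any")
has_close = desc.str.contains("closed").groupby(key).transform("any")

# Keep complete pairs plus every row that is not an open/close event
res = df[(has_open & has_close) | ~is_event]
print(res.index.tolist())  # [0, 1, 2, 4, 5, 6, 7, 9]
```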
Replace the words opened and closed with an empty string, group by the result, use the DataFrameGroupBy filter method to select the groups that occur only once, and then drop those rows:
key = data['Description'].str.replace('opened|closed', '', regex=True)
data.drop(data.groupby(key).filter(lambda x: x['Description'].count() == 1).index)
Index Description
0 Tab tab_1 of type yyy opened by User A
1 some_value
2 Tab tab_1 of type xxx opened by User B
4 some_value
5 Tab tab_1 of type yyy closed by User A
6 some_value
7 Tab tab_1 of type xxx closed by User B
9 some_value
pandas DataFrames have the method duplicated, which does exactly what you need: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.duplicated.html
df.drop_duplicates('Description')
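For comparison with the expected output, a sketch that runs the call above on the question's sample data. drop_duplicates keeps the first row for each distinct Description (the default keep='first'), so it removes the repeated some_value rows and leaves every tab row, since no two tab descriptions in the sample are identical.

```python
import pandas as pd

df = pd.DataFrame({"Description": [
    "Tab tab_1 of type yyy opened by User A",
    "some_value",
    "Tab tab_1 of type xxx opened by User B",
    "Tab tab_4 of type yyy opened by User A",
    "some_value",
    "Tab tab_1 of type yyy closed by User A",
    "some_value",
    "Tab tab_1 of type xxx closed by User B",
    "Tab tab_2 of type yyy closed by User A",
    "some_value",
    "Tab tab_3 of type zzz closed by User C",
]})

# Keeps the first occurrence of each distinct Description value
result = df.drop_duplicates("Description")
print(result.index.tolist())  # [0, 1, 2, 3, 5, 7, 8, 10]
```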
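Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.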