
How to delete row in pandas dataframe based on condition if string is found in cell value of type list?

I've been struggling with the following issue. It sounds very easy, but I can't seem to figure it out. I'm sure it's something very obvious in the stack trace and I'm just being dumb.

I simply have a pandas dataframe that looks like this:

[Image: the dataframe]

I want to drop the rows whose jpgs cell value (a list) contains the value "123.jpg". So I should end up with a final dataframe containing only the rows with index 1 and 3.

However, I've tried a lot of methods and none of them works.

For example:

df = df["123.jpg" not in df.jpgs]

or

df = df[df.jpgs.tolist().count("123.jpg") == 0]

both give the error KeyError: True.

df = df[df['jpgs'].str.contains('123.jpg') == False]

This returns an empty dataframe.

df = df[df.jpgs.count("123.jpg") == 0]

and

df = df.drop(df["123.jpg" in df.jpgs].index)

both give KeyError: False.

This is my entire code, if needed. I would really appreciate it if someone could tell me what I'm doing wrong :(. Thanks!!

import pandas as pd

df = pd.DataFrame(columns=["person_id", "jpgs"])

id = 1
pair1 = ["123.jpg", "124.jpg"]
pair2 = ["125.jpg", "300.jpg"]
pair3 = ["500.jpg", "123.jpg"]
pair4 = ["111.jpg", "122.jpg"]
row1 = {'person_id': id, 'jpgs': pair1}
row2 = {'person_id': id, 'jpgs': pair2}
row3 = {'person_id': id, 'jpgs': pair3}
row4 = {'person_id': id, 'jpgs': pair4}

df = df.append(row1, ignore_index=True)
df = df.append(row2, ignore_index=True)
df = df.append(row3, ignore_index=True)
df = df.append(row4, ignore_index=True)
print(df)

#df = df["123.jpg" not in df.jpgs]
#df = df[df['jpgs'].str.contains('123.jpg') == False]

#df = df[df.jpgs.tolist().count("123.jpg") == 0]
df = df.drop(df["123.jpg" in df.jpgs].index)
print("\n Final df")
print(df)

Since you filter on a list column, apply with a lambda would probably be the easiest:

df.loc[df.jpgs.apply(lambda x: "123.jpg" not in x)]
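As a minimal, self-contained sketch of that approach: the frame below holds the same data as the question, but it is built with the DataFrame constructor, since DataFrame.append was removed in pandas 2.0. The names keep and filtered are only illustrative.

```python
import pandas as pd

# Same data as in the question, built in a single constructor call.
df = pd.DataFrame({
    "person_id": [1, 1, 1, 1],
    "jpgs": [
        ["123.jpg", "124.jpg"],
        ["125.jpg", "300.jpg"],
        ["500.jpg", "123.jpg"],
        ["111.jpg", "122.jpg"],
    ],
})

# One boolean per row: True when "123.jpg" is absent from that row's list.
keep = df["jpgs"].apply(lambda files: "123.jpg" not in files)

# Boolean indexing keeps only the rows where the mask is True.
filtered = df.loc[keep]
print(filtered)
```

The key point is that the membership test runs once per cell, so the result is a boolean Series with one entry per row rather than a single True or False.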

Quick comments on your attempts:

  • In df = df.drop(df["123.jpg" in df.jpgs].index), the expression "123.jpg" in df.jpgs checks whether "123.jpg" is one of the Series' index labels (the in operator on a Series tests the index, not the values) rather than whether it appears in any of the lists, which is not what you want. It evaluates to a single False, so df[False] raises KeyError: False.

  • df = df[df['jpgs'].str.contains('123.jpg') == False] goes in the right direction, but you are missing the regex=False keyword, as shown in Ibrahim's answer.

  • df[df.jpgs.count("123.jpg") == 0] is also not applicable here, since count returns the total number of non-NaN values in the Series, so the comparison yields a single bool instead of a per-row mask.
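The behaviour behind these comments can be checked directly. The sketch below uses a standalone Series with the question's data; the variable names are illustrative only.

```python
import pandas as pd

jpgs = pd.Series(
    [
        ["123.jpg", "124.jpg"],
        ["125.jpg", "300.jpg"],
        ["500.jpg", "123.jpg"],
        ["111.jpg", "122.jpg"],
    ],
    name="jpgs",
)

# `in` on a Series looks at the index labels (0..3 here), not at the values.
label_hit = 0 in jpgs          # 0 is an index label
value_hit = "123.jpg" in jpgs  # "123.jpg" is not an index label

# Series.count() counts the non-NaN entries; it does not inspect the lists.
non_null = jpgs.count()

# Python's `in` applied to each cell gives the per-row answer that is needed.
per_row = ["123.jpg" in files for files in jpgs]

print(label_hit, value_hit, non_null, per_row)
```

Both failing attempts therefore hand a single True or False to df[...], which pandas treats as a column label lookup, hence the KeyError.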

For the str.contains approach, this is how it is done (the expression below keeps the matching rows; prefix the mask with ~ to drop them instead):

df[df.jpgs.str.contains("123.jpg", regex=False)]

You can try this:

mask = df.jpgs.apply(lambda x: '123.jpg' not in x)
df = df[mask]
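An alternative that avoids the Python-level lambda, not taken from the answers above, is to flatten the lists with Series.explode and test the values per original row. This is only a sketch; flat, has_target and filtered are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "person_id": [1, 1, 1, 1],
    "jpgs": [
        ["123.jpg", "124.jpg"],
        ["125.jpg", "300.jpg"],
        ["500.jpg", "123.jpg"],
        ["111.jpg", "122.jpg"],
    ],
})

# explode() yields one row per file name and repeats the original index label.
flat = df["jpgs"].explode()

# Group by the original index label: does any file name in that row match?
has_target = flat.eq("123.jpg").groupby(level=0).any()

# Keep the rows that do not contain the target file name.
filtered = df[~has_target]
print(filtered)
```

For a small frame the apply version is simpler to read; the explode version mainly helps when the same flattened Series is reused for several checks.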
