Selecting specific values out of a column in pandas dataframe

I have a column, 'state', that has the values 'failed', 'successful', and two or three other values.

I am trying to create a dataframe with only the rows that contain 'failed' or 'successful' in the 'state' column.

I have implemented the following code:

df = df[df['state'].str.contains('failed' or 'successful', na = False)]

But I am only getting the 'failed' rows, not the 'successful' ones.

Any suggestions? I have used this same format on other datasets with success.

The issue is that the expression "failed" or "successful" is evaluated by Python before str.contains ever sees it, and it evaluates to "failed": the or operator returns its first truthy operand, and the non-empty string "failed" is truthy. As a result, the call is equivalent to str.contains("failed", na=False).
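A quick check in plain Python, as a minimal sketch (the variable names `pattern` and `fallback` are only illustrative), shows what str.contains actually receives:

```python
# `or` returns its first truthy operand, so the second string is never used.
pattern = "failed" or "successful"
print(pattern)  # failed

# Only a falsy first operand (such as an empty string) lets the second through.
fallback = "" or "successful"
print(fallback)  # successful
```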

What you really need to do is evaluate the column on two conditions, str.contains("failed") and str.contains("successful"), and combine the results. Each call returns a boolean Series, and you can combine them element-wise with the | operator.

df[df["state"].str.contains("failed", na=False) | df["state"].str.contains("successful", na=False)]
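To see the combined mask at work, here is a small self-contained sketch. The sample values in `df` are made up for illustration, since the question does not show the real data, and the names `is_failed`, `is_successful`, and `filtered` are likewise hypothetical:

```python
import pandas as pd

# Hypothetical sample data; the real 'state' column is not shown in the question.
df = pd.DataFrame({"state": ["failed", "successful", "canceled", None, "live"]})

# Each call returns a boolean Series; na=False treats missing values as non-matches.
is_failed = df["state"].str.contains("failed", na=False)
is_successful = df["state"].str.contains("successful", na=False)

# Element-wise OR of the two boolean masks.
filtered = df[is_failed | is_successful]
print(filtered["state"].tolist())  # ['failed', 'successful']
```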

EDIT: As Henry mentioned below, you can get a more succinct answer by passing a regex to Series.str.contains. The pattern below matches any value containing either substring, so "success" also covers 'successful'.

df[df["state"].str.contains("failed|success", na=False)]
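One thing to keep in mind is that str.contains does substring matching, so the regex also keeps values that merely contain one of the words. If the column should match the two values exactly, Series.isin may be a better fit. Note that isin is not part of the answers above; it is shown here only as a possible alternative, and the sample data is hypothetical:

```python
import pandas as pd

# Hypothetical sample data, including a value that merely contains "success".
df = pd.DataFrame({"state": ["failed", "successful", "unsuccessful", "canceled", None]})

# Regex alternation: substring match, so "unsuccessful" is kept as well.
by_regex = df[df["state"].str.contains("failed|success", na=False)]
print(by_regex["state"].tolist())  # ['failed', 'successful', 'unsuccessful']

# Exact membership test: only the two listed values are kept.
by_isin = df[df["state"].isin(["failed", "successful"])]
print(by_isin["state"].tolist())  # ['failed', 'successful']
```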

This happens because ("failed" or "successful") == "failed"; see the Python documentation on the short-circuit behavior of the or operator.
