Selecting specific values from a column of a pandas DataFrame

Question

I have a column, 'state', that has the values 'failed', 'successful', and two or three other values.

I am trying to create a dataframe with only the rows that contain 'failed' or 'successful' in the 'state' column.

I have implemented the following code:

df = df[df['state'].str.contains('failed' or 'successful', na = False)]

but I am only receiving 'failed' rows, not 'successful' ones.

Any suggestions? I have used this same format on other datasets with success.

Answer 1

The issue is that the expression "failed" or "successful" evaluates to "failed". The or operator short-circuits: the non-empty string "failed" is truthy, so it is returned as-is and "successful" is never used. str.contains therefore only ever sees the pattern "failed". Read this question to learn why this happens.
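A quick way to see this is to evaluate the expression on its own, outside pandas. The variable names below are only for illustration:

```python
# `a or b` returns a if a is truthy, otherwise b.
# Any non-empty string is truthy, so "successful" is discarded here.
pattern = "failed" or "successful"
print(pattern)  # failed

# Only a falsy left operand (such as the empty string) lets the right one through.
fallback = "" or "successful"
print(fallback)  # successful
```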

What you really need to do is evaluate the column on 2 conditions, str.contains("failed") and str.contains("successful"), and combine the results. You can do this using the | operator, which performs an element-wise OR on the two boolean Series.

df[df["state"].str.contains("failed", na=False) | df["state"].str.contains("successful", na=False)]
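Here is a self-contained sketch of the same approach on a small, made-up DataFrame; the values "canceled" and "live" are hypothetical stand-ins for the "two or three other values" from the question, and None shows what na=False does with missing cells. It also shows why Python's or cannot replace | here: pandas refuses to reduce a whole Series to a single True/False and raises a ValueError.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"state": ["failed", "successful", "canceled", None, "live", "successful"]})

# na=False turns missing values into False instead of NaN.
is_failed = df["state"].str.contains("failed", na=False)
is_successful = df["state"].str.contains("successful", na=False)

# `|` combines the two boolean Series element-wise.
filtered = df[is_failed | is_successful]
print(filtered["state"].tolist())  # ['failed', 'successful', 'successful']

# `or` needs the truth value of an entire Series, which is ambiguous.
try:
    is_failed or is_successful
    or_raised = False
except ValueError:
    or_raised = True
print(or_raised)  # True
```

Wrapping each condition in its own variable (or in parentheses) also avoids operator-precedence surprises when the conditions get longer.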

EDIT: As Henry mentioned below, you can get a more succinct answer by passing a regular expression to df["state"].str.contains:

df[df["state"].str.contains("failed|success", na=False)]
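str.contains interprets its pattern as a regular expression by default, so "failed|success" matches cells containing either substring. Because it is a substring match, a value such as "unsuccessful" (hypothetical, added below to show the pitfall) would also be kept. If the goal is rows whose state is exactly 'failed' or 'successful', Series.isin is an alternative not mentioned in the answers above; this sketch compares the two:

```python
import pandas as pd

# Hypothetical sample data; "unsuccessful" illustrates substring matching.
df = pd.DataFrame({"state": ["failed", "successful", "canceled", None, "unsuccessful"]})

# Regex alternation: one str.contains call matches either substring.
regex_rows = df[df["state"].str.contains("failed|success", na=False)]

# Exact matching: keep only cells equal to one of the listed values.
exact_rows = df[df["state"].isin(["failed", "successful"])]

print(regex_rows["state"].tolist())  # ['failed', 'successful', 'unsuccessful']
print(exact_rows["state"].tolist())  # ['failed', 'successful']
```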

Answer 2

Because ("failed" or "successful") == "failed". See the Python documentation on the short-circuit behavior of boolean operators.

Question

2 answers

Answer 1
0 2021-11-28 00:20:31

Answer 2
0 2021-11-28 00:20:47
