Selecting specific values out of a column in pandas dataframe
I have a column, 'state', that has the values 'failed', 'successful', and two or three other values.
I am trying to create a dataframe with only the rows that contain 'failed' or 'successful' in the 'state' column.
I have implemented the following code:
df = df[df['state'].str.contains('failed' or 'successful', na = False)]
but I am only receiving 'failed' rows, not 'successful'.
Any suggestions? I have used this same format on other datasets with success.
The issue is that the expression "failed" or "successful" evaluates to "failed", since the non-empty string "failed" is truthy. Python therefore passes only "failed" to str.contains, and "successful" is never used. Read this question to learn why this happens.
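To see the short-circuit behaviour in isolation, here is a minimal sketch with no pandas involved. The variable names pattern and fallback are only illustrative.

```python
# Python's `or` returns its first truthy operand, so the second string
# is never looked at when the first one is non-empty.
pattern = "failed" or "successful"
print(pattern)  # failed

# An empty string is falsy, so here `or` falls through to the second operand.
fallback = "" or "successful"
print(fallback)  # successful
```

This is why the original line behaves exactly like str.contains('failed', na=False).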
What you really need to do is evaluate the column on 2 conditions, str.contains("failed") and str.contains("successful"), and combine those results together. Each call returns a boolean Series, and you can combine the two element-wise using the | operator.
df[df["state"].str.contains("failed", na=False) | df["state"].str.contains("successful", na=False)]
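As a rough, self-contained illustration of the same idea, the sketch below builds a small made-up dataframe (the 'canceled' and 'live' values and the missing entry are assumptions, not taken from the question) and names each mask before combining them.

```python
import pandas as pd

# Hypothetical sample data; only 'failed' and 'successful' come from the question.
df = pd.DataFrame({"state": ["failed", "successful", "canceled", None, "live"]})

# Each call returns a boolean Series; na=False treats missing values as non-matches.
is_failed = df["state"].str.contains("failed", na=False)
is_successful = df["state"].str.contains("successful", na=False)

# `|` is the element-wise OR for boolean Series (unlike Python's `or`).
filtered = df[is_failed | is_successful]
print(filtered["state"].tolist())  # ['failed', 'successful']
```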
EDIT: As Henry mentioned below, you can get a more succinct answer by passing a regex to Series.str.contains, where | inside the pattern means "or".
df[df["state"].str.contains("failed|success", na=False)]
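One caveat worth sketching: str.contains does substring matching, so a regex pattern also picks up values that merely contain one of the words. If the column holds exact labels, Series.isin may be closer to what the question asks for. That is an alternative not mentioned in the original answer, and the 'unsuccessful' value below is invented purely to show the difference.

```python
import pandas as pd

# Hypothetical data; 'unsuccessful' is made up to expose the substring match.
df = pd.DataFrame({"state": ["failed", "successful", "unsuccessful", None]})

# Regex alternation: matches any value containing either substring.
by_regex = df[df["state"].str.contains("failed|successful", na=False)]
print(by_regex["state"].tolist())  # ['failed', 'successful', 'unsuccessful']

# Exact membership test: keeps only values equal to one of the listed labels.
by_isin = df[df["state"].isin(["failed", "successful"])]
print(by_isin["state"].tolist())  # ['failed', 'successful']
```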