Pandas: Split a DataFrame into separate DataFrames based on the string values of a column

Question

I haven't found any answers that I could apply to my problem, so here it goes:

I have an initial dataframe of images that I would like to split into two, based on the description of each image, which is a string in the "Description" column.

My problem is that the descriptions are not written consistently. Here's an example of what I mean:

Some images are accelerated and others aren't. That's the criterion I want to use to split the dataset.

However, the descriptions vary even within the accelerated and non-accelerated groups.

My strategy would be to rename every string that has "ACC" in it (this would cover all accelerated images) to "ACCELERATED IMAGE".

Then I could do:

df_Accl = df[df.Description == "ACCELERATED IMAGE"]
df_NonAccl = df[df.Description != "ACCELERATED IMAGE"]

How can I achieve this? This is just a strategy that I came up with; if there is a more efficient way of doing this, feel free to suggest it.

Answer 1

You can use str.contains to build a boolean mask, then filter with boolean indexing.

To invert the mask, use ~, which selects the rows that do not contain ACC:

mask = df.Description.str.contains("ACC")
df_Accl = df[mask]
df_NonAccl = df[~mask]
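
As a self-contained illustration of the mask approach above, here is a minimal sketch. The sample descriptions are hypothetical, since the question does not show the real data. It also assumes that some descriptions may be missing, which is why na=False is passed: without it, a missing value yields NaN in the mask and indexing with the mask can fail.

```python
import pandas as pd

# Hypothetical sample data; the real descriptions are not shown in the question.
df = pd.DataFrame({
    "Description": ["T1 ACC x2", "T2 axial", "ACCELERATED scan", None, "T1 sagittal"],
})

# na=False treats missing descriptions as "does not contain ACC",
# so the mask is strictly boolean and safe to use for indexing.
mask = df["Description"].str.contains("ACC", na=False)

df_Accl = df[mask]       # rows whose description contains "ACC"
df_NonAccl = df[~mask]   # all remaining rows, including missing descriptions
```

Note that the match is case-sensitive by default; passing case=False to str.contains would also match lowercase "acc", which may or may not be wanted depending on how the descriptions are written.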

Answer 2

You can use contains to find the rows that contain the substring ACC:

df['Description'].str.contains('ACC')
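
The expression above returns a boolean Series, one value per row. If the renaming strategy from the question is still preferred, that Series could be used with .loc to overwrite the matching descriptions first. This is only a sketch on hypothetical sample data, and the names is_acc and renamed are made up for the example:

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"Description": ["ACC fast scan", "standard scan", "scan ACC 2x"]})

# Boolean Series: True where the description contains "ACC".
is_acc = df["Description"].str.contains("ACC")

# Work on a copy so the original descriptions are preserved.
renamed = df.copy()
renamed.loc[is_acc, "Description"] = "ACCELERATED IMAGE"

# The equality-based split from the question now works as intended.
df_Accl = renamed[renamed.Description == "ACCELERATED IMAGE"]
df_NonAccl = renamed[renamed.Description != "ACCELERATED IMAGE"]
```

Renaming discards the original description text, so filtering directly with the boolean Series is usually simpler when only the split is needed.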

Answer 1: score 4, accepted, 2018-11-18 17:56:48
Answer 2: score 0, 2018-11-18 17:56:56
