Pandas: Split a DataFrame into separate DataFrames based on a column's string values

I haven't found any answers that I could apply to my problem, so here it goes:

I have an initial DataFrame of images that I would like to split into two based on each image's description, which is a string in the "Description" column.

My problem is that not all descriptions are written the same way. Here's an example of what I mean:

[Image: sample rows of the Description column (not preserved)]

Some images are accelerated and others aren't. That's the criterion I want to use to split the dataset.

However, even the accelerated and the non-accelerated image descriptions vary among themselves.

My strategy would be to rename every string that contains "ACC" (this would cover all accelerated images) to "ACCELERATED IMAGE".

Then I could do:

df_Accl = df[df.Description == "ACCELERATED IMAGE"]
df_NonAccl = df[df.Description != "ACCELERATED IMAGE"]
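A minimal sketch of the rename-then-split strategy described above, assuming a column literally named "Description". The sample rows are hypothetical (the real descriptions were only shown in an image), and `na=False` is added so missing descriptions count as non-accelerated instead of breaking the mask.

```python
import pandas as pd

# Hypothetical sample data; the real descriptions in the question were in an image.
df = pd.DataFrame(
    {
        "Image": ["img1", "img2", "img3", "img4"],
        "Description": ["T1 ACC", "ACCEL T2", "T1 standard", "T2 standard"],
    }
)

# Step 1: overwrite every description that contains "ACC" with one common label.
contains_acc = df["Description"].str.contains("ACC", na=False)
df.loc[contains_acc, "Description"] = "ACCELERATED IMAGE"

# Step 2: split on the normalised label, exactly as in the question.
df_Accl = df[df.Description == "ACCELERATED IMAGE"]
df_NonAccl = df[df.Description != "ACCELERATED IMAGE"]

print(df_Accl)
print(df_NonAccl)
```

Note that step 1 destroys the original description text, and the mask it builds is already enough to split the data, so the rename is only worth keeping if the normalised label is needed later.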

How can I achieve this? This was just a strategy I came up with; if there's a more efficient way of doing this, feel free to suggest it.

You can use str.contains to build a boolean mask, then filter with boolean indexing.

To invert the mask, use ~; this keeps the rows that do not contain ACC:

mask = df.Description.str.contains("ACC")
df_Accl = df[mask]
df_NonAccl = df[~mask]
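A self-contained version of the mask approach, sketched on hypothetical sample data that includes a missing description. The `na=False` and `regex=False` arguments are additions not in the answer: without `na=False`, a missing value yields a non-boolean entry in the mask and `df[mask]` raises an error, while `regex=False` treats "ACC" as a plain substring.

```python
import pandas as pd

# Hypothetical sample data, including one missing description.
df = pd.DataFrame(
    {
        "Image": ["img1", "img2", "img3", "img4", "img5"],
        "Description": ["T1 ACC", "ACCEL T2", "T1 standard", None, "T2 standard"],
    }
)

# na=False maps missing descriptions to False ("not accelerated");
# regex=False matches "ACC" as a literal substring.
mask = df["Description"].str.contains("ACC", na=False, regex=False)

df_Accl = df[mask]       # rows whose description contains "ACC"
df_NonAccl = df[~mask]   # every other row, including the missing one

print(df_Accl)
print(df_NonAccl)
```

Because both frames come from the same mask, every row lands in exactly one of them and the original descriptions stay untouched.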

You can use str.contains to find the rows that contain the substring ACC:

df['Description'].str.contains('ACC')
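To show what that expression actually returns, here is a small sketch on hypothetical descriptions: a boolean Series with one value per row, aligned with the DataFrame's index. Matching is case-sensitive by default; the `case=False` variant is an optional extra, not part of the answer.

```python
import pandas as pd

# Hypothetical descriptions, including a lowercase variant.
df = pd.DataFrame({"Description": ["T1 ACC", "T1 standard", "ACCEL T2", "acc T3"]})

# One boolean per row; "acc T3" does not match because the search is case-sensitive.
mask = df["Description"].str.contains("ACC")
print(mask.tolist())  # [True, False, True, False]

# case=False also matches lowercase spellings such as "acc".
mask_ci = df["Description"].str.contains("ACC", case=False)
print(mask_ci.tolist())  # [True, False, True, True]
```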
