简体   繁体   中英

Pandas: Split a Dataframe into separate Dataframes based on certain Column's string values

Haven't found any answers that I could apply to my problem so here it goes:

I have an initial dataframe of images that I would like to split into two, based on the description of that image, which is a string in the "Description" column.

My problem issue is that not all descriptions are equally written. Here's an example of what I mean:

在此输入图像描述

Some images are accelerated and others aren't. That's the criteria I want to use to split the dataset.

However even accelerated and non-accelerated image descriptions vary among them.

My strategy would be to rename every string that has "ACC" in it - this would cover all accelerated images - to "ACCELERATED IMAGE".

Then I could do:

df_Accl = df[df.Description == "ACCELERATED IMAGE"]
df_NonAccl = df[df.Description != "ACCELERATED IMAGE"]

How can I achieve this? This was just a strategy that I came up with, if there's any other more efficient way of doing this feel free to speak it.

You can use str.contains for boolean mask - then filter by boolean indexing .

For invert mask is use ~ , filter rows not contains ACC :

mask = df.Description.str.contains("ACC")
df_Accl = df[mask]
df_NonAccl = df[~mask]

您可以使用contains来查找包含子字符串ACC的行:

df['Description'].str.contains('ACC')

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM