Haven't found any answers that I could apply to my problem so here it goes:
I have an initial dataframe of images that I would like to split into two, based on the description of that image, which is a string in the "Description" column.
My problem issue is that not all descriptions are equally written. Here's an example of what I mean:
Some images are accelerated and others aren't. That's the criteria I want to use to split the dataset.
However even accelerated and non-accelerated image descriptions vary among them.
My strategy would be to rename every string that has "ACC" in it - this would cover all accelerated images - to "ACCELERATED IMAGE".
Then I could do:
df_Accl = df[df.Description == "ACCELERATED IMAGE"]
df_NonAccl = df[df.Description != "ACCELERATED IMAGE"]
How can I achieve this? This was just a strategy that I came up with, if there's any other more efficient way of doing this feel free to speak it.
You can use str.contains
for boolean mask - then filter by boolean indexing
.
For invert mask is use ~
, filter rows not contains ACC
:
mask = df.Description.str.contains("ACC")
df_Accl = df[mask]
df_NonAccl = df[~mask]
您可以使用contains
来查找包含子字符串ACC
的行:
df['Description'].str.contains('ACC')
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.