Pandas: Split a DataFrame into separate DataFrames based on a column's string values
I haven't found any answers that I could apply to my problem, so here it goes:
I have an initial dataframe of images that I would like to split into two, based on the description of each image, which is a string in the "Description" column.
My problem is that the descriptions are not written consistently. Here's an example of what I mean:
Some images are accelerated and others aren't. That's the criterion I want to use to split the dataset.
However, the descriptions vary even within the accelerated and non-accelerated groups.
My strategy would be to rename every string that has "ACC" in it (this would cover all accelerated images) to "ACCELERATED IMAGE".
Then I could do:
df_Accl = df[df.Description == "ACCELERATED IMAGE"]
df_NonAccl = df[df.Description != "ACCELERATED IMAGE"]
How can I achieve this? This is just a strategy that I came up with; if there's a more efficient way of doing this, feel free to suggest it.
You can use str.contains to build a boolean mask, then filter by boolean indexing.
To invert the mask, use ~, which selects the rows that do not contain ACC:
mask = df.Description.str.contains("ACC")
df_Accl = df[mask]
df_NonAccl = df[~mask]
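For reference, here is a minimal self-contained sketch of the same approach. The sample descriptions are made up, since the question does not show the real data. Passing na=False and regex=False is an assumption on my part: the first treats missing descriptions as non-matches (otherwise the mask contains NaN and cannot be inverted cleanly), and the second matches "ACC" as a plain substring.

```python
import pandas as pd

# Hypothetical sample data; the real descriptions are not shown in the question.
df = pd.DataFrame({
    "Description": ["T1 ACC 3D", "T1 3D", "ACCELERATED SAG", None, "T2 AXIAL"],
})

# regex=False: treat "ACC" as a literal substring.
# na=False: rows with a missing description count as "does not contain".
mask = df["Description"].str.contains("ACC", regex=False, na=False)

df_Accl = df[mask]       # rows whose description contains "ACC"
df_NonAccl = df[~mask]   # all remaining rows, including missing descriptions

print(df_Accl)
print(df_NonAccl)
```

Every row of df ends up in exactly one of the two frames, so no renaming step is needed before splitting.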
You can use contains to find the rows that contain the substring ACC:
df['Description'].str.contains('ACC')
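That expression returns a boolean Series. If you still want to follow the renaming strategy from the question, the Series can be used with .loc to overwrite the matching descriptions first. A rough sketch follows; the sample data is hypothetical, and case=False is an assumption in case some descriptions are written in lowercase.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({"Description": ["mprage acc", "MPRAGE", "ACC FLAIR"]})

# case=False (an assumption) also matches "acc"; na=False handles missing values.
mask = df["Description"].str.contains("ACC", case=False, na=False)

# Rename every matching description, as proposed in the question.
df.loc[mask, "Description"] = "ACCELERATED IMAGE"

# The equality-based split from the question now works.
df_Accl = df[df.Description == "ACCELERATED IMAGE"]
df_NonAccl = df[df.Description != "ACCELERATED IMAGE"]
```

Note that the renaming overwrites the original descriptions, so keep a copy of the column if you need them later.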