How to filter a column with a list of elements in a Python dataframe
I have a data frame like this:
df
ID Col_1
1 Apple, Cherry, Banana
2 Apple, Mango
3 Kiwi, Cherry
4 Apple, Cherry, Pear
5 Apple, Melon
6 Papaya, Cherry
I want to filter the data frame in these 3 ways:
This is what my output should look like:
1. Col_1 has both Apple & Cherry
Output:
ID Col_1
1 Apple, Cherry, Banana
4 Apple, Cherry, Pear
2. Col_1 has Apple but not Cherry
Output:
ID Col_1
2 Apple, Mango
5 Apple, Melon
3. Col_1 has Cherry but not Apple
Output:
ID Col_1
3 Kiwi, Cherry
6 Papaya, Cherry
Can anyone help me with this?
Let's start by creating OP's dataframe:
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4, 5, 6],
                   'Col_1': ['Apple, Cherry, Banana', 'Apple, Mango', 'Kiwi, Cherry',
                             'Apple, Cherry, Pear', 'Apple, Melon', 'Papaya, Cherry']})
[Out]:
ID Col_1
0 1 Apple, Cherry, Banana
1 2 Apple, Mango
2 3 Kiwi, Cherry
3 4 Apple, Cherry, Pear
4 5 Apple, Melon
5 6 Papaya, Cherry
Based on what OP shared, and given that the constraints always depend on Apple and Cherry, one can create a function, let's call it filter_df, that takes a dataframe and two strings as input, as follows:
def filter_df(df, s1, s2):
    # Col_1 has both Apple & Cherry
    df1 = df[df['Col_1'].str.contains(s1) & df['Col_1'].str.contains(s2)]
    # Col_1 has Apple but not Cherry
    df2 = df[df['Col_1'].str.contains(s1) & ~df['Col_1'].str.contains(s2)]
    # Col_1 has Cherry but not Apple
    df3 = df[df['Col_1'].str.contains(s2) & ~df['Col_1'].str.contains(s1)]
    return df1, df2, df3
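One caveat: str.contains does substring matching (and treats the pattern as a regular expression by default), so 'Apple' would also match a cell containing 'Pineapple'. If exact item matching matters, a possible variant is to split each cell into a set of items first. The sketch below is only an illustration under some assumptions: the name filter_df_exact and the col and sep parameters are hypothetical, the items are assumed to be separated by ', ', and the column is assumed to have no missing values.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6],
    'Col_1': ['Apple, Cherry, Banana', 'Apple, Mango', 'Kiwi, Cherry',
              'Apple, Cherry, Pear', 'Apple, Melon', 'Papaya, Cherry'],
})

def filter_df_exact(df, s1, s2, col='Col_1', sep=', '):
    # Hypothetical helper: turn each cell into a set of items,
    # e.g. 'Apple, Mango' -> {'Apple', 'Mango'} (assumes no NaN values)
    items = df[col].str.split(sep).apply(set)
    # Exact membership tests instead of substring matching
    has_s1 = items.apply(lambda s: s1 in s)
    has_s2 = items.apply(lambda s: s2 in s)
    # Same three subsets as filter_df: both, s1 only, s2 only
    return df[has_s1 & has_s2], df[has_s1 & ~has_s2], df[has_s2 & ~has_s1]

both, only_s1, only_s2 = filter_df_exact(df, 'Apple', 'Cherry')
```

On OP's data this should select the same rows as filter_df; the two versions only differ when one item name is contained in another.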
Then, if one applies the function filter_df to the dataframe df, with the strings Apple and Cherry, one gets the following results:
df1, df2, df3 = filter_df(df, 'Apple', 'Cherry')
# df1 - Col_1 has both Apple & Cherry
[Out]:
ID Col_1
0 1 Apple, Cherry, Banana
3 4 Apple, Cherry, Pear
# df2 - Col_1 has Apple but not Cherry
[Out]:
ID Col_1
1 2 Apple, Mango
4 5 Apple, Melon
# df3 - Col_1 has Cherry but not Apple
[Out]:
ID Col_1
2 3 Kiwi, Cherry
5 6 Papaya, Cherry
If one wants to consider different strings, for example Kiwi and Mango, one can do that as well. Also, if the conditions change in the future, one can easily adjust the function filter_df accordingly.
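As one illustration of how the function could be adjusted if the conditions change, here is a minimal sketch that accepts arbitrary lists of strings to require and to rule out. The name filter_items and its include, exclude and col parameters are hypothetical, and the sketch assumes Col_1 has no missing values. It uses regex=False so the strings are matched literally, but it is still substring matching.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6],
    'Col_1': ['Apple, Cherry, Banana', 'Apple, Mango', 'Kiwi, Cherry',
              'Apple, Cherry, Pear', 'Apple, Melon', 'Papaya, Cherry'],
})

def filter_items(df, include=(), exclude=(), col='Col_1'):
    # Hypothetical helper: start with every row selected
    mask = pd.Series(True, index=df.index)
    # Keep only rows that contain every string in `include`
    for s in include:
        mask = mask & df[col].str.contains(s, regex=False)
    # Drop rows that contain any string in `exclude`
    for s in exclude:
        mask = mask & ~df[col].str.contains(s, regex=False)
    return df[mask]

apple_and_cherry = filter_items(df, include=['Apple', 'Cherry'])
apple_not_cherry = filter_items(df, include=['Apple'], exclude=['Cherry'])
kiwi_not_mango = filter_items(df, include=['Kiwi'], exclude=['Mango'])
```

Each call returns a single dataframe, so the three cases from the question become three calls rather than one function returning a tuple.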