Check whether a Series contains any element from a list

Question

I'm reading a large CSV file, and one of its columns looks like this:

import pandas as pd

df = pd.DataFrame({
    'col1': pd.Series(
        ["37", "AWESOME House", "Yellow Cottage, 107", "14"], dtype='object'
    )
})

My code uses vectorized string methods to return the desired data in a timely fashion.

Here is simplified code to illustrate part of the logic:

import numpy as np

sth = np.where(
    <check condition>,
    df['col1'].str.lower(),
    df['some_other_column'].whatever()
)

Next, I'd like to check whether each value in my Series contains any element from the list below.

check_list = ['a', 'b', 'c']

So the expected result (for "check condition") would be:

False
True
True
False

I tried this:

np.where(
    np.any([x in df['col1'].str.lower() for x in check_list])
...

but received an error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How can I solve this correctly?

Answer 1

Use Series.str.contains with the list values joined by | (regex alternation), and pass case=False for a case-insensitive search:

print(df['col1'].str.contains('|'.join(check_list), case=False))
0    False
1     True
2     True
3    False
Name: col1, dtype: bool
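
Joining the raw list with | treats every item as a regular expression, so an item such as "." or "+" would not be matched literally, and missing values in a CSV column come back as NaN rather than False. A minimal sketch of one way to guard against both, assuming literal matching is wanted; the extra None row is a hypothetical addition to show the missing-value case:

```python
import re

import pandas as pd

# Sample data from the question, plus a hypothetical missing value.
df = pd.DataFrame(
    {"col1": ["37", "AWESOME House", "Yellow Cottage, 107", "14", None]}
)
check_list = ["a", "b", "c"]

# Escape each item so regex metacharacters are matched literally.
pattern = "|".join(re.escape(item) for item in check_list)

# na=False makes missing values count as "no match" instead of NaN.
mask = df["col1"].str.contains(pattern, case=False, na=False)
print(mask.tolist())
```

For the plain letters in check_list the escaping changes nothing; it only matters once the list holds punctuation or other regex syntax.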

Without regex:

print(df['col1'].apply(lambda x: any(i in x.lower() for i in check_list)))
0    False
1     True
2     True
3    False
Name: col1, dtype: bool

print([any(i in x.lower() for i in check_list) for x in df['col1']])
[False, True, True, False]
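
Any of these boolean results can stand in for the "check condition" in the np.where call from the question. A small sketch of how that might look, assuming the second branch simply returns another column unchanged; the values in some_other_column are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "col1": ["37", "AWESOME House", "Yellow Cottage, 107", "14"],
        # Hypothetical values; the question does not show this column.
        "some_other_column": ["w", "x", "y", "z"],
    }
)
check_list = ["a", "b", "c"]

# Boolean mask: True where col1 contains any item of check_list.
mask = df["col1"].str.contains("|".join(check_list), case=False)

# Lower-cased col1 where the mask is True, the other column elsewhere.
sth = np.where(mask, df["col1"].str.lower(), df["some_other_column"])
print(sth.tolist())
```

np.where returns a NumPy array here, not a Series, so assign it back to a column if the index needs to be preserved.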

Answer 1 (4, accepted, 2021-05-26 07:26:17)
