Check if Series contains any element from a list
I'm reading a large CSV file, and one of the columns looks like this:
import pandas as pd
df['col1'] = pd.Series(
    ["37", "AWESOME House", "Yellow Cottage, 107", "14"], dtype='object'
)
My code uses "vectorized string methods" to return the desired data in a timely fashion.
Here is simplified code to illustrate part of the logic:
import numpy as np
sth = np.where(
    <check condition>,
    df['col1'].str.lower(),
    df['some_other_column'].whatever()
)
Next, I'd like to check whether each value in my Series contains any element from the list below.
check_list = ['a', 'b', 'c']
So the expected result (for "check condition") would be:
False
True
True
False
I tried this:
np.where(
np.any([x in df['col1'].str.lower() for x in check_list])
...
but received this error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
How can I solve this correctly?
Use Series.str.contains with the list values joined by | (the regex "or"), and pass case=False for a case-insensitive search:
print (df['col1'].str.contains('|'.join(check_list), case=False))
0 False
1 True
2 True
3 False
Name: col1, dtype: bool
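One caveat with the joined pattern: if a list entry contains a regex metacharacter (for example . or +), it is interpreted as regex syntax rather than as literal text. A minimal sketch, assuming the entries should be matched literally, escapes each one with re.escape before joining. The helper name contains_any is hypothetical and not part of pandas.

```python
import re
import pandas as pd

def contains_any(series, items):
    """Return a boolean Series: True where a value contains any item
    (case-insensitive, literal match). Hypothetical helper."""
    # Escape each item so characters like '.' or '+' are matched literally.
    pattern = '|'.join(re.escape(item) for item in items)
    return series.str.contains(pattern, case=False, regex=True)

df = pd.DataFrame({'col1': ["37", "AWESOME House", "Yellow Cottage, 107", "14"]})
mask = contains_any(df['col1'], ['a', 'b', 'c'])
print(mask.tolist())  # [False, True, True, False]
```

For a plain list such as ['a', 'b', 'c'] the escaping changes nothing, so the result matches the output above.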
Without regex:
print (df['col1'].apply(lambda x: any([i in x.lower() for i in check_list])))
0 False
1 True
2 True
3 False
Name: col1, dtype: bool
print ([any([i in x.lower() for i in check_list]) for x in df['col1']])
[False, True, True, False]
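To tie this back to the np.where call in the question, the boolean mask can be passed directly as the condition. This is only a sketch: the column name other and its values are hypothetical stand-ins for some_other_column, whose real contents are not shown in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ["37", "AWESOME House", "Yellow Cottage, 107", "14"],
    'other': ["w", "x", "y", "z"],  # hypothetical stand-in column
})
check_list = ['a', 'b', 'c']

# Boolean mask: True where col1 contains any element of check_list.
mask = df['col1'].str.contains('|'.join(check_list), case=False)

# Take the lowercased col1 where the mask is True, otherwise the other column.
result = np.where(mask, df['col1'].str.lower(), df['other'])
print(result.tolist())  # ['w', 'awesome house', 'yellow cottage, 107', 'z']
```

If the column may hold missing values, passing na=False to str.contains keeps the mask strictly boolean.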