
Check if a string in a Pandas DataFrame column is in a list of strings

Suppose I have a frame like this:

frame = pd.DataFrame({'a' : ['the cat is blue', 'the sky is green', 'the dog is black']})

If I want to check whether any of those rows contains a certain word, I just have to do this:

frame['b'] = frame.a.str.contains("dog") | frame.a.str.contains("cat") | frame.a.str.contains("fish")

frame['b'] outputs:

True
False
True

If I decide to make a list instead:

mylist =['dog', 'cat', 'fish']

How would I check whether a row contains one of the words in the list?

frame = pd.DataFrame({'a' : ['the cat is blue', 'the sky is green', 'the dog is black']})

frame
                  a
0   the cat is blue
1  the sky is green
2  the dog is black

The str.contains method accepts a regular expression pattern:

mylist = ['dog', 'cat', 'fish']
pattern = '|'.join(mylist)

pattern
'dog|cat|fish'

frame.a.str.contains(pattern)
0     True
1    False
2     True
Name: a, dtype: bool
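Joining the raw words works for plain strings, but a list item that contains a regex metacharacter (such as '.') would be interpreted as part of the pattern. A minimal sketch, assuming the items should be matched as literal text, escapes each one with re.escape before joining. The sample rows, the '3.5' item, and the name safe_pattern are illustrative and not part of the original answer.

```python
import re
import pandas as pd

# Hypothetical data: the second row would match an unescaped '3.5'
frame = pd.DataFrame({'a': ['the cat is blue', 'room 305 is free', 'price is 3.5 usd']})
mylist = ['dog', 'cat', '3.5']

# Escape each item so '.' is matched literally, then join with '|'
safe_pattern = '|'.join(re.escape(word) for word in mylist)

# Boolean Series: True where any list item occurs in the row
mask = frame['a'].str.contains(safe_pattern)
print(mask)
```

Without the escaping step, '3.5' would also match '305', because '.' matches any character.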

Because regular expression patterns are supported, you can also embed flags:

frame = pd.DataFrame({'a' : ['Cat Mr. Nibbles is blue', 'the sky is green', 'the dog is black']})

frame
                     a
0  Cat Mr. Nibbles is blue
1         the sky is green
2         the dog is black

pattern = '(?i)' + '|'.join(mylist)  # a global inline flag must come first (an error otherwise from Python 3.11)

pattern
'(?i)dog|cat|fish'
 
frame.a.str.contains(pattern)
0     True  # Because of the (?i) flag, 'Cat' is also matched to 'cat'
1    False
2     True
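As an alternative to an inline flag, str.contains also takes case and flags parameters. The sketch below reuses the frame and list from this answer and shows both spellings; the variable names by_case and by_flag are made up for illustration.

```python
import re
import pandas as pd

frame = pd.DataFrame({'a': ['Cat Mr. Nibbles is blue', 'the sky is green', 'the dog is black']})
mylist = ['dog', 'cat', 'fish']
pattern = '|'.join(mylist)

# Option 1: ask pandas to ignore case
by_case = frame['a'].str.contains(pattern, case=False)

# Option 2: pass a flag from the re module
by_flag = frame['a'].str.contains(pattern, flags=re.IGNORECASE)

print(by_case)
```

Either form keeps the pattern itself free of flags, which can be easier to read when the pattern is built from a list.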

This should work for a list (note that isin tests whole-cell equality, not substrings):

print(frame[frame['a'].isin(mylist)])

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.isin.html
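A small side-by-side sketch may help in choosing between the two methods: isin checks whether the entire cell equals one of the list items, while str.contains searches inside the cell. The rows below are made up so that exactly one cell equals a list item.

```python
import pandas as pd

# Hypothetical data: only the middle row is exactly equal to a list item
frame = pd.DataFrame({'a': ['the cat is blue', 'dog', 'the dog is black']})
mylist = ['dog', 'cat', 'fish']

exact = frame['a'].isin(mylist)                       # whole-value equality
partial = frame['a'].str.contains('|'.join(mylist))   # substring / regex search

print(exact.tolist())
print(partial.tolist())
```

For the sentences in the question, isin alone would therefore return False for every row.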

After reading the comments on the accepted answer about extracting the matched string, this approach can also be tried.

frame = pd.DataFrame({'a' : ['the cat is blue', 'the sky is green', 'the dog is black']})

frame
                  a
0   the cat is blue
1  the sky is green
2  the dog is black

Let's create our list of strings that need to be matched and extracted.

mylist = ['dog', 'cat', 'fish']
pattern = '|'.join(mylist)

Now let's create a function that is responsible for finding and extracting the substring.

import re

def pattern_searcher(search_str: str, search_list: str) -> str:
    # search_list is the regex pattern built above, e.g. 'dog|cat|fish'
    search_obj = re.search(search_list, search_str)
    if search_obj:
        # Return the first matched substring
        return_str = search_obj.group(0)
    else:
        return_str = 'NA'
    return return_str

We use this function with pandas.Series.apply (frame['a'] is a Series):

frame['matched_str'] = frame['a'].apply(lambda x: pattern_searcher(search_str=x, search_list=pattern))

Result:

                  a matched_str
0   the cat is blue         cat
1  the sky is green          NA
2  the dog is black         dog
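The same column can probably be produced without a Python-level function by using str.extract, which returns the first match of a capture group. This is a sketch of an alternative technique, not the method used in this answer; the variable name matched is made up.

```python
import pandas as pd

frame = pd.DataFrame({'a': ['the cat is blue', 'the sky is green', 'the dog is black']})
mylist = ['dog', 'cat', 'fish']
pattern = '|'.join(mylist)

# Wrap the alternation in one capture group; expand=False returns a Series.
# Rows without a match come back as NaN, replaced here with 'NA'.
matched = frame['a'].str.extract(f'({pattern})', expand=False).fillna('NA')

frame['matched_str'] = matched
print(frame)
```

Like re.search, str.extract only reports the first match in each row.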

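For example, we can use a pipe to check three patterns at once:

import re

# df is assumed to hold the text in its second column (position 1)
# and to already have a third column (position 2) for the result
for i in range(len(df)):
    if re.search(r'car|oxide|gen', df.iat[i, 1]):
        df.iat[i, 2] = 'Yes'
    else:
        df.iat[i, 2] = 'No'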
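The row-by-row loop can likely be replaced by a single vectorized expression. The sketch below combines str.contains with numpy.where; since the original df is not shown, the data and the column names 'id', 'text' and 'flag' are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df; the real columns are not shown in the answer
df = pd.DataFrame({'id': [1, 2, 3],
                   'text': ['carbon oxide', 'plain water', 'hydrogen gas']})

# 'Yes' where any of the three alternatives occurs in the text, else 'No'
df['flag'] = np.where(df['text'].str.contains(r'car|oxide|gen'), 'Yes', 'No')

print(df)
```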

