Pandas - check if a string column contains a pair of strings

Suppose I have a DataFrame like this:

df = pd.DataFrame({'consumption':['squirrel eats apple', 'monkey eats apple', 
                                  'monkey eats banana', 'badger eats banana'], 
                   'food':['apple', 'apple', 'banana', 'banana'], 
                   'creature':['squirrel', 'badger', 'monkey', 'elephant']})

           consumption  creature    food
0  squirrel eats apple  squirrel   apple
1    monkey eats apple    badger   apple
2   monkey eats banana    monkey  banana
3   badger eats banana  elephant  banana

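I want to find the rows where that row's 'creature' and 'food' both occur in its 'consumption' string. For example, if apple and squirrel occur together the result should be True, but if apple occurs with elephant it should be False. Similarly, monkey and banana together give True, but monkey and apple give False.

The approach I tried was this: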

creature_list = list(df['creature'])
creature_list = '|'.join(map(str, creature_list))

food_list = list(df['food'])
food_list = '|'.join(map(str, food_list))

np.where((df['consumption'].str.contains('('+creature_list+')', case = False)) 
          & (df['consumption'].str.contains('('+food_list+')', case = False)), 1, 0)

But this doesn't work, because I get True in every case.

How can I check for pairs of strings?
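A short sketch of why the attempt above returns True everywhere: each pattern is built from the whole column, so a row passes as soon as its text mentions any creature and any food, not its own pair. The variable names here are illustrative and not part of the original post.

```python
import pandas as pd

df = pd.DataFrame({'consumption': ['squirrel eats apple', 'monkey eats apple',
                                   'monkey eats banana', 'badger eats banana'],
                   'food': ['apple', 'apple', 'banana', 'banana'],
                   'creature': ['squirrel', 'badger', 'monkey', 'elephant']})

# Joining the whole column yields one alternation shared by every row.
creature_pattern = '|'.join(df['creature'])  # 'squirrel|badger|monkey|elephant'
food_pattern = '|'.join(df['food'])          # 'apple|apple|banana|banana'

# Each test only asks "does this row mention ANY creature / ANY food?"
any_creature = df['consumption'].str.contains(creature_pattern, case=False)
any_food = df['consumption'].str.contains(food_pattern, case=False)

print((any_creature & any_food).tolist())  # [True, True, True, True]
```

Row 3, for instance, passes because 'badger' is listed as a creature in row 1, even though row 3's own creature is 'elephant'.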

Here is one possible approach:

def match_consumption(r):
    if (r['creature'] in r['consumption']) and (r['food'] in r['consumption']):
        return True
    else:
        return False

df['match'] = df.apply(match_consumption, axis=1)
df

           consumption  creature    food  match
0  squirrel eats apple  squirrel   apple   True
1    monkey eats apple    badger   apple  False
2   monkey eats banana    monkey  banana   True
3   badger eats banana  elephant  banana  False
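The same row-wise substring check can also be written without `apply`, as a list comprehension over the three columns. This is only a sketch of an alternative, not something the answer itself proposes; on large frames it often avoids some of the per-row overhead of `apply(axis=1)`, though that is worth measuring on your own data.

```python
import pandas as pd

df = pd.DataFrame({'consumption': ['squirrel eats apple', 'monkey eats apple',
                                   'monkey eats banana', 'badger eats banana'],
                   'food': ['apple', 'apple', 'banana', 'banana'],
                   'creature': ['squirrel', 'badger', 'monkey', 'elephant']})

# zip walks the three columns in parallel, so each row is compared
# only against its own creature and its own food.
df['match'] = [
    (creature in consumption) and (food in consumption)
    for consumption, creature, food
    in zip(df['consumption'], df['creature'], df['food'])
]

print(df['match'].tolist())  # [True, False, True, False]
```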

Is checking for string equality too simple? You could test whether the string <creature> eats <food> equals the corresponding value in the consumption column:

(df.consumption == df.creature + " eats " + df.food)
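Note that the equality test is stricter than a substring test: it only holds when consumption follows the exact "<creature> eats <food>" template. A minimal sketch of the difference, using a made-up two-row frame (the `demo` data is hypothetical and not from the question):

```python
import pandas as pd

# Hypothetical data: the second row deviates from the "<creature> eats <food>" template.
demo = pd.DataFrame({
    'consumption': ['squirrel eats apple', 'monkey gladly eats banana'],
    'creature': ['squirrel', 'monkey'],
    'food': ['apple', 'banana'],
})

# Exact template comparison, as in the expression above.
exact = demo['consumption'] == demo['creature'] + ' eats ' + demo['food']

# Substring comparison: both words only need to appear somewhere in the text.
contains = [
    (c in s) and (f in s)
    for s, c, f in zip(demo['consumption'], demo['creature'], demo['food'])
]

print(exact.tolist())  # [True, False]
print(contains)        # [True, True]
```

Which behaviour is right depends on whether the consumption text is guaranteed to follow the template.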

I'm sure there is a better way to do this, but here is one approach.

import pandas as pd
import re

df = pd.DataFrame({'consumption':['squirrel eats apple', 'monkey eats apple', 'monkey eats banana', 'badger eats banana'], 'food':['apple', 'apple', 'banana', 'banana'], 'creature':['squirrel', 'badger', 'monkey', 'elephant']})

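test = []
for i in range(len(df.consumption)):
    # re.escape makes the creature/food values match as literal text, not as regex patterns
    has_creature = bool(re.search(re.escape(df.creature[i]), df.consumption[i]))
    has_food = bool(re.search(re.escape(df.food[i]), df.consumption[i]))
    test.append(has_creature and has_food)
df['test'] = test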
