Querying a Pandas dataframe for an EXACT word in a column using contains

Question

I have a dataframe df with the following columns:

Index(['category', 'synonyms_text', 'enabled', 'stems_text'], dtype='object')

I am interested in getting only the rows where synonyms_text contains the word food, and not seafood, for example:

df_text= df_syn.loc[df_syn['synonyms_text'].str.contains('food')]

This gives the following result (which includes seafood, foodlocker and other unwanted entries):

           category   synonyms_text  \
130          Fishing  seafarm, seafood, shellfish, sportfish   
141   Refrigeration   coldstorage, foodlocker, freeze, fridge, ice, refrigeration   
183     Food Service  cook, fastfood, foodserve, foodservice, foodtruck, mealprep   
200       Restaurant  expresso, food, galley, gastropub, grill, java, kitchen
377         fastfood  carryout, fastfood, takeout
379  Animal Supplies  feed, fodder, grain, hay, petfood   
613            store  convenience, food, grocer, grocery, market

Then I send the result to a list in order to pick out food as a standalone word:

food_l=df_text['synonyms_text'].str.split().tolist()

However, I get list values like this:

['carryout,', 'fastfood,', 'takeout']

So I remove the commas:

food_l= [[x.replace(",","") for x in l]for l in food_l]

Then I keep the lists that contain the word food:

food_l= [[l for x in l if "food"==x]for l in food_l]

After that, I get rid of the empty lists:

food_l= [x for x in food_l if x != []]

Finally, I flatten the list of lists to get the final result:

food_l = [item for sublist in food_l for item in sublist]

The final result is as follows:

[['bar', 'bistro', 'breakfast', 'buffet', 'cabaret', 'cafe', 'cantina', 'cappuccino', 'chai', 'coffee', 'commissary', 'cuisine', 'deli', 'dhaba', 'dine', 'diner', 'dining', 'eat', 'eater', 'eats', 'edible', 'espresso', 'expresso', 'food', 'galley', 'gastropub', 'grill', 'java', 'kitchen', 'latte', 'lounge', 'pizza', 'pizzeria', 'pub', 'publichouse', 'restaurant', 'roast', 'sandwich', 'snack', 'snax', 'socialhouse', 'steak', 'sub', 'sushi', 'takeout', 'taphouse', 'taverna', 'tea', 'tiffin', 'trattoria', 'treat', 'treatery'], ['convenience', 'food', 'grocer', 'grocery', 'market', 'mart', 'shop', 'store', 'variety']]

@Erfan this dataframe can be used for testing:

df= pd.DataFrame({'category':['Fishing','Refrigeration','store'],'synonyms_text':['seafood','foodlocker','food']})

Both of these return an empty result:

df_tmp=  df.loc[df['synonyms_text'].str.match('\bfood\b')]
df_tmp= df.loc[df['synonyms_text'].str.contains(pat='\bfood\b', regex= True)]

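Do you know a better way to get the rows containing the single word food without going through this whole painful process? Is there a function other than contains that looks for an exact match of a value in a dataframe?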

Thanks

Answer 1

Example dataframe:

df = pd.DataFrame({'category':['Fishing','Refrigeration','store'],
                   'synonyms_text':['seafood','foodlocker','food']})

print(df)
        category synonyms_text
0        Fishing       seafood
1  Refrigeration    foodlocker
2          store          food # <-- we want only the rows with exact "food"

We can do this in three ways:

str.match
str.contains
str.extract (not very useful here)

# 1
df['synonyms_text'].str.match(r'\bfood\b')

# 2 
df['synonyms_text'].str.contains(r'\bfood\b')

# 3
df['synonyms_text'].str.extract(r'(\bfood\b)', expand=False).eq('food')

output

0    False
1    False
2     True
Name: synonyms_text, dtype: bool
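A likely reason the two attempts in the question returned nothing is the missing r prefix: in an ordinary Python string '\b' is the backspace character, so the regex engine never sees a word boundary. The following is a minimal sketch using only the standard re module (the variable names are illustrative, not from the question):

```python
import re

plain = '\bfood\b'    # without r, '\b' is the backspace character (ASCII 8)
raw = r'\bfood\b'     # with r, backslash-b reaches the regex engine as a word boundary

words = ['seafood', 'foodlocker', 'food']

# Search each word with both versions of the pattern
plain_hits = [w for w in words if re.search(plain, w)]
raw_hits = [w for w in words if re.search(raw, w)]

print(len(plain), len(raw))  # the two strings do not even have the same length
print(plain_hits)            # nothing contains a literal backspace
print(raw_hits)              # only the standalone word matches
```

The same applies to the patterns passed to str.match and str.contains, since pandas hands them to the re module.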

Finally, we use the boolean Series to filter the dataframe with .loc:

m = df['synonyms_text'].str.match(r'\bfood\b')
df.loc[m]

output

  category synonyms_text
2    store          food
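One caveat when moving from the test dataframe to the real data: str.match only matches at the beginning of each string, while str.contains searches anywhere in it. With comma-separated synonym lists like those in the question, str.contains is probably the one to use. A small sketch, reusing a few rows copied from the question's output:

```python
import pandas as pd

df_syn = pd.DataFrame({
    'category': ['Fishing', 'Restaurant', 'fastfood', 'store'],
    'synonyms_text': [
        'seafarm, seafood, shellfish, sportfish',
        'expresso, food, galley, gastropub, grill, java, kitchen',
        'carryout, fastfood, takeout',
        'convenience, food, grocer, grocery, market',
    ],
})

pattern = r'\bfood\b'

# str.match anchors at the start of the string, so no row qualifies here
at_start = df_syn['synonyms_text'].str.match(pattern)

# str.contains looks for the whole word anywhere in the string
anywhere = df_syn['synonyms_text'].str.contains(pattern, regex=True)

matched_categories = df_syn.loc[anywhere, 'category'].tolist()
print(at_start.tolist())
print(matched_categories)
```

Note that \b treats any non-word character as a boundary, so a hyphenated entry such as food-truck would also count as containing the word food.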

Bonus:

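To match case-insensitively, use the inline flag (?i). It has to come at the very start of the pattern; recent Python versions reject a global flag placed anywhere else.

For example:

df['synonyms_text'].str.match(r'(?i)\bfood\b')

This will match: food, Food, FOOD, fOoD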
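As an alternative to the inline flag, both str.match and str.contains accept a case argument and a flags argument. A short sketch, assuming a throwaway Series s built just for illustration:

```python
import re
import pandas as pd

s = pd.Series(['food', 'Food', 'FOOD', 'fOoD', 'seafood'])

# Option A: let pandas add the ignore-case flag
by_case = s.str.contains(r'\bfood\b', case=False)

# Option B: pass the re flag explicitly
by_flags = s.str.contains(r'\bfood\b', flags=re.IGNORECASE)

print(by_case.tolist())
print(by_flags.tolist())
```

Both options keep the pattern itself free of flags, which can be easier to read when the same pattern is reused elsewhere.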

Answer 1: accepted, 2019-10-31 00:00:39
