Query Pandas dataframe for EXACT word in a column expanding contains
I have a dataframe df with the following columns:
Index(['category', 'synonyms_text', 'enabled', 'stems_text'], dtype='object')
I am interested in getting just the rows where synonyms_text contains the word food, and not seafood. For instance:
df_text= df_syn.loc[df_syn['synonyms_text'].str.contains('food')]
This gives the following result, which contains seafood, foodlocker and other unwanted matches:
category synonyms_text \
130 Fishing seafarm, seafood, shellfish, sportfish
141 Refrigeration coldstorage, foodlocker, freeze, fridge, ice, refrigeration
183 Food Service cook, fastfood, foodserve, foodservice, foodtruck, mealprep
200 Restaurant expresso, food, galley, gastropub, grill, java, kitchen
377 fastfood carryout, fastfood, takeout
379 Animal Supplies feed, fodder, grain, hay, petfood
613 store convenience, food, grocer, grocery, market
Then I sent the result to a list, to pick out just food as a word:
food_l=df_text['synonyms_text'].str.split().tolist()
However, the values in the list look like this:
['carryout,', 'fastfood,', 'takeout']
So I get rid of the commas:
food_l= [[x.replace(",","") for x in l]for l in food_l]
Then I finally get just the word food from the list of lists:
food_l= [[l for x in l if "food"==x]for l in food_l]
After that, I get rid of the empty lists:
food_l= [x for x in food_l if x != []]
Finally, I flatten the list of lists to get the final result:
food_l = [item for sublist in food_l for item in sublist]
And the final result is as follows:
[['bar', 'bistro', 'breakfast', 'buffet', 'cabaret', 'cafe', 'cantina', 'cappuccino', 'chai', 'coffee', 'commissary', 'cuisine', 'deli', 'dhaba', 'dine', 'diner', 'dining', 'eat', 'eater', 'eats', 'edible', 'espresso', 'expresso', 'food', 'galley', 'gastropub', 'grill', 'java', 'kitchen', 'latte', 'lounge', 'pizza', 'pizzeria', 'pub', 'publichouse', 'restaurant', 'roast', 'sandwich', 'snack', 'snax', 'socialhouse', 'steak', 'sub', 'sushi', 'takeout', 'taphouse', 'taverna', 'tea', 'tiffin', 'trattoria', 'treat', 'treatery'], ['convenience', 'food', 'grocer', 'grocery', 'market', 'mart', 'shop', 'store', 'variety']]
@Erfan This dataframe can be used as a test:
df= pd.DataFrame({'category':['Fishing','Refrigeration','store'],'synonyms_text':['seafood','foodlocker','food']})
Both of these give an empty result:
df_tmp= df.loc[df['synonyms_text'].str.match('\bfood\b')]
df_tmp= df.loc[df['synonyms_text'].str.contains(pat='\bfood\b', regex= True)]
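A likely reason both attempts come back empty: in a regular (non-raw) Python string literal, '\b' is the backspace character (\x08), not the regex word-boundary token, so the pattern never matches ordinary text. A minimal sketch on the test dataframe above, using a raw string instead; the names plain, raw and mask are only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'category': ['Fishing', 'Refrigeration', 'store'],
                   'synonyms_text': ['seafood', 'foodlocker', 'food']})

plain = '\bfood\b'   # non-raw literal: '\b' becomes the backspace character
raw = r'\bfood\b'    # raw literal: the regex engine receives the word boundary \b

print(plain == '\x08food\x08')  # True
print(len(plain), len(raw))     # 6 8

# With the raw pattern, only the row whose text is the whole word "food" is kept
mask = df['synonyms_text'].str.contains(raw, regex=True)
print(df.loc[mask])
```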
Do you know a better way to get just the rows with the single word food, without going through this whole painful process? Is there a function other than contains that looks for an exact match in the values of the dataframe?
Thanks
Example dataframe:
df = pd.DataFrame({'category':['Fishing','Refrigeration','store'],
'synonyms_text':['seafood','foodlocker','food']})
print(df)
category synonyms_text
0 Fishing seafood
1 Refrigeration foodlocker
2 store food # <-- we want only the rows with exact "food"
Three ways we can do this:
str.match
str.contains
str.extract (not very useful here)
# 1
df['synonyms_text'].str.match(r'\bfood\b')
# 2
df['synonyms_text'].str.contains(r'\bfood\b')
# 3
df['synonyms_text'].str.extract(r'(\bfood\b)', expand=False).eq('food')
Output:
0 False
1 False
2 True
Name: synonyms_text, dtype: bool
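One caveat worth checking: str.match only tests the beginning of each string (it is based on re.match), whereas str.contains searches anywhere in it (based on re.search). On the single-word example the two agree, but on comma-separated cells like those in the question they can differ. A small sketch, using three cells copied from the question's printed output as sample data:

```python
import pandas as pd

# Sample cells taken from the question's printed output
s = pd.Series(['seafarm, seafood, shellfish, sportfish',
               'expresso, food, galley, gastropub, grill, java, kitchen',
               'convenience, food, grocer, grocery, market'],
              name='synonyms_text')

pattern = r'\bfood\b'

anchored = s.str.match(pattern)      # tests only the beginning of each string
anywhere = s.str.contains(pattern)   # searches the whole string

print(anchored.tolist())  # [False, False, False]
print(anywhere.tolist())  # [False, True, True]
```

For multi-word cells, str.contains with the raw word-boundary pattern is therefore probably the safer choice.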
Finally, we use the boolean Series to filter the dataframe with .loc:
m = df['synonyms_text'].str.match(r'\bfood\b')
df.loc[m]
Output:
category synonyms_text
2 store food
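If the cells hold comma-separated synonyms, as in the question, a regex-free alternative is to split each cell into tokens and test for membership. This is only a sketch, not one of the three methods above: has_token is a hypothetical helper name, the third row is made up, and it assumes tokens are separated by commas. Unlike \b, which also sees a boundary at punctuation such as a hyphen, the token check requires the whole token to equal the word.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['Fishing', 'Restaurant', 'Hypothetical'],
    'synonyms_text': ['seafarm, seafood, shellfish',
                      'expresso, food, galley',
                      'fast-food, takeout'],   # made-up row to show the difference
})

def has_token(cell, word):
    """Return True if `word` is exactly one of the comma-separated tokens in `cell`."""
    return word in [token.strip() for token in cell.split(',')]

token_mask = df['synonyms_text'].apply(lambda cell: has_token(cell, 'food'))
regex_mask = df['synonyms_text'].str.contains(r'\bfood\b')

print(token_mask.tolist())  # [False, True, False]
print(regex_mask.tolist())  # [False, True, True]  ("fast-food" has a boundary at "-")
```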
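Bonus:
To match case-insensitively, use the inline flag (?i). It has to come at the very start of the pattern; Python 3.11 and later raise an error for a global flag placed anywhere else.
For example:
df['synonyms_text'].str.match(r'(?i)\bfood\b')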
Which will match: food, Food, FOOD, fOoD
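An alternative that avoids inline flags altogether: str.contains (like str.match) accepts a case parameter and a flags parameter. A minimal sketch with made-up sample values:

```python
import re
import pandas as pd

# Made-up values: four case variants of "food" plus one that should not match
s = pd.Series(['food', 'Food', 'FOOD', 'fOoD', 'seafood'])

by_case = s.str.contains(r'\bfood\b', case=False)
by_flag = s.str.contains(r'\bfood\b', flags=re.IGNORECASE)

print(by_case.tolist())  # [True, True, True, True, False]
print(by_flag.tolist())  # [True, True, True, True, False]
```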