Return key on fuzzy match of element in dictionary list
I have a dataframe like this:
| Date | Cost Category | Vendor |
|---|---|---|
| 2021-03-22 | - | FamilyMart |
| 2021-03-04 | - | FAMILY MART |
| 2021-03-14 | - | Subway MAIN |
| 2021-03-14 | - | OTHER |
| 2021-03-14 | - | Transit Authority |
| 2021-03-09 | - | Subway local |
| 2021-03-24 | - | Seven Eleven |
| 2021-03-14 | - | Seven-Eleven |
I want to add category tags like this:
| Date | Cost Category | Vendor |
|---|---|---|
| 2021-03-22 | Store | FamilyMart |
| 2021-03-04 | Store | FAMILY MART |
| 2021-03-14 | Dining | Subway MAIN |
| 2021-03-14 | - | OTHER |
| 2021-03-14 | - | Transit Authority |
| 2021-03-09 | Dining | Subway local |
| 2021-03-24 | Store | Seven Eleven |
| 2021-03-14 | Store | Seven-Eleven |
I tried the following, which just returns the value of the matching element in the list:
from fuzzywuzzy import process
from fuzzywuzzy import fuzz

Store = ['Family Mart', 'Seven Eleven', 'York Mart', 'Tokyu', 'Ministop']
Dining = ['Subway', 'Salad Works']

def fuzz_m(col, cat_list, score_t):
    # Best-matching element of cat_list and its 0-100 score
    tag, score = process.extractOne(col, cat_list, scorer=score_t)
    if score < 51:
        return ''
    else:
        return tag

df['Cost Category'] = df['Vendor'].apply(fuzz_m, cat_list=Store, score_t=fuzz.ratio)
| Date | Cost Category | Vendor |
|---|---|---|
| 2021-03-22 | Family Mart | FamilyMart |
| 2021-03-04 | Family Mart | FAMILY MART |
| 2021-03-14 | - | Subway MAIN |
| 2021-03-14 | - | OTHER |
| 2021-03-14 | - | Transit Authority |
| 2021-03-09 | - | Subway local |
| 2021-03-24 | Seven Eleven | Seven Eleven |
| 2021-03-14 | Seven Eleven | Seven-Eleven |
What I want to do is use a dictionary in place of `cat_list` and return the key in Cost Category.
dictionary = {
    'Store': ['Family Mart', 'Seven Eleven', 'York Mart', 'Tokyu', 'Ministop'],
    'Dining': ['Subway', 'Salad Works'],
}
If any value in the column scores 51 or higher against an element of one of the lists, I want to put that list's key under Cost Category. If it is a low match (below 51), I want to do nothing.
Is there a feasible approach to achieve this?
With `Series.apply()`, `fuzz_m()` receives one `Vendor` value at a time, so you can use that `dictionary` directly as `extractOne(value, dictionary)`:
def fuzz_m(value):
    # With dict-like choices, extractOne returns (matched value, score, key)
    _, score, tag = process.extractOne(value, dictionary)
    return tag if score > 50 else '-'

df['Cost Category'] = df['Vendor'].apply(fuzz_m)
# Date Cost Category Vendor
# 0 2021-03-22 Store FamilyMart
# 1 2021-03-04 Store FAMILY MART
# 2 2021-03-14 Dining Subway MAIN
# 3 2021-03-14 - OTHER
# 4 2021-03-14 - Transit Authority
# 5 2021-03-09 Dining Subway local
# 6 2021-03-24 Store Seven Eleven
# 7 2021-03-14 Store Seven-Eleven
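If you would rather score each list element on its own and then report the key of the list it came from, the lookup can be sketched roughly as below. This is only an illustration under assumptions: it swaps in the standard library's `difflib.SequenceMatcher` as a stand-in scorer so that it runs without fuzzywuzzy, which means the scores will not be identical to `fuzz.ratio` or `WRatio`. The names `categories`, `normalize`, `similarity` and `best_category` are hypothetical, not part of any library.

```python
from difflib import SequenceMatcher

# Same mapping as in the question: category key -> known vendor names
categories = {
    'Store': ['Family Mart', 'Seven Eleven', 'York Mart', 'Tokyu', 'Ministop'],
    'Dining': ['Subway', 'Salad Works'],
}

def normalize(text):
    # Lowercase and drop everything except letters and digits,
    # so 'Seven-Eleven ' and 'Seven Eleven' compare as equal
    return ''.join(ch for ch in text.lower() if ch.isalnum())

def similarity(a, b):
    # Stand-in scorer (assumption): 0-100 ratio from difflib, not fuzzywuzzy
    return 100 * SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def best_category(vendor, categories, threshold=51, default='-'):
    # Score the vendor against every element of every list and remember
    # the key of the list that holds the best-scoring element
    best_key, best_score = default, 0.0
    for key, names in categories.items():
        for name in names:
            score = similarity(vendor, name)
            if score > best_score:
                best_key, best_score = key, score
    # Low matches (below the threshold) keep the default marker
    return best_key if best_score >= threshold else default

vendors = ['FamilyMart', 'FAMILY MART', 'Subway MAIN', 'OTHER',
           'Transit Authority', 'Subway local', 'Seven Eleven ', 'Seven-Eleven ']
tags = [best_category(v, categories) for v in vendors]
print(tags)
```

With pandas, the same function could be applied as `df['Vendor'].apply(best_category, categories=categories)`. Swapping `similarity` for a fuzzywuzzy scorer should only require changing that one function, though the threshold may then need retuning.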