I have a dataframe like this:
Date | Cost Category | Vendor |
---|---|---|
2021-03-22 | - | FamilyMart |
2021-03-04 | - | FAMILY MART |
2021-03-14 | - | Subway MAIN |
2021-03-14 | - | OTHER |
2021-03-14 | - | Transit Authority |
2021-03-09 | - | Subway local |
2021-03-24 | - | Seven Eleven |
2021-03-14 | - | Seven-Eleven |
I want to add category tags like this:
Date | Cost Category | Vendor |
---|---|---|
2021-03-22 | Store | FamilyMart |
2021-03-04 | Store | FAMILY MART |
2021-03-14 | Dining | Subway MAIN |
2021-03-14 | - | OTHER |
2021-03-14 | - | Transit Authority |
2021-03-09 | Dining | Subway local |
2021-03-24 | Store | Seven Eleven |
2021-03-14 | Store | Seven-Eleven |
I tried the following, which just returns the matching element from the list:
from fuzzywuzzy import process
from fuzzywuzzy import fuzz

Store = ['Family Mart', 'Seven Eleven', 'York Mart', 'Tokyu', 'Ministop']
Dining = ['Subway', 'Salad Works']

def fuzz_m(col, cat_list, score_t):
    tag, score = process.extractOne(col, cat_list, scorer=score_t)
    if score < 51:
        return ''
    else:
        return tag

df['Cost Category'] = df['Vendor'].apply(fuzz_m, cat_list=Store, score_t=fuzz.ratio)
This produces:
Date | Cost Category | Vendor |
---|---|---|
2021-03-22 | Family Mart | FamilyMart |
2021-03-04 | Family Mart | FAMILY MART |
2021-03-14 | - | Subway MAIN |
2021-03-14 | - | OTHER |
2021-03-14 | - | Transit Authority |
2021-03-09 | - | Subway local |
2021-03-24 | Seven Eleven | Seven Eleven |
2021-03-14 | Seven Eleven | Seven-Eleven |
What I want to do is use a dictionary in place of cat_list and return the key in Cost Category.
dictionary = {
    'Store': ['Family Mart', 'Seven Eleven', 'York Mart', 'Tokyu', 'Ministop'],
    'Dining': ['Subway', 'Salad Works'],
}
If a value in the Vendor column scores 51 or higher against any element of a list, I want to put that list's key under Cost Category. If the best score is below 51, I want to leave the row unchanged.
Is there a feasible approach to achieve this?
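With Series.apply(), fuzz_m() receives one Vendor value at a time, so you can use that dictionary directly as extractOne(value, dictionary). When the choices are passed as a dictionary, extractOne() returns a (match, score, key) tuple, so the third element is the category: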
def fuzz_m(value):
    _, score, tag = process.extractOne(value, dictionary)
    return tag if score > 50 else '-'

df['Cost Category'] = df['Vendor'].apply(fuzz_m)
# Date Cost Category Vendor
# 0 2021-03-22 Store FamilyMart
# 1 2021-03-04 Store FAMILY MART
# 2 2021-03-14 Dining Subway MAIN
# 3 2021-03-14 - OTHER
# 4 2021-03-14 - Transit Authority
# 5 2021-03-09 Dining Subway local
# 6 2021-03-24 Store Seven Eleven
# 7 2021-03-14 Store Seven-Eleven
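If you would rather compare each vendor against the individual names instead of passing the lists themselves as choices, one option is to flatten the dictionary into a name-to-category lookup first. The following is only a rough sketch under a few assumptions, not the method from the answer above: it swaps in the standard library's difflib.SequenceMatcher for fuzzywuzzy's scorers (so the scores may differ from fuzz.ratio or the default scorer), it lowercases both strings before comparing, and the names name_to_category, similarity and categorize are made up for this illustration.

```python
from difflib import SequenceMatcher

dictionary = {
    'Store': ['Family Mart', 'Seven Eleven', 'York Mart', 'Tokyu', 'Ministop'],
    'Dining': ['Subway', 'Salad Works'],
}

# Flatten {category: [names]} into {name: category} so a matched name
# can be mapped straight back to its category key.
name_to_category = {
    name: category
    for category, names in dictionary.items()
    for name in names
}

def similarity(a, b):
    """Case-insensitive similarity on a 0-100 scale (difflib stand-in, an assumption)."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()

def categorize(vendor, threshold=51, default='-'):
    """Return the category of the closest known name, or `default` below the threshold."""
    best_name = max(name_to_category, key=lambda name: similarity(vendor, name))
    if similarity(vendor, best_name) < threshold:
        return default
    return name_to_category[best_name]

vendors = ['FamilyMart', 'FAMILY MART', 'Subway MAIN', 'OTHER',
           'Transit Authority', 'Subway local', 'Seven Eleven', 'Seven-Eleven']
print([categorize(v) for v in vendors])
# ['Store', 'Store', 'Dining', '-', '-', 'Dining', 'Store', 'Store']
```

With a dataframe this would be applied the same way as before, for example df['Cost Category'] = df['Vendor'].apply(categorize). The threshold of 51 is carried over from the question and would likely need tuning for a different scorer.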