I have a pandas dataframe of credit card expenses of various yet-to-be-defined categories (gas, groceries, fast food, etc.).
df1:
Category Date Description Cost
nan 7.1.20 Chipotle Downtown West $8.23
nan 7.1.20 Break Time - Springfield $23.57
nan 7.3.20 State Farm - Agent $94.23
nan 7.3.20 T-Mobile $132.42
nan 7.4.20 Venmo -xj8382dzavvd $8.00
nan 7.6.20 Broadway McDonald's $11.73
nan 7.8.20 Break Time - Townsville $44.23
I would like to maintain a second dataframe of keywords to search for in the description, and use it to populate the "Category" column. Something like the following:
df2:
item category
mcdonald fast food
state farm insurance
break time gas
chipotle fast food
mobile cell phone
The idea here is that I would write lines of code to search for partial strings in df1['Description'] and populate df1['Category'] with the value in df2['category'].
I'm sure there is a clean and pythonic way to handle this, but the code below is the closest I can get. Its erroneous result is that all rows of df1['Category'] containing a match are set to the value from the last iteration over df2 (e.g., in this case, all matching rows would be set to "cell phone").
for x in df2['item']:
    for y in df2['category']:
        df1['Category'] = np.where(
            df1['Description'].str.lower().str.contains(x),
            y,
            df1['Category'])
Thanks for your help!
You can do this with map, the standard library's difflib.get_close_matches function, and a lambda expression. The difflib call returns a list of the closest string matches (best match first), and you can adjust the cutoff parameter to make the matching more or less sensitive.
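To illustrate what the call returns on its own, here is a minimal sketch. The keyword list is copied from df2['item'] in the question; the variable names (items, loose, strict, best) and the 0.9 cutoff are only illustrative assumptions.

```python
import difflib

# Keywords copied from df2['item'] in the question.
items = ["mcdonald", "state farm", "break time", "chipotle", "mobile"]

# Returns up to n (default 3) candidates that score at least `cutoff`,
# ordered from most to least similar. The comparison is case-sensitive.
loose = difflib.get_close_matches("T-Mobile", items, cutoff=0.3)

# The list may be empty, so guard before indexing into it.
best = loose[0] if loose else "no match"

# A strict cutoff can easily leave nothing behind.
strict = difflib.get_close_matches("Venmo -xj8382dzavvd", items, cutoff=0.9)

print(loose, best, strict)
```

Because the result is a plain list, checking that it is non-empty before taking element [0] is what keeps the lookup from raising an IndexError on rows with no match.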
import difflib
# you'll need to change both cutoff values here for the lambda to work correctly
# the length check is "> 0" so that a single match still counts as a match
df1['Category'] = df1['Description'].map(lambda x: difflib.get_close_matches(x, df2['item'], cutoff=0.3)[0] if len(difflib.get_close_matches(x, df2['item'], cutoff=0.3)) > 0 else 'no match')
print(df1)
Category Date Description Cost
0 chipotle 7.1.20 Chipotle Downtown West $8.23
1 break time 7.1.20 Break Time - Springfield $23.57
2 state farm 7.3.20 State Farm - Agent $94.23
3 mobile 7.3.20 T-Mobile $132.42
4 no match 7.4.20 Venmo -xj8382dzavvd $8.00
5 mcdonald 7.6.20 Broadway McDonald's $11.73
6 break time 7.8.20 Break Time - Townsville $44.23
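Note that the output above fills df1['Category'] with the matched keyword from df2['item'], not with the value from df2['category'] that the question asks for. If plain substring matching is enough, one possible fix for the loop in the question is to walk the two df2 columns in lockstep with zip instead of nesting the loops, since the nested version pairs every keyword with every category. The sketch below is only an illustration under that assumption: the frames are rebuilt by hand from the question's data (Date and Cost are left out for brevity), and the names desc and mask are made up here.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; Date and Cost omitted for brevity.
df1 = pd.DataFrame({
    "Category": pd.Series([np.nan] * 7, dtype=object),
    "Description": [
        "Chipotle Downtown West",
        "Break Time - Springfield",
        "State Farm - Agent",
        "T-Mobile",
        "Venmo -xj8382dzavvd",
        "Broadway McDonald's",
        "Break Time - Townsville",
    ],
})
df2 = pd.DataFrame({
    "item": ["mcdonald", "state farm", "break time", "chipotle", "mobile"],
    "category": ["fast food", "insurance", "gas", "fast food", "cell phone"],
})

# Lowercase once so the lowercase keywords in df2 can match.
desc = df1["Description"].str.lower()

# zip pairs each keyword with its own category (one pass, no nested loop).
for item, category in zip(df2["item"], df2["category"]):
    # regex=False treats the keyword as a literal substring.
    # Only rows that are still uncategorized get filled.
    mask = desc.str.contains(item, regex=False) & df1["Category"].isna()
    df1.loc[mask, "Category"] = category

print(df1)
```

Rows with no keyword hit (the Venmo row here) keep NaN in Category, which makes them easy to find later with df1['Category'].isna(). When a description contains more than one keyword, the first keyword listed in df2 wins.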