How do I check if dataframe column contains a string from another dataframe column and return adjacent cell in python pandas?

Question

I have 2 dataframes, one containing a columnn of strings (df = data) which I need to categorise, and the other containing possible categories and search terms (df = categories). I would like to add a column to the "data" dataframe which returns a category based on search terms. For example:

data:

**RepairName**
A/C is not cold
flat tyre is c
the tyre needs a repair on left side
the aircon is not cold

categories:

**Category**      **SearchTerm**
A/C               aircon
A/C               A/C
Tyre              repair
Tyre              flat

DESIRED RESULT data:

**RepairName**                        **Category**
A/C is not cold                         A/C
flat tyre is c                          Tyre
the tyre needs a repair on left side    Tyre
the aircon is not cold                  A/C

I have tried the following lambda function with apply. I am not sure if my column references are in the correct place:

data['Category'] = data['RepairName'].apply(lambda x: categories['Category'] if categories['SearchTerm'] in x else "")

data['Category'] = [categories['Category'] if categories['SearchTerm'] in data['RepairName'] else 0]

but I keep getting the error messge:

TypeError: 'in <string>' requires string as left operand, not Series

This provides true / false as to whether a category exists based on SearchTerm, however I have not been able to return the category associated with the Search Term:

data['containName']=data['RepairName'].str.contains('|'.join(categories['SearchTerm']),case=False)

And these both sometimes work, but not all the time (perhaps because some of my search terms are more than one word?)

data['Category'] = [
    next((c for c, k in categories.values if k in s), None) for s in data['RepairName']] 

d = dict(zip(categories['SearchTerm'], categories['Category']))
data['CategoryCheck'] = [next((d[y] for y in x.split() if y in d), None) for x in data['RepairName']]

Answer 1

We do str.findall then map

s=df.RepairName.str.findall('|'.join(cat.SearchTerm.tolist())).str[0].\
    map(cat.set_index('SearchTerm').Category)
0     A/C
1    Tyre
2    Tyre
3     A/C
Name: RepairName, dtype: object
df['Category']=s

Answer 2

This worked once I had ensured all my columns were lower case (I also removed hyphens and brackets as well for good measure):

print("All lowercase")
data = data.apply(lambda x: x.astype(str).str.lower())
categories = categories.apply(lambda x: x.astype(str).str.lower())

print("Remove double spacing")
data = data.replace('\s+', ' ', regex=True)

print('Remove hyphens')
data["RepairName"] = data["RepairName"].str.replace('-', '')

print('Remove brackets')
data["RepairName"] = data["RepairName"].str.replace('(', '')
data["RepairName"] = data["RepairName"].str.replace(')', '')

data['Category'] = [
    next((c for c, k in categories.values if k in s), None) for s in data['RepairName']]

How do I check if dataframe column contains a string from another dataframe column and return adjacent cell in python pandas?

Question

2 answers

solution1
0 2020-06-22 01:44:42

solution2
0 2020-06-22 09:02:49

How do I check if dataframe column contains a string from another dataframe column and return adjacent cell in python pandas?

Question

2 answers

solution1 0 2020-06-22 01:44:42

solution2 0 2020-06-22 09:02:49

solution1
0 2020-06-22 01:44:42

solution2
0 2020-06-22 09:02:49