Merge columns in Python/Pandas of Dataframe1 from Dataframe2 only if specific column contains at least one of the words of the other column

Question

Consider the Dataframes:

Employees:

Employee    City

Ernest      Tel Aviv
Merry       New York
Mason       Cairo

Clients:

Client  Words

Ernest  New vacuum Tel
Mason   Tel Aviv is so pretty
Merry   Halo! I live in the city York

I'm trying to merge columns from Dataframe2 (Clients) into Dataframe1 (Employees) in Pandas, but only if at least one of the words in the City column of Employees is contained in the Words column of Clients.

The desired result should be as follows:

Employee    City        Words

Ernest      Tel Aviv    New vacuum Tel
Merry       New York    Halo! I live in the city York

I tried something like this:

import pandas as pd

data1 = pd.read_csv('..........csv')
data2 = pd.read_csv('..........csv')

output = pd.merge(data1, data2, left_on=  ['City', 'column1'],
                   right_on= ['Words', 'column1'], 
                   how = 'inner')

But it didn't really boil down to anything.

Any ideas?

Answer 1

Split the City and Words columns into lists of words, then use explode() to generate one row per word.
You can now merge() on the name and the word to get the required output.

import pandas as pd
import io

data1 = pd.read_csv(
    io.StringIO("""Employee    City
Ernest      Tel Aviv
Merry       New York
Mason       Cairo"""),
    sep=r"\s\s+",  # split on runs of two or more whitespace characters
    engine="python",
)

data2 = pd.read_csv(
    io.StringIO("""Client  Words
Ernest  New vacuum Tel
Mason   Tel Aviv is so pretty
Merry   Halo! I live in the city York"""),
    sep=r"\s\s+",
    engine="python",
)

data1.assign(tokens=data1["City"].str.split(" ")).explode("tokens").merge(
    data2.assign(tokens=data2["Words"].str.split(" ")).explode("tokens"),
    left_on=["Employee", "tokens"],
    right_on=["Client", "tokens"],
).drop(columns="tokens").drop_duplicates()

	Employee	City	Client	Words
0	Ernest	Tel Aviv	Ernest	New vacuum Tel
1	Merry	New York	Merry	Halo! I live in the city York
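If only the Employee, City and Words columns from the question are wanted, the same explode-and-merge idea can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: merge_on_shared_word is a hypothetical name, the frames are rebuilt inline from the question's sample data, and words are compared exactly (case-sensitive, punctuation kept).

```python
import pandas as pd

# Sample data copied from the question.
employees = pd.DataFrame(
    {
        "Employee": ["Ernest", "Merry", "Mason"],
        "City": ["Tel Aviv", "New York", "Cairo"],
    }
)
clients = pd.DataFrame(
    {
        "Client": ["Ernest", "Mason", "Merry"],
        "Words": [
            "New vacuum Tel",
            "Tel Aviv is so pretty",
            "Halo! I live in the city York",
        ],
    }
)


def merge_on_shared_word(left, right):
    """Hypothetical helper: join rows with the same name that share a word."""
    # One row per word, so an ordinary equality merge can match on words.
    left_tokens = left.assign(tokens=left["City"].str.split()).explode("tokens")
    right_tokens = right.assign(tokens=right["Words"].str.split()).explode("tokens")
    merged = left_tokens.merge(
        right_tokens,
        left_on=["Employee", "tokens"],
        right_on=["Client", "tokens"],
    )
    # Drop the helper columns; several shared words would otherwise
    # produce duplicate rows for the same employee.
    return (
        merged.drop(columns=["tokens", "Client"])
        .drop_duplicates()
        .reset_index(drop=True)
    )


result = merge_on_shared_word(employees, clients)
print(result)
```

Because the words are compared as-is, a token such as "York!" would not match "York"; stripping punctuation or lowercasing before the split is one way to loosen that.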

Answer 2

A more complicated join:

# Extract the last word in each client's Words
Clients['joinword'] = Clients['Words'].str.extract(r"(\w+$)")

# Build a regex alternation: the join words separated by | (or)
s = '|'.join(Clients['joinword'].to_list())

# Find the first of those words that occurs in each employee's City
Employees['joinword'] = Employees['City'].str.findall(s).str[0]

# Now merge on both the name and the join word
pd.merge(Employees, Clients, right_on=['Client', 'joinword'], left_on=['Employee', 'joinword'], how='inner')

  Employee      City joinword  Client                          Words
0   Ernest  Tel Aviv      Tel  Ernest                 New vacuum Tel
1    Merry  New York     York   Merry  Halo! I live in the city York
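Note that this join keys on the last word of Words, which happens to be the matching word in the sample data. A sketch that does not rely on that, assuming each employee should be paired with the client of the same name: merge on the name first, then keep only the rows where City and Words share at least one whitespace-separated word. shares_word is a hypothetical helper name, and the frames are rebuilt inline from the question's sample data.

```python
import pandas as pd

# Sample data copied from the question.
employees = pd.DataFrame(
    {
        "Employee": ["Ernest", "Merry", "Mason"],
        "City": ["Tel Aviv", "New York", "Cairo"],
    }
)
clients = pd.DataFrame(
    {
        "Client": ["Ernest", "Mason", "Merry"],
        "Words": [
            "New vacuum Tel",
            "Tel Aviv is so pretty",
            "Halo! I live in the city York",
        ],
    }
)

# Pair each employee with the client row of the same name.
paired = employees.merge(clients, left_on="Employee", right_on="Client")


def shares_word(row):
    """Hypothetical helper: True if City and Words have a word in common."""
    return bool(set(row["City"].split()) & set(row["Words"].split()))


# Keep only the pairs with at least one shared word.
mask = paired.apply(shares_word, axis=1)
result = paired[mask].drop(columns="Client").reset_index(drop=True)
print(result)
```

The row-wise apply is simple to read but slower than the explode-based merge on large frames, so it is best suited to small inputs.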

solution1
1 ACCEPTED 2021-07-14 09:03:00

solution2
1 2021-07-14 09:15:07
