
Merge columns in Python/Pandas of Dataframe1 from Dataframe2 only if specific column contains at least one of the words of the other column

Consider the Dataframes:

Employees:

Employee    City

Ernest      Tel Aviv
Merry       New York
Mason       Cairo

Clients:

Client  Words

Ernest  New vacuum Tel
Mason   Tel Aviv is so pretty
Merry   Halo! I live in the city York

I'm trying to merge a column from Dataframe2 (Clients) into Dataframe1 (Employees) in Pandas, but only for rows where at least one of the words in the City column of Employees appears in the Words column of Clients.

The desired result should be as follows:

Employee    City        Words

Ernest      Tel Aviv    New vacuum Tel
Merry       New York    Halo! I live in the city York

I tried something like this:

import pandas as pd

data1 = pd.read_csv('..........csv')
data2 = pd.read_csv('..........csv')

output = pd.merge(
    data1,
    data2,
    left_on=['City', 'column1'],
    right_on=['Words', 'column1'],
    how='inner',
)
  

But it didn't really lead anywhere.

Any ideas?

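  • Split the City and Words columns into lists of words, then use explode() to produce one row per word.
  • You can then merge() on the name and the word to get the required output; drop_duplicates() removes repeated rows when more than one word matches.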
import io

import pandas as pd

# Rebuild the sample frames; columns are separated by two or more spaces.
data1 = pd.read_csv(
    io.StringIO("""Employee    City
Ernest      Tel Aviv
Merry       New York
Mason       Cairo"""),
    sep=r"\s\s+",
    engine="python",
)

data2 = pd.read_csv(
    io.StringIO("""Client  Words
Ernest  New vacuum Tel
Mason   Tel Aviv is so pretty
Merry   Halo! I live in the city York"""),
    sep=r"\s\s+",
    engine="python",
)

# Split each text column into words, explode to one row per word,
# then join on both the person's name and the shared word.
data1.assign(tokens=data1["City"].str.split(" ")).explode("tokens").merge(
    data2.assign(tokens=data2["Words"].str.split(" ")).explode("tokens"),
    left_on=["Employee", "tokens"],
    right_on=["Client", "tokens"],
).drop(columns="tokens").drop_duplicates()
  Employee      City  Client                          Words
0   Ernest  Tel Aviv  Ernest                 New vacuum Tel
1    Merry  New York   Merry  Halo! I live in the city York
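
To see what the split-and-explode step produces before the merge, here is a minimal sketch on the Employees side only. The variable names `employees` and `exploded` are made up for illustration and are not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
employees = pd.DataFrame(
    {
        "Employee": ["Ernest", "Merry", "Mason"],
        "City": ["Tel Aviv", "New York", "Cairo"],
    }
)

# Split City into a list of words, then explode so each word gets its own row.
# The original row index is repeated for every word that came from that row.
exploded = employees.assign(tokens=employees["City"].str.split()).explode("tokens")
print(exploded)
```

Each employee now has one row per city word, so an ordinary equality merge on the name and `tokens` can find a shared word. If a client's text happened to contain both words of the city, the merge would return that pair twice, which is why the answer finishes with drop_duplicates().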

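This is a somewhat complicated join. Employees and Clients are the two DataFrames from the question. Note that it only looks at the last word of each Words entry.

import pandas as pd

# Extract the last word of each client's Words
Clients['joinword'] = Clients['Words'].str.extract(r"(\w+$)", expand=False)

# Build a regex alternation (word1|word2|...) from those last words
s = '|'.join(Clients['joinword'].to_list())

# Find the first of those words that occurs in each employee's City (NaN if none)
Employees['joinword'] = Employees['City'].str.findall(s).str[0]

# Now merge on the name and the shared word
pd.merge(Employees, Clients, left_on=['Employee', 'joinword'], right_on=['Client', 'joinword'], how='inner')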

  Employee      City joinword  Client                          Words
0   Ernest  Tel Aviv      Tel  Ernest                 New vacuum Tel
1    Merry  New York     York   Merry  Halo! I live in the city York
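
If the shared word is not guaranteed to be the last word of Words, one possible alternative is to pair the rows by name first and then keep only the pairs whose word sets overlap. This is a rough sketch, not the method from either answer; the helper `shares_word` and the variable names are hypothetical, and it assumes words are separated by whitespace and compared case-sensitively.

```python
import pandas as pd

# Sample data copied from the question.
employees = pd.DataFrame(
    {
        "Employee": ["Ernest", "Merry", "Mason"],
        "City": ["Tel Aviv", "New York", "Cairo"],
    }
)
clients = pd.DataFrame(
    {
        "Client": ["Ernest", "Mason", "Merry"],
        "Words": [
            "New vacuum Tel",
            "Tel Aviv is so pretty",
            "Halo! I live in the city York",
        ],
    }
)

# Pair each employee with the client of the same name.
paired = employees.merge(clients, left_on="Employee", right_on="Client")


def shares_word(row):
    """Return True if City and Words have at least one word in common."""
    return bool(set(row["City"].split()) & set(row["Words"].split()))


# Keep only the pairs that share a word, and only the columns the question asks for.
mask = paired.apply(shares_word, axis=1)
result = paired.loc[mask, ["Employee", "City", "Words"]].reset_index(drop=True)
print(result)
```

Because the comparison is on whole whitespace-separated tokens, a word with attached punctuation such as "York." would not match "York"; stripping punctuation or lowercasing before the comparison would be a reasonable extension.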
