简体   繁体   中英

Merge two dataframe if one string column is contained in another column in Pandas

I need to merge the following df1 and df2 , based on condition: if address in df1 contains state in df2 .

df1:

                                                                           address  \
0      Cecilia Chapman 711-2880 Nulla St. Mankato Mississippi 96522 (257) 563-7401   
1  Iris Watson P.O. Box 283 8562 Fusce Rd. Frederick Nebraska 20620 (372) 587-2335   
2    Celeste Slater 606-3727 Ullamcorper. Street Roseville NH 11523 (786) 713-8616   
3            Theodore Lowe Ap #867-859 Sit Rd. Azusa New York 39531 (793) 151-6230   
4                 Calista Wise 7292 Dictum Av. San Antonio MI 47096 (492) 709-6392   

   quantity  price  
0         2     20  
1         3     13  
2         5     23  
3         3     32  
4         5     45  

df2:

   id        state
0   1  Mississippi
1   2     Nebraska
2   3     New York

My expected output will like this. How could I do that? Thank you.

                                                                           address  \
0      Cecilia Chapman 711-2880 Nulla St. Mankato Mississippi 96522 (257) 563-7401   
1  Iris Watson P.O. Box 283 8562 Fusce Rd. Frederick Nebraska 20620 (372) 587-2335   
2    Celeste Slater 606-3727 Ullamcorper. Street Roseville NH 11523 (786) 713-8616   
3            Theodore Lowe Ap #867-859 Sit Rd. Azusa New York 39531 (793) 151-6230   
4                 Calista Wise 7292 Dictum Av. San Antonio MI 47096 (492) 709-6392   

   quantity  price   id        state  
0         2     20  1.0  Mississippi  
1         3     13  2.0     Nebraska  
2         5     23  NaN          NaN  
3         3     32  3.0     New York  
4         5     45  NaN          NaN  

Update: the output of pat = '|'.join(r"\\b{}\\b".format(x) for x in df2['state']); print(df1['address'].str.extract('('+ pat + ')', expand=False)) pat = '|'.join(r"\\b{}\\b".format(x) for x in df2['state']); print(df1['address'].str.extract('('+ pat + ')', expand=False))

      0    1    2    3    4    5    6    7    8    9  ...    40   41   42  \
0    NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN ...   NaN  NaN  NaN   
1    NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN ...   NaN  NaN  NaN   
2    NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN ...   NaN  NaN  NaN   
3    NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN ...   NaN  NaN  NaN    
..   ...  ...  ...  ...  ...  ...  ...  ...  ...  ... ...   ...  ...  ...  
158  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN ...   NaN  NaN  NaN   
159  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN ...   NaN  NaN  NaN  

You can extract all possible states by Series.str.extract with \\b\\b for words boundaries to new column and then merge with left join:

pat = '|'.join(r"\b{}\b".format(x) for x in df2['state'])
df1['state']= df1['address'].str.extract('('+ pat + ')', expand=False)
print (df1)
                                             address  quantity  price  \
0  Cecilia Chapman 711-2880 Nulla St. Mankato Mis...         2     20   
1  Iris Watson P.O. Box 283 8562 Fusce Rd. Freder...         3     13   
2  Celeste Slater 606-3727 Ullamcorper. Street Ro...         5     23   
3  Theodore Lowe Ap #867-859 Sit Rd. Azusa New Yo...         3     32   
4  Calista Wise 7292 Dictum Av. San Antonio MI 47...         5     45   

         state  
0  Mississippi  
1     Nebraska  
2          NaN  
3     New York  
4          NaN  

df = df1.merge(df2, on='state', how='left')
print (df)
                                             address  quantity  price  \
0  Cecilia Chapman 711-2880 Nulla St. Mankato Mis...         2     20   
1  Iris Watson P.O. Box 283 8562 Fusce Rd. Freder...         3     13   
2  Celeste Slater 606-3727 Ullamcorper. Street Ro...         5     23   
3  Theodore Lowe Ap #867-859 Sit Rd. Azusa New Yo...         3     32   
4  Calista Wise 7292 Dictum Av. San Antonio MI 47...         5     45   

         state   id  
0  Mississippi  1.0  
1     Nebraska  2.0  
2          NaN  NaN  
3     New York  3.0  
4          NaN  NaN  

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM