
Mapping/Zipping between two Pandas data frames with a partial string match

I have two dataframes of size roughly 1,000,000 rows each. Both share a common 'Address' column which I am using to join the dataframes. Using this join, I wish to move information, which I shall call 'details', from dataframe1 to dataframe2.

df2['Details'] = df2['Address'].map(dict(zip(df1['Address'], df1['Details'])))

However, the addresses in the two dataframes do not always match exactly. I cleaned them as best I could, but I can still only move roughly 40% of the data across. Is there a way to modify my above code to allow for a partial match? I'm totally stumped on this one.

Data is quite simply as described. Two small dataframes. Fabricated sample data below:

df1 
Address                                    Details
Apt 15 A, Long Street, Fake town, US       A   


df2
Address                                    Details
15A, Long Street, Fake town, U.S.              

First, I would recommend performing the join operation and identifying the rows in each data frame that do not have a perfect match. Once you have identified these rows, exclude the others and proceed with the following suggestions:

  • One approach is to parse the addresses and attempt to standardize them. You might try using the usaddress module to standardize your addresses.

  • You could also try the approaches recommended in the answers to this question, although they may take some tweaking for your case. It's hard to say without multiple examples of the partial string matches.

  • Another approach would be to use the Google Maps API (or Bing or MapQuest) for address standardization, though with over a million rows per data frame you will far outstrip the free API calls per day and would need to pay for the service.

  • A final suggestion is to use the fuzzywuzzy module for fuzzy (approximate) string matching.
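
As a rough illustration of the standardize-then-fuzzy-match idea, here is a minimal sketch that uses only the standard library. It swaps in difflib for fuzzywuzzy and a few hand-written regex rules for usaddress, so treat it as a stand-in rather than what those modules do. The helper names (normalize, build_lookup, find_details), the 0.85 cutoff, and the extra "22 Short Road" and "1 Other Ave" rows are all made up for the example.

```python
import difflib
import re


def normalize(address):
    """Crude standardization: lowercase, strip punctuation and unit words."""
    text = address.lower()
    text = re.sub(r"[^\w\s]", "", text)  # "u.s." -> "us", commas removed
    text = re.sub(r"\b(apt|apartment|unit|flat)\b", " ", text)
    text = re.sub(r"\b(\d+)\s+([a-z])\b", r"\1\2", text)  # "15 a" -> "15a"
    return " ".join(text.split())


def build_lookup(addresses, details):
    """Normalized address -> detail; later duplicates win, like dict(zip(...))."""
    return {normalize(a): d for a, d in zip(addresses, details)}


def find_details(address, lookup, cutoff=0.85):
    """Try an exact match on the normalized key, then fall back to a fuzzy one."""
    key = normalize(address)
    if key in lookup:
        return lookup[key]
    # Fuzzy fallback: scans every candidate, so it is slow on large lookups.
    close = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    return lookup[close[0]] if close else None


# Hypothetical sample data: the first row is from the question, the rest are invented.
df1_addresses = ["Apt 15 A, Long Street, Fake town, US", "22 Short Road, Fake town, US"]
df1_details = ["A", "B"]
lookup = build_lookup(df1_addresses, df1_details)

df2_addresses = [
    "15A, Long Street, Fake town, U.S.",  # matches exactly after normalization
    "22 Short Rd, Fake town, US",         # needs the fuzzy fallback
    "1 Other Ave, Elsewhere, US",         # no counterpart in df1
]
matched = [find_details(a, lookup) for a in df2_addresses]
print(matched)  # ['A', 'B', None]
```

With pandas, the same function could be passed to map, for example df2['Address'].map(lambda a: find_details(a, lookup)), applied only to the rows that failed the exact join. The fuzzy fallback compares each address against every candidate, so at a million rows you would probably want to restrict the candidates first (for instance to addresses in the same town) before matching.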
