
How to compare two Python lists of addresses with a partial text match?

I have two Python lists:

    li = ['206 Brookwood Center Drive Suite 508, WMP, Birmingham, AL 35111',
          '340 Independence Drive, Homewood, AL 35209',
          '41 Doell Drive Southeast, Huntsville, AL 35801',
          '3 Mobile Circle, Suite 401, Mobile, AL 36607',
          '7209 Copperfield Drive, Montgomery, AL 36117']

    mi = ['340 Independence Dr Homewood, AL 35209',
          '41 Doell Dr SE, Ste 24 Huntsville, AL 35801',
          '3 Mobile Cir, Ste 401 Mobile, AL 36607',
          '36 Saint Lukes Dr Montgomery, AL 36117',
          '91 Kanis Rd, Ste 300 Little Rock, AR 72205',
          '25 S Dobson Rd, Bldg J Chandler, AZ 85224']

I want to loop through li and find every record that does not exist in mi, using some kind of partial text match. I tried startswith and in, but both fail because of differences such as "Drive" vs. "Dr" and "Suite" vs. "Ste". Any suggestions? Would some kind of Python regex work? The output should be '206 Brookwood Center Drive Suite 508, WMP, Birmingham, AL 35111' and '7209 Copperfield Drive, Montgomery, AL 36117'.

If you are doing this for fun, remember that addresses are read from the bottom up, because getting the letter to at least the right building is the biggest step.

  1. City, state, and ZIP code are the most important.
  2. The street address is the second most important, along with the apartment number.
  3. The addressee is the least important item.
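The bottom-up priority above can be turned into a rough comparison key. The following is only a minimal sketch, assuming US-style addresses that start with a house number and end with a 5-digit ZIP code; the helper names address_key and missing_addresses are made up for illustration, and two different addresses sharing the same ZIP code and house number would be wrongly treated as the same record.

```python
import re

li = ['206 Brookwood Center Drive Suite 508, WMP, Birmingham, AL 35111',
      '340 Independence Drive, Homewood, AL 35209',
      '41 Doell Drive Southeast, Huntsville, AL 35801',
      '3 Mobile Circle, Suite 401, Mobile, AL 36607',
      '7209 Copperfield Drive, Montgomery, AL 36117']

mi = ['340 Independence Dr Homewood, AL 35209',
      '41 Doell Dr SE, Ste 24 Huntsville, AL 35801',
      '3 Mobile Cir, Ste 401 Mobile, AL 36607',
      '36 Saint Lukes Dr Montgomery, AL 36117',
      '91 Kanis Rd, Ste 300 Little Rock, AR 72205',
      '25 S Dobson Rd, Bldg J Chandler, AZ 85224']


def address_key(address):
    """Return (zip_code, house_number), or None if either part is missing.

    Hypothetical helper: assumes the ZIP code is the last token and the
    house number is the first token of the address.
    """
    zip_match = re.search(r'\b(\d{5})(?:-\d{4})?\s*$', address)
    number_match = re.match(r'\s*(\d+)', address)
    if not zip_match or not number_match:
        return None
    return zip_match.group(1), number_match.group(1)


def missing_addresses(source, reference):
    """Return the addresses in source whose key does not occur in reference."""
    reference_keys = {key for key in map(address_key, reference) if key is not None}
    # Addresses that cannot be parsed have key None and are reported as missing.
    return [a for a in source if address_key(a) not in reference_keys]


missing = missing_addresses(li, mi)
```

With the two lists from the question, missing holds the Brookwood Center and Copperfield Drive addresses. The abbreviation differences never come into play because the street name is not compared at all.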

The following two methods have a significant advantage over anything you might do with an alias list for abbreviations: they compare the "standardized" address against a database of all deliverable addresses.

If you are doing a one-off project and confidentiality is not an issue, you can use the US Post Office website for zip code lookup. It will return the standardized address as well. You can automate its use to some extent.

If you are going to process more than 1,000 addresses on a recurring basis, get an address standardization software package, usually sold as mailing software. Expect to pay US$600 per year and up.
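For contrast, here is roughly what the alias-list approach looks like. This is a sketch only: the ALIASES table is a hand-made, deliberately incomplete assumption, and the names normalize and unmatched are invented for this example. Each address is reduced to a set of lowercase tokens with long forms mapped to abbreviations, and two addresses count as a match when one token set contains the other.

```python
import re

# Hypothetical alias table; a real one would need far more entries.
ALIASES = {'drive': 'dr', 'suite': 'ste', 'circle': 'cir',
           'southeast': 'se', 'road': 'rd', 'building': 'bldg'}

li = ['206 Brookwood Center Drive Suite 508, WMP, Birmingham, AL 35111',
      '340 Independence Drive, Homewood, AL 35209',
      '41 Doell Drive Southeast, Huntsville, AL 35801',
      '3 Mobile Circle, Suite 401, Mobile, AL 36607',
      '7209 Copperfield Drive, Montgomery, AL 36117']

mi = ['340 Independence Dr Homewood, AL 35209',
      '41 Doell Dr SE, Ste 24 Huntsville, AL 35801',
      '3 Mobile Cir, Ste 401 Mobile, AL 36607',
      '36 Saint Lukes Dr Montgomery, AL 36117',
      '91 Kanis Rd, Ste 300 Little Rock, AR 72205',
      '25 S Dobson Rd, Bldg J Chandler, AZ 85224']


def normalize(address):
    """Lowercase, drop punctuation, and map long forms to abbreviations."""
    tokens = re.findall(r'[a-z0-9]+', address.lower())
    return {ALIASES.get(token, token) for token in tokens}


def unmatched(source, reference):
    """Return addresses in source with no token-set match in reference."""
    reference_sets = [normalize(r) for r in reference]
    result = []
    for address in source:
        wanted = normalize(address)
        # Match if either token set is a subset of the other, so an extra
        # "Ste 24" on one side does not prevent a match.
        if not any(wanted <= ref or ref <= wanted for ref in reference_sets):
            result.append(address)
    return result


not_found = unmatched(li, mi)
```

This happens to work for the sample data, but it shows why the alias list is fragile: every abbreviation has to be anticipated, and some are ambiguous ("St" can mean Street or Saint), which is exactly the problem a database of deliverable addresses avoids.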
