How do I split an address in Python?

I need to split an address into three parts using Python. Given the following addresses:

103 Rur de Rennes 75006 Paris
57-59 avenue du Président Wilson 93210 Saint Denis la Plaine

I need to split each one into three parts (address, code, location), so the final result should be:

103 Rur de Rennes,  75006,  Paris
57-59 avenue du Président Wilson,  93210,  Saint Denis la Plaine

Is there a way I can achieve this?

re.split will get you most of the way there, if we can assume the code is a 5-digit number. Because the pattern is wrapped in a capturing group, the matched code is kept in the resulting list.

>>> import re
>>> re.split(r"(\d{5})", "103 Rur de Rennes 75006 Paris")
['103 Rur de Rennes ', '75006', ' Paris']

To trim the whitespace, you can use the strip method:

>>> [x.strip() for x in re.split(r"(\d{5})", "103 Rur de Rennes 75006 Paris")]
['103 Rur de Rennes', '75006', 'Paris']
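
Putting the two steps together, here is a minimal sketch of a reusable helper. The name split_address is hypothetical, and the \b word boundaries and maxsplit=1 are additions that are not part of the answer above: they assume the postal code is a standalone 5-digit token and that only the first such token should split the string.

```python
import re

# Hypothetical helper; assumes the postal code is a standalone 5-digit token.
POSTAL_CODE = re.compile(r"\b(\d{5})\b")

def split_address(address):
    """Return [street, code, city], or None if no 5-digit code is found."""
    # The capturing group keeps the code in the result; maxsplit=1 stops
    # after the first match, so a match always yields exactly three parts.
    parts = POSTAL_CODE.split(address, maxsplit=1)
    if len(parts) != 3:
        return None
    return [part.strip() for part in parts]

print(split_address("103 Rur de Rennes 75006 Paris"))
# ['103 Rur de Rennes', '75006', 'Paris']
print(split_address("57-59 avenue du Président Wilson 93210 Saint Denis la Plaine"))
# ['57-59 avenue du Président Wilson', '93210', 'Saint Denis la Plaine']
print(split_address("bogus address"))
# None
```

Note that an address whose street number itself has five digits would be split at that number instead, so this sketch only fits data shaped like the examples in the question.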

Assuming the address starts with the street number and the next standalone number is the code, you can split the address on whitespace and look for an all-digit token from the right:

def sep_address(address):
    parts = address.split()
    for i in range(len(parts)-1, 0, -1):
        if parts[i].isdigit():
            return ' '.join(parts[:i]), parts[i], ' '.join(parts[i+1:])

print(sep_address("103 Rur de Rennes 75006 Paris"))
print(sep_address("57-59 avenue du Président Wilson 93210 Saint Denis la Plaine"))

Gives:

('103 Rur de Rennes', '75006', 'Paris')
('57-59 avenue du Président Wilson', '93210', 'Saint Denis la Plaine')

import re

def three_parts(addr):
    splitter = re.compile(r"^(?P<ADDRESS>\d+-?\d*[\w\s]+[^\d])(?P<CODE>\d+)\s+(?P<LOCATION>[\w\s]+)")
    retval = splitter.match(addr)
    if retval:
        # if you prefer a dictionary use retval.groupdict()
        return [retval.group("ADDRESS"), retval.group("CODE"), retval.group("LOCATION")]

if __name__ == "__main__":
    addresses = [
        "103 Rur de Rennes 75006 Paris",
        "57-59 avenue du Président Wilson 93210 Saint Denis la Plaine",
        "bogus address"
    ]    
    for addr in addresses:
        print(addr)
        parts = three_parts(addr)
        print(parts)
        print("-"*40)

Output:

103 Rur de Rennes 75006 Paris
['103 Rur de Rennes ', '75006', 'Paris']
----------------------------------------
57-59 avenue du Président Wilson 93210 Saint Denis la Plaine
['57-59 avenue du Président Wilson ', '93210', 'Saint Denis la Plaine']
----------------------------------------
bogus address
None
----------------------------------------
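
As the comment in the code notes, groupdict() returns the named groups as a dictionary. A small sketch of that variant follows. The function name three_parts_dict is hypothetical, and the strip() call is an addition that removes the trailing space visible in the address values of the output above.

```python
import re

# Same pattern as in the answer above, compiled once at module level.
SPLITTER = re.compile(
    r"^(?P<ADDRESS>\d+-?\d*[\w\s]+[^\d])(?P<CODE>\d+)\s+(?P<LOCATION>[\w\s]+)"
)

def three_parts_dict(addr):
    """Hypothetical variant: return a dict of stripped parts, or None."""
    match = SPLITTER.match(addr)
    if match is None:
        return None
    # groupdict() maps each group name to its matched text.
    return {name: value.strip() for name, value in match.groupdict().items()}

print(three_parts_dict("103 Rur de Rennes 75006 Paris"))
print(three_parts_dict("bogus address"))
```

For the first address this gives a dictionary with the keys ADDRESS, CODE and LOCATION. The second call prints None because the string does not start with a digit.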


 