How do I split an address in Python?
I need to split an address into three parts using Python. Given the following addresses:
103 Rur de Rennes 75006 Paris
57-59 avenue du Président Wilson 93210 Saint Denis la Plaine
I need to split each one into three parts (address, code, location), so the final result should be:
103 Rur de Rennes, 75006, Paris
57-59 avenue du Président Wilson, 93210, Saint Denis la Plaine
Is there a way I can achieve this?
re.split will get you most of the way, if we can assume the code is a 5-digit number.
>>> import re
>>> re.split(r"(\d{5})", "103 Rur de Rennes 75006 Paris")
['103 Rur de Rennes ', '75006', ' Paris']
To trim the whitespace, you can just use the strip method:
>>> [x.strip() for x in re.split(r"(\d{5})", "103 Rur de Rennes 75006 Paris")]
['103 Rur de Rennes', '75006', 'Paris']
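The two steps above can be wrapped in a small function. This is only a sketch under the same assumption (the code is a standalone run of exactly five digits); the names POSTAL_CODE and split_on_code are made up for illustration. Note that it would still pick the wrong number if the street number itself had five digits.

```python
import re

# Assumption: the postal code is a standalone run of exactly five digits.
POSTAL_CODE = re.compile(r"\b(\d{5})\b")

def split_on_code(address):
    """Split an address into [street, code, location] at the first 5-digit code."""
    # maxsplit=1 keeps any later 5-digit number inside the location part.
    parts = [part.strip() for part in POSTAL_CODE.split(address, maxsplit=1)]
    # Without a match, split() returns the whole string as a single item.
    return parts if len(parts) == 3 else None

print(split_on_code("103 Rur de Rennes 75006 Paris"))
print(split_on_code("57-59 avenue du Président Wilson 93210 Saint Denis la Plaine"))
print(split_on_code("bogus address"))
```

Returning None when no code is found lets the caller decide how to handle addresses that do not fit the pattern.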
Assuming the address starts with the street number and the next number will be the code, you can split the address and look for a number from the right:
def sep_address(address):
    parts = address.split()
    # Scan from the right, skipping index 0 (the street number).
    for i in range(len(parts)-1, 0, -1):
        if parts[i].isdigit():
            return ' '.join(parts[:i]), parts[i], ' '.join(parts[i+1:])

print(sep_address("103 Rur de Rennes 75006 Paris"))
print(sep_address("57-59 avenue du Président Wilson 93210 Saint Denis la Plaine"))
Gives:
('103 Rur de Rennes', '75006', 'Paris')
('57-59 avenue du Président Wilson', '93210', 'Saint Denis la Plaine')
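To get the comma-separated form asked for in the question, the tuple can be joined with ', '. The sketch below repeats sep_address so that it runs on its own; format_address is a hypothetical helper, and falling back to the unchanged input when no code is found is an assumption, not part of the answer above.

```python
def sep_address(address):
    parts = address.split()
    # Scan from the right, skipping index 0 (the street number).
    for i in range(len(parts) - 1, 0, -1):
        if parts[i].isdigit():
            return ' '.join(parts[:i]), parts[i], ' '.join(parts[i + 1:])
    return None  # no all-digit token found after the first word

def format_address(address):
    """Hypothetical helper: return 'address, code, location' as one string."""
    result = sep_address(address)
    # Assumption: leave the input untouched when it cannot be split.
    return ', '.join(result) if result else address

print(format_address("103 Rur de Rennes 75006 Paris"))
print(format_address("57-59 avenue du Président Wilson 93210 Saint Denis la Plaine"))
```

Because the scan runs from the right, a number inside the location (for example an arrondissement written after the city name) would be taken as the code, so this only fits addresses shaped like the two samples.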
import re

def three_parts(addr):
    splitter = re.compile(r"^(?P<ADDRESS>\d+-?\d*[\w\s]+[^\d])(?P<CODE>\d+)\s+(?P<LOCATION>[\w\s]+)")
    retval = splitter.match(addr)
    if retval:
        # if you prefer a dictionary use retval.groupdict()
        return [retval.group("ADDRESS"), retval.group("CODE"), retval.group("LOCATION")]

if __name__ == "__main__":
    addresses = [
        "103 Rur de Rennes 75006 Paris",
        "57-59 avenue du Président Wilson 93210 Saint Denis la Plaine",
        "bogus address"
    ]
    for addr in addresses:
        print(addr)
        parts = three_parts(addr)
        print(parts)
        print("-"*40)
Output
103 Rur de Rennes 75006 Paris
['103 Rur de Rennes ', '75006', 'Paris']
----------------------------------------
57-59 avenue du Président Wilson 93210 Saint Denis la Plaine
['57-59 avenue du Président Wilson ', '93210', 'Saint Denis la Plaine']
----------------------------------------
bogus address
None
----------------------------------------
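In the output above, the ADDRESS group keeps a trailing space because the separator before the code is matched inside the group. One possible variation, sketched here and not taken from the answer itself, moves the whitespace outside the groups and assumes a 5-digit code; ADDRESS_RE and three_parts_stripped are illustrative names.

```python
import re

# Assumption: the code is exactly five digits with whitespace on both sides.
ADDRESS_RE = re.compile(
    r"^(?P<ADDRESS>.+?)\s+(?P<CODE>\d{5})\s+(?P<LOCATION>.+)$"
)

def three_parts_stripped(addr):
    """Return a dict with ADDRESS, CODE and LOCATION, or None if there is no match."""
    match = ADDRESS_RE.match(addr.strip())
    return match.groupdict() if match else None

for addr in [
    "103 Rur de Rennes 75006 Paris",
    "57-59 avenue du Président Wilson 93210 Saint Denis la Plaine",
    "bogus address",
]:
    print(three_parts_stripped(addr))
```

The lazy `.+?` makes ADDRESS stop at the first whitespace-delimited 5-digit number, so the whitespace on either side of the code is consumed by `\s+` and never lands in a group.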