
Extract House Number and Street Name from string using Python Regex

I'm new to regex and am trying to use it to split addresses into house number and street name.

Example: 123 Main St --> ['123', 'Main St']

It gets slightly complicated by the fact that some of my addresses have hyphenated house numbers, in which case I want to take the first number before the hyphen.

Example: 123-127 Main St --> ['123', 'Main St']

Lastly, I need to be able to handle street names that start with a number.

The most complicated example: 123-127 3rd Ave --> ['123', '3rd Ave']

So far I've been able to extract the house number, including in the hyphenated case, but I'm unsure how to extract the street name that comes after the part matched by the number pattern.

import re

MyString = '123-127 Main St'
StreetNum = re.findall(r'(^\d+)', MyString)  # ['123']

Thanks for the help!

I'm also editing the question to point out that a dash is not the only separator between two house numbers. Three situations come up in the data:

1) 123-127 5th St

2) 123 1/2 5th St

3) 123 & 125 5th St

In all three situations the result should be ['123', '5th St'].

This assumes the street name comes last and consists of exactly two words.

>>> import re
>>> s = '123-127 Main St'
>>> re.findall(r'^\d+|\S+ +\S+$', s)
['123', 'Main St']
>>> re.findall(r'^\d+|\S+ +\S+$', "123-127 3rd Ave")
['123', '3rd Ave']

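^\d+ matches the leading house number, and \S+ +\S+$ matches the last two space-separated words of the string. \S+ matches one or more non-whitespace characters.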
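A quick way to check the two-word assumption is to wrap the pattern in a small function and run it over every input from the question, including the three added in the edit. This is only a sketch: split_address is a hypothetical name, and the last sample is a made-up street with more than two words, included only to show where the assumption breaks.

```python
import re

# Pattern from this answer: either the leading number, or the last two
# space-separated words (assumes the street name is exactly two words).
PATTERN = r'^\d+|\S+ +\S+$'


def split_address(address):
    """Hypothetical helper: return [house_number, street] via re.findall."""
    return re.findall(PATTERN, address)


if __name__ == '__main__':
    samples = [
        '123 Main St',
        '123-127 Main St',
        '123-127 3rd Ave',
        '123-127 5th St',
        '123 1/2 5th St',
        '123 & 125 5th St',
        '123 Martin Luther King Blvd',  # made-up street with more than two words
    ]
    for s in samples:
        print(repr(s), '->', split_address(s))
```

Because the pattern only looks at the start of the string and at its last two words, whatever sits in between ("-127", "1/2", "& 125") is skipped, so all three edited cases give ['123', '5th St']. The trade-off shows up in the last sample, which comes back as ['123', 'King Blvd'].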

Alternatively, with re.split:

>>> s = '123-127 Main St'
>>> re.split(r'(?<=\d)(?:-\d+)?\s+', s)
['123', 'Main St']
>>> re.split(r'(?<=\d)(?:-\d+)?\s+', "123 Main St")
['123', 'Main St']
>>> re.split(r'(?<=\d)(?:-\d+)?\s+', "123-127 3rd Ave")
['123', '3rd Ave']
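Run against the inputs added in the question's edit, this split pattern only removes the hyphenated form; "1/2" and "& 125" end up as extra list items. The sketch below shows that, plus one possible extension (EXTENDED and split_address are hypothetical names, not part of this answer) that consumes any of the three separators and passes maxsplit=1 so at most one split happens.

```python
import re

# Split pattern from this answer: split after a digit, optionally consuming
# "-<digits>", at the following whitespace.
ORIGINAL = r'(?<=\d)(?:-\d+)?\s+'

print(re.split(ORIGINAL, '123 1/2 5th St'))    # ['123', '1/2', '5th St']
print(re.split(ORIGINAL, '123 & 125 5th St'))  # ['123', '& 125', '5th St']

# Hypothetical extension: optionally consume one of the separators seen in
# the data ("-127", " 1/2", " & 125") before the whitespace, and split only
# once so the street name is never broken up.
EXTENDED = r'(?:\s*-\s*\d+|\s+\d+/\d+|\s*&\s*\d+)?\s+'


def split_address(address):
    """Return [house_number, street]; assumes the string starts with the number."""
    return re.split(EXTENDED, address, maxsplit=1)


for s in ['123 Main St', '123-127 3rd Ave', '123 1/2 5th St', '123 & 125 5th St']:
    print(split_address(s))  # each prints ['123', <street name>]
```

Unlike the lookbehind version, EXTENDED has no (?<=\d) anchor; it relies on maxsplit=1 and on the first separator or whitespace in the string coming directly after the house number.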

Hope this is what you're looking for:

(\d+).*?\s+(.+)

or, more strictly:

(\d+)(?:-\d+(?=\s))?\s(.*)

The second pattern captures the first number, skips a dash and the number after it (if present, and only when that number is followed by whitespace), then captures everything after the space.

>>> re.match(r'(\d+)(?:-\d+(?=\s))?\s(.*)', '123-127 3rd Ave').groups()
('123', '3rd Ave')
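The same idea extends to the three separators from the edited question by widening the optional (?:-\d+...) group. A minimal sketch, assuming every address begins with its house number; ADDRESS_RE and parse_address are hypothetical names, and re.VERBOSE is used only so each alternative can carry a comment.

```python
import re

# Hypothetical pattern: generalises the optional "-\d+" group above to the
# three separators listed in the question's edit.
ADDRESS_RE = re.compile(r"""
    ^(?P<number>\d+)        # first house number, kept
    (?:                     # optional second number, discarded:
        \s*-\s*\d+          #   123-127
      | \s+\d+/\d+          #   123 1/2
      | \s*&\s*\d+          #   123 & 125
    )?
    \s+(?P<street>.+)$      # everything after the whitespace
""", re.VERBOSE)


def parse_address(address):
    """Return (number, street), or None if the string doesn't start with a number."""
    m = ADDRESS_RE.match(address.strip())
    if m is None:
        return None
    return m.group('number'), m.group('street')


if __name__ == '__main__':
    for s in ['123-127 5th St', '123 1/2 5th St', '123 & 125 5th St', 'Main St']:
        print(repr(s), '->', parse_address(s))
```

Returning None instead of raising makes it easy to collect the rows the pattern does not recognise and inspect them separately.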

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.