Regular Expression with multiple matches - Python

I have searched the web for a similar problem but couldn't find one.

Here is an address:

the fashion potential hq 116 w 23rd st ste 5 5th floor new york ny 10011

Using the following regex in Python, I tried to find all possible main addresses in the line above:

re.findall(r'^(.*)(\b\d+\b)(.+)(\bst\b|\bste\b)(.*)$', 'the fashion potential hq 116 w 23rd st ste 5 5th floor new york ny 10011')

I get this result:

[('the fashion potential hq ', '116', ' w 23rd st ', 'ste', ' 5 5th floor new york ny 10011')]

I also want the result to include this: ('the fash....', '116', 'w 23rd ', 'st', 'ste 5 5th....'). I expected findall to do the trick, but it didn't. Any help is greatly appreciated.

To make clear what I want as output (or something similar that includes all possibilities): [ ('the fashion potential hq ', '116', ' w 23rd ', 'st', 'ste 5 5th floor new york ny 10011'), ('the fashion potential hq ', '116', ' w 23rd st ', 'ste', ' 5 5th floor new york ny 10011')]


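You need to run two regular expressions, one with a greedy dot and another with a lazy dot. re.findall returns only non-overlapping matches, and because the pattern is anchored with ^ and $ it can match the line only once, so a single call cannot return both splits.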

The first one uses a greedy dot (.+) in the third group:

^(.*?)(\b\d+\b)(.+)\b(ste|st|ave|blvd)\b\s*(.*)$

The second one uses a lazy dot (.+?) in the third group:

^(.*?)(\b\d+\b)(.+?)\b(ste|st|ave|blvd)\b\s*(.*)$
                ^^^    ^^^^^^^^^^^^^^^


Output of the second (lazy) pattern, one captured group per line:

the fashion potential hq 
116
 w 23rd 
st
ste 5 5th floor new york ny 10011
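To see the greedy/lazy difference in isolation, here is a minimal sketch on a shortened, made-up string (the string and variable names are illustrative assumptions, not part of the original answer). The greedy group backtracks from the end of the string and so stops before the last "st", while the lazy group grows from the left and stops before the first one.

```python
import re

# Hypothetical toy input with two standalone "st" tokens.
s = "a st b st c"

# Greedy: (.+) grabs everything, then backtracks to the LAST \bst\b.
greedy = re.search(r'(.+)\bst\b', s).group(1)

# Lazy: (.+?) expands one character at a time until the FIRST \bst\b.
lazy = re.search(r'(.+?)\bst\b', s).group(1)

print(repr(greedy))  # 'a st b '
print(repr(lazy))    # 'a '
```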

Python sample code :

import re

# Lazy (.+?) stops at the first street-type token; greedy (.+) runs to the last one.
p = re.compile(r'^(.*?)(\b\d+\b)(.+?)\b(ste|st|ave|blvd)\b\s*(.*)$')
p2 = re.compile(r'^(.*?)(\b\d+\b)(.+)\b(ste|st|ave|blvd)\b\s*(.*)$')
s = "the fashion potential hq 116 w 23rd st ste 5 5th floor new york ny 10011"
m = p.search(s)
n = p2.search(s)
# Print both splits only when both patterns matched.
if m and n:
    print([m.groups(), n.groups()])

Results:

[
   ('the fashion potential hq ', '116', ' w 23rd ', 'st', 'ste 5 5th floor new york ny 10011'), 
   ('the fashion potential hq ', '116', ' w 23rd st ', 'ste', '5 5th floor new york ny 10011')
 ]
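The lazy and greedy patterns give only the first and the last candidate split. If a line could contain three or more suffix tokens, one possible way to enumerate every split is to locate the first standalone number once and then iterate over the suffix tokens with re.finditer. This is only a sketch under assumptions: the helper name all_splits and the names HEAD and SUFFIX are made up for illustration, and the suffix list is copied from the patterns above.

```python
import re

# Lazy prefix up to the first standalone number, then the remainder of the line.
HEAD = re.compile(r'^(.*?)\b(\d+)\b(.+)$')
# Suffix list assumed from the patterns in the answer; "ste" is tried before "st".
SUFFIX = re.compile(r'\b(ste|st|ave|blvd)\b')

def all_splits(s):
    """Return one 5-tuple per suffix token found after the first number."""
    head = HEAD.match(s)
    if not head:
        return []
    prefix, number, rest = head.groups()
    results = []
    for m in SUFFIX.finditer(rest):
        if m.start() == 0:
            continue  # mirror (.+): require at least one character before the suffix
        # lstrip() plays the role of \s* after the suffix in the original patterns.
        results.append((prefix, number, rest[:m.start()], m.group(1),
                        rest[m.end():].lstrip()))
    return results

s = "the fashion potential hq 116 w 23rd st ste 5 5th floor new york ny 10011"
for split in all_splits(s):
    print(split)
```

On the sample address this prints the same two tuples as the results above, in the same order.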
