I searched the web for a similar problem but couldn't find one.
Here is an address:
the fashion potential hq 116 w 23rd st ste 5 5th floor new york ny 10011
Using the following regex in Python, I tried to find all possible main addresses in the line above:
re.findall(r'^(.*)(\b\d+\b)(.+)(\bst\b|\bste\b)(.*)$', 'the fashion potential hq 116 w 23rd st ste 5 5th floor new york ny 10011')
The result I get is:
[('the fashion potential hq ', '116', ' w 23rd st ', 'ste', ' 5 5th floor new york ny 10011')]
I also want the result to include this: ('the fash....', '116', 'w 23rd ', 'st', 'ste 5 5th....'). I expected findall would do the trick, but it didn't. Any help is greatly appreciated.
To make it clear what I want as output (or similar which includes all possibilities): [ ('the fashion potential hq ', '116', ' w 23rd ', 'st', 'ste 5 5th floor new york ny 10011'), ('the fashion potential hq ', '116', ' w 23rd st ', 'ste', ' 5 5th floor new york ny 10011')]
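re.findall only returns non-overlapping matches, and a pattern anchored with ^ and $ can match the line at most once, so a single call cannot return both splits. You need to run 2 regular expressions, one with a greedy dot and another with a lazy dot.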
The first one uses a greedy (.+), which runs up to the last street suffix:
^(.*?)(\b\d+\b)(.+)\b(ste|st|ave|blvd)\b\s*(.*)$
The second one uses a lazy (.+?) in the third group, so it stops at the first street suffix:
^(.*?)(\b\d+\b)(.+?)\b(ste|st|ave|blvd)\b\s*(.*)$
See the regex demo
Output of the lazy pattern (one capture group per line):
the fashion potential hq
116
w 23rd
st
ste 5 5th floor new york ny 10011
import re

# p is lazy in group 3 (stops at the first suffix); p2 is greedy (runs to the last suffix).
p = re.compile(r'^(.*?)(\b\d+\b)(.+?)\b(ste|st|ave|blvd)\b\s*(.*)$')
p2 = re.compile(r'^(.*?)(\b\d+\b)(.+)\b(ste|st|ave|blvd)\b\s*(.*)$')
s = "the fashion potential hq 116 w 23rd st ste 5 5th floor new york ny 10011"
m = p.search(s)
if m:
    n = p2.search(s)
    if n:
        # Collect both splits in one list, lazy first.
        print([m.groups(), n.groups()])
Results:
[
('the fashion potential hq ', '116', ' w 23rd ', 'st', 'ste 5 5th floor new york ny 10011'),
('the fashion potential hq ', '116', ' w 23rd st ', 'ste', '5 5th floor new york ny 10011')
]
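The two patterns above only give the first and the last suffix position. If an address could contain more than two suffix words, one possible generalization is to locate the first number once and then walk over every suffix occurrence after it with re.finditer. This is only a sketch, not part of the original answer: the helper name all_splits is hypothetical, and it assumes the street number is always the first standalone number in the line.

```python
import re

# Same suffix list as in the patterns above; extend as needed.
SUFFIX = re.compile(r'\b(ste|st|ave|blvd)\b')
NUMBER = re.compile(r'\b\d+\b')

def all_splits(s):
    """Return one 5-tuple per suffix word found after the first number."""
    num = NUMBER.search(s)
    if not num:
        return []
    results = []
    # Scan for suffix words starting right after the street number.
    for suf in SUFFIX.finditer(s, num.end()):
        middle = s[num.end():suf.start()]
        if not middle:
            continue  # mirror (.+), which requires at least one character
        results.append((
            s[:num.start()],          # text before the number
            num.group(),              # the street number
            middle,                   # text between number and suffix
            suf.group(1),             # the suffix word itself
            s[suf.end():].lstrip(),   # remainder, leading whitespace dropped like \s*
        ))
    return results

s = "the fashion potential hq 116 w 23rd st ste 5 5th floor new york ny 10011"
for split in all_splits(s):
    print(split)
```

For the sample address this yields the same two tuples as the lazy and greedy patterns, in that order.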