Suppose I have a sentence:
Meet me at 201 South First St. at noon
And I want to get the address like this:
South First
What would be the appropriate regular expression for it? I currently have this, but it is not working:
x = re.search(r"\d+\s?=([A-Z][a-z]*)\s(Rd.|Dr.|Ave.|St.)",searchstring)
Where searchstring is the sentence. The address is always preceded by one or more digits and a space, and is always followed by one of Rd., Dr., Ave. or St. The address also always starts with a capital letter.
The first group, the part that tries to match the address, is [A-Z][a-z]*, which means one uppercase letter followed by any number of lowercase letters, so it can only match a single word. What you probably want is any uppercase or lowercase letter or a space: [A-Za-z ]*. Also note that the dots in the second group mean any character, not a literal ., so you have to escape them. The solution would look like this:
>>> re.search(r'\d+\s?([A-Za-z ]*)\s+(Rd|Dr|Ave|St)\.', 'Meet me at 201 South First St. at noon')[1]
'South First'
Or just use . to accept any character:
>>> re.search(r'\d+\s?(.*?)\s+(Rd|Dr|Ave|St)\.', 'Meet me at 201 South First St. at noon')[1]
'South First'
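To illustrate why the dot has to be escaped, here is a small sketch comparing the two forms. The sample string 'Stop here' is made up for this illustration and is not from the question.

```python
import re

# Unescaped: "." matches any character, so "St." also matches "Sto" in "Stop".
unescaped = re.search(r'St.', 'Stop here')

# Escaped: "\." matches only a literal dot, so nothing matches in "Stop here".
escaped = re.search(r'St\.', 'Stop here')

print(unescaped.group(0))  # Sto
print(escaped)             # None
```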
You may use
\d+\s*([A-Z].*?)\s+(?:Rd|Dr|Ave|St)\.
See the regex demo.
Details
\d+ - one or more digits
\s* - 0 or more whitespaces
([A-Z].*?) - capturing group #1: an uppercase ASCII letter and then any 0 or more chars other than line break chars, as few as possible
\s+ - 1+ whitespaces
(?:Rd|Dr|Ave|St) - Rd, Dr, Ave or St
\. - a dot
See a Python demo:
import re
text = 'Meet me at 201 South First St. at noon'
m = re.search(r'\d+\s*([A-Z].*?)\s+(?:Rd|Dr|Ave|St)\.', text)
if m:
    print(m.group(1))
Output: South First
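If the pattern is needed in more than one place, it could be wrapped in a small helper that returns None when no address is found. This is only a sketch: the names ADDRESS_RE and extract_street are hypothetical, and the second and third sample sentences are invented for illustration.

```python
import re

# Hypothetical names; the pattern itself is the one given in this answer.
ADDRESS_RE = re.compile(r'\d+\s*([A-Z].*?)\s+(?:Rd|Dr|Ave|St)\.')

def extract_street(text):
    """Return the street name found in text, or None if there is no match."""
    m = ADDRESS_RE.search(text)
    return m.group(1) if m else None

print(extract_street('Meet me at 201 South First St. at noon'))  # South First
print(extract_street('Turn left at 15 Oak Ave. then stop'))      # Oak
print(extract_street('No address here'))                         # None
```

Checking for None before using the result avoids an AttributeError or TypeError on sentences that contain no address.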
Here is how:
import re
s = 'Meet me at 201 South First St. at noon'
print(re.findall(r'(?<=\d )[A-Z].*?(?= (?:Rd|Dr|Ave|St)\.)', s)[0])
Output:
South First