
Regex for python: how do I extract a string between words?

Suppose I have a sentence:

Meet me at 201 South First St. at noon

And I want to get the address like this:

South First

What would be the appropriate Regex expression for it? I currently have this, but it is not working:

 x = re.search(r"\d+\s?=([A-Z][a-z]*)\s(Rd.|Dr.|Ave.|St.)",searchstring)

Where searchstring is the sentence. The address is always preceded by one or more digits and a space, and followed by Rd., Dr., Ave. or St. The address also always starts with a capital letter.

The first group, the part where you try to match the address, is [A-Z][a-z]*, which means one uppercase letter followed by any number of lowercase letters, so it can match only a single word. What you probably want is any uppercase or lowercase letter or space: [A-Za-z ]*. The = after \s? also has to go, because the sentence has no literal = at that position. Finally, note that the dots in the second group match any character rather than a literal ., so you have to escape them. The solution would look like this:

>>> re.search(r'\d+\s?([A-Za-z ]*)\s+(Rd|Dr|Ave|St)\.', 'Meet me at 201 South First St. at noon')[1]
'South First'

Or just use a lazy .*? to accept anything:

>>> re.search(r'\d+\s?(.*?)\s+(Rd|Dr|Ave|St)\.', 'Meet me at 201 South First St. at noon')[1]
'South First'
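
As a rough illustration of the two problems described above, the sketch below shows that a single-word group cannot span "South First" and that an unescaped dot matches more than a literal period. The variable names and the 'Stone' test string are chosen here for demonstration only.

```python
import re

sentence = 'Meet me at 201 South First St. at noon'

# [A-Z][a-z]* matches one capitalized word only, so it cannot cover
# the two-word street name and the whole search fails.
single_word = re.search(r'\d+\s([A-Z][a-z]*)\s(?:Rd|Dr|Ave|St)\.', sentence)
print(single_word)  # None

# An unescaped dot matches any character, so "St." also matches "Sto".
unescaped = re.search(r'St.', 'Stone')
escaped = re.search(r'St\.', 'Stone')
print(unescaped is not None)  # True
print(escaped is not None)    # False
```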

You may use

\d+\s*([A-Z].*?)\s+(?:Rd|Dr|Ave|St)\.

See the regex demo.

Details

  • \d+ - one or more digits
  • \s* - 0 or more whitespaces
  • ([A-Z].*?) - capturing group #1: an uppercase ASCII letter and then any 0 or more chars other than line break chars, as few as possible
  • \s+ - 1+ whitespaces
  • (?:Rd|Dr|Ave|St) - Rd, Dr, Ave or St
  • \. - a dot

See a Python demo:

import re
text = 'Meet me at 201 South First St. at noon'
m = re.search(r'\d+\s*([A-Z].*?)\s+(?:Rd|Dr|Ave|St)\.', text)
if m:
    print(m.group(1))

Output: South First
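
The piece-by-piece breakdown can also be kept next to the pattern itself with re.VERBOSE, which ignores whitespace in the pattern and allows inline comments. This is only a sketch: ADDRESS_RE and street_name are hypothetical names introduced here, not part of the answer above.

```python
import re

# Same pattern as above, spread over several lines with comments.
ADDRESS_RE = re.compile(r"""
    \d+                 # one or more digits (the house number)
    \s*                 # zero or more whitespace characters
    ([A-Z].*?)          # group 1: uppercase letter, then as few chars as possible
    \s+                 # one or more whitespace characters
    (?:Rd|Dr|Ave|St)    # street suffix, not captured
    \.                  # a literal dot
""", re.VERBOSE)

def street_name(text):
    """Return the captured street name, or None if nothing matches."""
    m = ADDRESS_RE.search(text)
    return m.group(1) if m else None

print(street_name('Meet me at 201 South First St. at noon'))  # South First
print(street_name('no address here'))                         # None
```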

Here is how, using lookarounds:

import re
s = 'Meet me at 201 South First St. at noon'
# Lazy match between "digit + space" and " Rd.", " Dr.", " Ave." or " St."
print(re.findall(r'(?<=\d )[A-Z].*?(?= (?:Rd|Dr|Ave|St)\.)', s)[0])

Output:

South First
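
One subtlety with a lookahead that contains alternatives: | has the lowest precedence, so a leading space written before the first alternative applies to that alternative only. The sketch below compares an ungrouped and a grouped lookahead on the same sentence; the names loose and tight are made up for this comparison.

```python
import re

s = 'Meet me at 201 South First St. at noon'

# Ungrouped: the space belongs only to " Rd\.", so the greedy .* stops
# right before "St." and keeps the trailing space.
loose = re.findall(r'(?<=\d )[A-Z].*(?= Rd\.|Dr\.|Ave\.|St\.)', s)

# Grouped: the space and the escaped dot apply to every suffix.
tight = re.findall(r'(?<=\d )[A-Z].*?(?= (?:Rd|Dr|Ave|St)\.)', s)

print(repr(loose[0]))  # 'South First '
print(repr(tight[0]))  # 'South First'
```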

