Searching for a pattern in a sentence with regex in python

I want to capture the digits that follow a certain phrase and also the start and end index of the number of interest.

Here is an example:

text = "The special code is 034567 in this particular case and not 98675"

In this example, I want to capture the number 034567, which comes after the phrase special code, along with the start and end index of 034567.

My code is:

p = re.compile('special code \s\w.\s (\d+)')
re.search(p, text)

But this does not match anything. Could you explain why and how I should correct it?

Use re.findall with a capture group:

import re

text = "The special code is 034567 in this particular case and not 98675"
# Optionally skip one word (e.g. "is") between the phrase and the number
matches = re.findall(r'\bspecial code (?:\S+\s+)?(\d+)', text)
print(matches)

This prints:

['034567']
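re.findall returns only the captured strings, so it does not give the start and end indices that the question also asks for. A minimal sketch that keeps the same pattern but switches to re.finditer, which yields match objects whose span(1) gives the indices of the captured number:

```python
import re

text = "The special code is 034567 in this particular case and not 98675"
pattern = re.compile(r'\bspecial code (?:\S+\s+)?(\d+)')

# finditer yields Match objects rather than plain strings,
# so each captured number comes with its (start, end) span
results = [(m.group(1), m.span(1)) for m in pattern.finditer(text)]
print(results)  # [('034567', (20, 26))]
```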

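Your pattern fails because of how it handles whitespace. After "special code", the literal space followed by \s requires two whitespace characters, but the text has only one. Likewise, \w. matches exactly two characters (a word character, then any character except a line break), and the following \s plus a literal space again requires two consecutive whitespace characters, whereas there is only one between "is" and "034567".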
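A quick check of this explanation, assuming the asker's pattern is written as a raw string (the behavior is the same, it just avoids invalid-escape warnings). The pattern finds nothing in the original sentence but does match once each whitespace slot really contains two whitespace characters; the second sentence, doubled, is a hypothetical input made up for this demonstration:

```python
import re

text = "The special code is 034567 in this particular case and not 98675"

# The asker's pattern, unchanged except for the raw-string prefix
original = re.compile(r'special code \s\w.\s (\d+)')
print(original.search(text))  # None: only one space follows "code"

# Hypothetical input with two whitespace characters in each slot
doubled = "The special code  is  034567"
m = original.search(doubled)
print(m.group(1))  # 034567
```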

Instead, match one or more whitespace characters between words with \s+, and match the intervening word with \S+ (one or more non-whitespace characters) rather than \w., which always consumes exactly two characters.

Use

import re
text = 'The special code is 034567 in this particular case and not 98675'
p = re.compile(r'special code\s+\S+\s+(\d+)')
m = p.search(text)
if m:
    print(m.group(1)) # 034567
    print(m.span(1))  # (20, 26)
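To illustrate why \S+ is more robust than \w., here is a small comparison on a few hypothetical sentences (made up for this sketch, not from the question): the \S+ version handles words of any length and mixed whitespace, while the \w. variant breaks as soon as the word between the phrase and the number is not exactly two characters long:

```python
import re

# The answer's pattern: \S+ accepts a word of any length
flexible = re.compile(r'special code\s+\S+\s+(\d+)')
# Same pattern with \w. instead, which consumes exactly two characters
two_chars = re.compile(r'special code\s+\w.\s+(\d+)')

# Hypothetical test sentences
samples = [
    "The special code is 034567 here",  # two-letter word
    "The special code was 1234 here",   # three-letter word
    "The special code  is\t42 here",    # two spaces and a tab
]

flexible_hits = [flexible.search(s).group(1) for s in samples]
two_char_hits = [two_chars.search(s) is not None for s in samples]
print(flexible_hits)  # ['034567', '1234', '42']
print(two_char_hits)  # [True, False, True]
```

One difference from the re.findall pattern in the other answer: this pattern requires a word between "special code" and the number, so it finds nothing in "The special code 034567", whereas the optional group (?:\S+\s+)? in r'\bspecial code (?:\S+\s+)?(\d+)' still captures 034567 there.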

