Searching for a pattern in a sentence with regex in Python

I want to capture the digits that follow a certain phrase, along with the start and end index of that number.

Here is an example:

text = "The special code is 034567 in this particular case and not 98675"

In this example, I am interested in capturing the number 034567, which comes after the phrase special code, and also the start and end index of that number.

My code is:

p = re.compile('special code \s\w.\s (\d+)')
re.search(p, text)

But this does not match anything. Could you explain why, and how I should correct it?

Use re.findall with a capture group:

import re
text = "The special code is 034567 in this particular case and not 98675"
matches = re.findall(r'\bspecial code (?:\S+\s+)?(\d+)', text)
print(matches)

This prints:

['034567']
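
Note that re.findall returns only the captured strings, not their positions. Since the question also asks for the start and end index, a minimal sketch using re.finditer with the same pattern could look like this (the names pattern and results are chosen for illustration only):

```python
import re

text = "The special code is 034567 in this particular case and not 98675"
pattern = re.compile(r'\bspecial code (?:\S+\s+)?(\d+)')

# finditer yields match objects, so both the digits and their span are available
results = [(m.group(1), m.span(1)) for m in pattern.finditer(text)]
print(results)  # [('034567', (20, 26))]
```

The end index in the span is exclusive, so text[20:26] gives back the captured digits.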

Your expression fails because of how the literal spaces and \s combine. After code, the pattern has a literal space followed by \s, so it requires two whitespace characters, but the text has only one. Then \w. matches one word character followed by any character other than a line break. Finally, \s followed by a literal space again requires two whitespace characters before the digits.
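
To see this in practice, here is a small sketch that runs the original pattern (written as a raw string, otherwise unchanged) against the question's text and against a made-up input with doubled spaces. The doubled string is a hypothetical example constructed only to show what the pattern does accept:

```python
import re

# The pattern from the question, unchanged apart from the raw-string prefix
original = re.compile(r'special code \s\w.\s (\d+)')

text = "The special code is 034567 in this particular case and not 98675"
print(original.search(text))  # None: only one whitespace follows "code"

# Hypothetical input: two spaces after "code" and two spaces before the digits
doubled = "The special code  is  034567"
m = original.search(doubled)
print(m.group(1))  # 034567
```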

Instead, match one or more whitespace characters between words with \s+, and match any chunk of non-whitespace characters with \S+ rather than \w. (which matches exactly two characters).

Use

import re
text = 'The special code is 034567 in this particular case and not 98675'
p = re.compile(r'special code\s+\S+\s+(\d+)')
m = p.search(text)
if m:
    print(m.group(1)) # 034567
    print(m.span(1))  # (20, 26)
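
If the start and end index are needed as separate values, the match object also provides start() and end(). A small sketch under that assumption follows; the helper name find_code_after and its phrase parameter are hypothetical and not part of the re module:

```python
import re

def find_code_after(text, phrase="special code"):
    """Return (digits, start, end) for the number after `phrase`, or None."""
    # re.escape guards against regex metacharacters in the phrase
    p = re.compile(re.escape(phrase) + r'\s+\S+\s+(\d+)')
    m = p.search(text)
    if m is None:
        return None
    return m.group(1), m.start(1), m.end(1)

text = 'The special code is 034567 in this particular case and not 98675'
found = find_code_after(text)
print(found)  # ('034567', 20, 26)

if found:
    digits, start, end = found
    # Slicing the text with the returned indices gives back the same digits
    print(text[start:end] == digits)  # True
```

Returning None when nothing matches keeps the caller's check as simple as the if m: test above.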

See the Python demo and the regex demo.
