Searching for a pattern in a sentence using regex in Python

Question

I want to capture the digits that follow a certain phrase, along with the start and end index of that number.

Here is an example:

text = "The special code is 034567 in this particular case and not 98675"

In this example, I want to capture the number 034567, which comes after the phrase "special code", and also the start and end index of 034567.

My code is:

import re
p = re.compile(r'special code \s\w.\s (\d+)')
re.search(p, text)  # returns None

But this does not match anything. Could you explain why, and how I should correct it?

Answer 1

Use re.findall with a capture group:

import re
text = "The special code is 034567 in this particular case and not 98675"
matches = re.findall(r'\bspecial code (?:\S+\s+)?(\d+)', text)
print(matches)

This prints:

['034567']
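re.findall returns only the captured text, so on its own it does not give the start and end index the question also asks for. A minimal sketch, assuming the same pattern is kept: re.finditer yields match objects instead of strings, and span(1) on each one gives the position of the capture group.

```python
import re

text = "The special code is 034567 in this particular case and not 98675"
# Same pattern as above; finditer yields Match objects rather than strings
pattern = re.compile(r'\bspecial code (?:\S+\s+)?(\d+)')

# Pair each captured number with its (start, end) index in text
results = [(m.group(1), m.span(1)) for m in pattern.finditer(text)]
print(results)  # [('034567', (20, 26))]
```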

Answer 2

Your expression first matches a literal space followed by \s (any whitespace), so it requires two whitespace characters after "code", while the text has only one. Then \w. matches one word char followed by any character other than a line break char, and then "\s " again requires two whitespaces: any whitespace and then a literal space.
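One way to check this reasoning is to test pieces of the original pattern separately. The sketch below is only a diagnostic: it shows that the prefix already fails at the double-whitespace requirement, that fixing only that gap still fails at "\s ", and that the pattern suggested in this answer does match.

```python
import re

text = "The special code is 034567 in this particular case and not 98675"

# The original full pattern finds nothing
original = re.search(r'special code \s\w.\s (\d+)', text)

# Its prefix alone already fails: a literal space plus \s needs two
# whitespace characters after "code", but the text has only one
prefix = re.search(r'special code \s', text)

# With the first gap fixed, "\w.\s " still needs a whitespace AND a space
# after "is", but "is" is followed by a single space
tail_only = re.search(r'special code\s+\w.\s (\d+)', text)

# \s+ for the gaps, \S+ for the word in between
fixed = re.search(r'special code\s+\S+\s+(\d+)', text)

print(original, prefix, tail_only)  # None None None
print(fixed.group(1))               # 034567
```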

You may simply match one or more whitespace characters using \s+ between words and, instead of \w., use \S+ to match any chunk of non-whitespace characters.

Use:

import re
text = 'The special code is 034567 in this particular case and not 98675'
p = re.compile(r'special code\s+\S+\s+(\d+)')
m = p.search(text)
if m:
    print(m.group(1)) # 034567
    print(m.span(1))  # (20, 26)

See the Python demo and the regex demo.
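The tuple returned by span(1) follows Python's slicing convention: the start index is inclusive and the end index is exclusive, so slicing the text with it gives back the number. A short sketch; the second input string, with extra spaces and a tab, is made up here only to show that \s+ tolerates irregular whitespace where literal spaces would not.

```python
import re

p = re.compile(r'special code\s+\S+\s+(\d+)')

text = 'The special code is 034567 in this particular case and not 98675'
m = p.search(text)
start, end = m.start(1), m.end(1)   # same values as m.span(1)
print(start, end)                   # 20 26
print(text[start:end])              # 034567

# Hypothetical input with irregular whitespace between the words
messy = 'The special code   is\t034567 here'
m2 = p.search(messy)
print(m2.group(1), m2.span(1))      # 034567 (22, 28)
```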

Answer 1: score 0, posted 2020-05-20 09:07:00
Answer 2: score 0, accepted, posted 2020-05-20 09:19:14
