Python Regex: capture a sentence based on its beginning and end

Question

I'm fairly new to Python, but I'm attempting to write a program that will capture a sentence out of a string, based on the beginning and ending of the sentence.

For example, if my string was

description = "11:26:16 ENTRY 'Insert Imaginative Description of a person' 11:29:17 EXIT 'Insert The Description of the Same Person'"

I know how to write the regex to detect the time stamp and the word ENTRY. I'd use:

re.search(r'\d{2}:\d{2}:\d{2} ENTRY', description)

This would of course tell me that there was one entry at that position, but how would I make the regex capture the time stamp, the word ENTRY, and the sentence that follows, while leaving out the EXIT log?

Answer 1

You may try this.

re.search(r'\b(\d{2}:\d{2}:\d{2}(?:\.\d{3})?) ENTRY', description)

Use re.findall if you want to do a global match, since re.search returns only the first match.

Example:
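As a rough sketch of that difference, assuming a hypothetical log string with two ENTRY records (the second record is invented here for illustration), re.findall returns every match, and with a single capture group it returns a list of that group's text:

```python
import re

# Hypothetical log with two ENTRY records; the second one is made up
# for illustration and is not part of the original question.
log = ("11:26:16 ENTRY 'first person' "
       "11:29:17 EXIT 'first person' "
       "11:31:02.123 ENTRY 'second person'")

pattern = r'\b(\d{2}:\d{2}:\d{2}(?:\.\d{3})?) ENTRY'

# re.search stops at the first match.
first = re.search(pattern, log).group(1)

# re.findall scans the whole string; with one capture group it
# returns a list containing that group's text for every match.
all_entries = re.findall(pattern, log)

print(first)
print(all_entries)
```

If the pattern had more than one capture group, re.findall would return a list of tuples instead, one tuple per match.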

>>> import re
>>> description = "11:26:16 ENTRY 'Insert Imaginative Description of a person' 11:29:17 EXIT 'Insert The Description of the Same Person'"
>>> re.search(r'\b(\d{2}:\d{2}:\d{2}(?:\.\d{3})?) ENTRY', description).group(1)
'11:26:16'

To also capture the log text after ENTRY:

>>> re.search(r"\b(\d{2}:\d{2}:\d{2}(?:\.\d{3})?) ENTRY '([^']*)'", description).group(1)
'11:26:16'
>>> re.search(r"\b(\d{2}:\d{2}:\d{2}(?:\.\d{3})?) ENTRY '([^']*)'", description).group(2)
'Insert Imaginative Description of a person'
>>> re.search(r"\b(\d{2}:\d{2}:\d{2}(?:\.\d{3})?) ENTRY '([^']*)'", description).group()
"11:26:16 ENTRY 'Insert Imaginative Description of a person'"
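The three calls above run the same search each time. One possible variation, sketched here with named groups, compiles the pattern once and reads everything from a single match object. The group names time and text and the variable entry_re are illustrative choices, not part of the original answer.

```python
import re

description = ("11:26:16 ENTRY 'Insert Imaginative Description of a person' "
               "11:29:17 EXIT 'Insert The Description of the Same Person'")

# Compile once; "time" and "text" are hypothetical group names.
entry_re = re.compile(
    r"\b(?P<time>\d{2}:\d{2}:\d{2}(?:\.\d{3})?) ENTRY '(?P<text>[^']*)'"
)

match = entry_re.search(description)
if match:  # search returns None when nothing matches
    time_stamp = match.group("time")
    text = match.group("text")
    whole = match.group()  # the entire matched span
    print(time_stamp)
    print(text)
    print(whole)
```

Checking the match object before calling group also avoids an AttributeError when the string contains no ENTRY record.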

Answer 2

Add parentheses ( ) around the part of the pattern you want to capture so that it is returned as a group. Also, your pattern doesn't actually match your example: the pattern expects a . and three digits. You can make these optional like this:

match = re.search(r'(\d{2}:\d{2}:\d{2}(\.\d{3})?) ENTRY', description)
if match:
    print(match.group(1))

To capture the sentence, extend the pattern like this:

match = re.search(r'(\d{2}:\d{2}:\d{2}(\.\d{3})?) ENTRY \'([^\']+)\'', description)
if match:
    print(match.group(1), match.group(3))

Note that the sentence is in group 3 because the optional three digits are group 2. The output is:

11:26:16 Insert Imaginative Description of a person
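If the group numbering is inconvenient, one option is to make the optional milliseconds a non-capturing group with (?:...), so that the sentence becomes group 2. A minimal sketch of that variation, using the same description string as the question:

```python
import re

description = ("11:26:16 ENTRY 'Insert Imaginative Description of a person' "
               "11:29:17 EXIT 'Insert The Description of the Same Person'")

# (?:...) groups without capturing, so only the time stamp (group 1)
# and the quoted sentence (group 2) are numbered.
match = re.search(
    r'(\d{2}:\d{2}:\d{2}(?:\.\d{3})?) ENTRY \'([^\']+)\'',
    description,
)
if match:
    print(match.group(1), match.group(2))
```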

Because the pattern must match the ' ' around the sentence, and the pattern itself is written inside single quotes, these are preceded by a backslash. Another way of doing this would be to use " " around the whole pattern, in which case the ' characters do not need a backslash before them.
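The two quoting styles can be compared side by side. In this sketch (the variable names escaped and unescaped are illustrative), the raw single-quoted string keeps its backslashes, so the two pattern strings are not identical, but the regex engine reads \' as a literal ' and both capture the same text:

```python
import re

description = ("11:26:16 ENTRY 'Insert Imaginative Description of a person' "
               "11:29:17 EXIT 'Insert The Description of the Same Person'")

# Single-quoted raw string: the inner quotes are backslash-escaped,
# and the backslashes stay in the pattern string.
escaped = r'ENTRY \'([^\']+)\''

# Double-quoted raw string: the inner quotes need no backslash.
unescaped = r"ENTRY '([^']+)'"

sentence_a = re.search(escaped, description).group(1)
sentence_b = re.search(unescaped, description).group(1)

print(sentence_a == sentence_b)
```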

Answer 1
0 Accepted 2015-09-12 15:03:08

Answer 2
0 2015-09-12 15:06:59
