Python Regex - Capturing a sentence based on the beginning and ending

I'm fairly new to Python, but I'm attempting to write a program that will capture a sentence out of a string, based on the beginning and ending of the sentence.

For example, if my string was:

description = "11:26:16 ENTRY 'Insert Imaginative Description of a person' 11:29:17 EXIT 'Insert The Description of the Same Person'"

I know how to write the regex to detect the date stamp and the word ENTRY. I'd use:

re.search(r'\d{2}:\d{2}:\d{2} ENTRY', description)

That would tell me that there is an entry at that position, but how would I make the regex capture the date stamp, ENTRY and the following sentence, while leaving out the EXIT log?

You may try this:

re.search(r'\b(\d{2}:\d{2}:\d{2}(?:\.\d{3})?) ENTRY', description)

Use re.findall if you want a global match, since re.search returns only the first match.
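As a rough sketch of the global case, here is re.findall run against a made-up log string with two ENTRY records (the log variable and its contents are hypothetical, not from the question). With one capture group findall returns a list of strings; with two groups it returns a list of tuples.

```python
import re

# Hypothetical log with two ENTRY records (made up for illustration)
log = ("11:26:16 ENTRY 'First person' 11:29:17 EXIT 'First person' "
       "12:01:05 ENTRY 'Second person' 12:03:40 EXIT 'Second person'")

# One capture group: findall returns a list of strings
times = re.findall(r'\b(\d{2}:\d{2}:\d{2}(?:\.\d{3})?) ENTRY', log)
print(times)  # ['11:26:16', '12:01:05']

# Two capture groups: findall returns a list of (timestamp, sentence) tuples
entries = re.findall(r"\b(\d{2}:\d{2}:\d{2}(?:\.\d{3})?) ENTRY '([^']*)'", log)
print(entries)  # [('11:26:16', 'First person'), ('12:01:05', 'Second person')]
```

The EXIT records are skipped because the pattern requires the literal word ENTRY after the timestamp.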

Example:

>>> import re
>>> description = "11:26:16 ENTRY 'Insert Imaginative Description of a person' 11:29:17 EXIT 'Insert The Description of the Same Person'"
>>> re.search(r'\b(\d{2}:\d{2}:\d{2}(?:\.\d{3})?) ENTRY', description).group(1)
'11:26:16'

To also get the log text after ENTRY:

>>> re.search(r"\b(\d{2}:\d{2}:\d{2}(?:\.\d{3})?) ENTRY '([^']*)'", description).group(1)
'11:26:16'
>>> re.search(r"\b(\d{2}:\d{2}:\d{2}(?:\.\d{3})?) ENTRY '([^']*)'", description).group(2)
'Insert Imaginative Description of a person'
>>> re.search(r"\b(\d{2}:\d{2}:\d{2}(?:\.\d{3})?) ENTRY '([^']*)'", description).group()
"11:26:16 ENTRY 'Insert Imaginative Description of a person'"
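Since the three calls above repeat the same search, one option is to compile the pattern once and read all the groups from a single match object. A minimal sketch (the ENTRY_RE name is just an illustrative choice):

```python
import re

description = "11:26:16 ENTRY 'Insert Imaginative Description of a person' 11:29:17 EXIT 'Insert The Description of the Same Person'"

# Compile once, search once, then read every group from the same match object
ENTRY_RE = re.compile(r"\b(\d{2}:\d{2}:\d{2}(?:\.\d{3})?) ENTRY '([^']*)'")

match = ENTRY_RE.search(description)
if match:
    timestamp, sentence = match.groups()  # one value per capture group
    print(timestamp)      # 11:26:16
    print(sentence)       # Insert Imaginative Description of a person
    print(match.group())  # whole matched span; the EXIT record is not included
```

match.group() with no argument is the same as match.group(0), the entire match.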

Add parentheses ( ) around the part of the pattern you want to capture to get groups returned for it. Also, a pattern that expects a . followed by three digits will not match your example, because these timestamps have no fractional part. You can make that part optional like this:

import re

match = re.search(r'(\d{2}:\d{2}:\d{2}(\.\d{3})?) ENTRY', description)
if match:
    print(match.group(1))
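To check that the optional group behaves as described, here is a small sketch against two sample strings, the second using a hypothetical timestamp with milliseconds. When the optional part does not take part in the match, group(2) is None rather than an empty string.

```python
import re

# Group 2 (the fractional part) is optional, so both timestamp forms match
pattern = r'(\d{2}:\d{2}:\d{2}(\.\d{3})?) ENTRY'

samples = [
    "11:26:16 ENTRY 'no milliseconds'",
    "11:26:16.250 ENTRY 'with milliseconds'",  # hypothetical format
]

results = []
for text in samples:
    match = re.search(pattern, text)
    if match:
        # group(2) is None when the optional part did not participate
        results.append((match.group(1), match.group(2)))

print(results)  # [('11:26:16', None), ('11:26:16.250', '.250')]
```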

To capture the sentence, extend the pattern like this:

match = re.search(r'(\d{2}:\d{2}:\d{2}(\.\d{3})?) ENTRY \'([^\']+)\'', description)
if match:
    print(match.group(1), match.group(3))

Note that the sentence is in group 3 because the optional three digits, (\.\d{3})?, are group 2. Output is:

11:26:16 Insert Imaginative Description of a person
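If the extra group number is a nuisance, the optional part can be made non-capturing with (?:...), as in the first answer, so the sentence moves to group 2. Named groups avoid counting parentheses altogether. A sketch (the group names time and sentence are arbitrary choices):

```python
import re

description = "11:26:16 ENTRY 'Insert Imaginative Description of a person' 11:29:17 EXIT 'Insert The Description of the Same Person'"

# (?:...) groups without capturing, so the sentence is now group 2
match = re.search(r"(\d{2}:\d{2}:\d{2}(?:\.\d{3})?) ENTRY '([^']+)'", description)
if match:
    print(match.group(1), match.group(2))
    # 11:26:16 Insert Imaginative Description of a person

# Named groups: look values up by name instead of by position
named = re.search(r"(?P<time>\d{2}:\d{2}:\d{2}(?:\.\d{3})?) ENTRY '(?P<sentence>[^']+)'", description)
if named:
    print(named.group('time'), named.group('sentence'))
```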

Because the pattern must match the ' characters around the sentence, and the pattern itself is written inside ' ', each of them is preceded by a backslash. Another way of doing this is to put " " around the whole pattern, in which case the ' characters do not need a backslash before them.
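A quick sketch comparing the two spellings. The pattern strings themselves differ (in a raw string the backslashes stay in the pattern, and the regex engine reads \' as a literal '), but they match the same text:

```python
import re

description = "11:26:16 ENTRY 'Insert Imaginative Description of a person' 11:29:17 EXIT 'Insert The Description of the Same Person'"

# Single-quoted raw string: each ' inside the pattern needs a backslash
escaped = r'(\d{2}:\d{2}:\d{2}(\.\d{3})?) ENTRY \'([^\']+)\''

# Double-quoted raw string: the ' characters need no escaping
unescaped = r"(\d{2}:\d{2}:\d{2}(\.\d{3})?) ENTRY '([^']+)'"

a = re.search(escaped, description)
b = re.search(unescaped, description)
print(escaped == unescaped)      # False: the strings differ
print(a.groups() == b.groups())  # True: the matches are identical
```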
