
Python regex to find phrases containing exact words

I have a list of strings and want to find exact phrases.

So far my code finds only the month and year, but I need the whole phrase including “- Recorded”, like “March 2016 - Recorded”.

How can I add “- Recorded” to the regex?

import re


texts = [

"Shawn Dookhit took annual leave in March 2016 - Recorded The report",
"Soondren Armon took medical leave in February 2017 - Recorded It was in",
"David Padachi took annual leave in May 2016 - Recorded It says",
"Jack Jagoo",
"Devendradutt Ramgolam took medical leave in August 2016 - Recorded Day back",
"Kate Dudhee",
"Vinaye Ramjuttun took annual leave in  - Recorded Answering"

]

regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s')

for t in texts:
    m = regex.search(t)
    # search() returns None when no "<word> <4 digits>" sequence is found
    if m:
        print(m.group())
    else:
        print("keyword's not found")

You have two named groups here, month and year, which capture the month and the year from your strings. To capture “- Recorded” in a named group called recorded, you can do this:

regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s(?P<recorded>- Recorded)')
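A minimal sketch of this first option applied to the question's list (Python 3, with the loop checking for None instead of catching every exception). group() with no arguments returns the whole matched phrase, and groupdict() maps each group name to what it captured; the results list is just an illustrative name.

```python
import re

texts = [
    "Shawn Dookhit took annual leave in March 2016 - Recorded The report",
    "Soondren Armon took medical leave in February 2017 - Recorded It was in",
    "David Padachi took annual leave in May 2016 - Recorded It says",
    "Jack Jagoo",
    "Devendradutt Ramgolam took medical leave in August 2016 - Recorded Day back",
    "Kate Dudhee",
    "Vinaye Ramjuttun took annual leave in  - Recorded Answering",
]

# "- Recorded" gets its own named group alongside month and year
regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s(?P<recorded>- Recorded)')

results = []  # hypothetical name: one dict of named captures per matching string
for t in texts:
    m = regex.search(t)
    if m is None:
        # No match, e.g. "Jack Jagoo" or the entry with no year
        print("keyword's not found")
        continue
    results.append(m.groupdict())
    # m.group() is the whole phrase, e.g. "March 2016 - Recorded"
    print(m.group(), m.groupdict())
```

Because the loop tests for None explicitly, a genuine bug elsewhere in the loop body would still raise instead of being silently reported as "keyword's not found".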

Or you can just add “- Recorded” to your regex without a named group:

regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s- Recorded')

Or you can add a named group called other that matches a hyphen followed by one capitalized word:

regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s(?P<other>- [A-Z][a-z]+)')
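A quick check of how the other group generalizes. The second sample string, with “- Approved”, is hypothetical and not from the question; it only illustrates that any hyphen followed by a single capitalized word is accepted. Since [A-Z][a-z]+ stops at the next space, only one word is captured, and a lowercase word such as “- recorded” would not match.

```python
import re

regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s(?P<other>- [A-Z][a-z]+)')

samples = [
    "Shawn Dookhit took annual leave in March 2016 - Recorded The report",
    # Hypothetical string with a different status word, for illustration only
    "Jane Doe took annual leave in June 2018 - Approved By manager",
]

for s in samples:
    m = regex.search(s)
    if m:
        # Only the first capitalized word after the hyphen lands in "other"
        print(m.group('other'))
```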

I think the first or third option is preferable because you already use named groups. I also recommend the site http://pythex.org/, which really helps when building a regex :).

Use a list comprehension with the corrected regex:

regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s* - Recorded')

matches = [match.groups() for text in texts for match in [regex.search(text)] if match]
print(matches)
# [('March', '2016'), ('February', '2017'), ('May', '2016'), ('August', '2016')]
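groups() returns only the captured groups, so “- Recorded” does not appear in the tuples above. If you want the whole phrase the question asks for, a minimal variant (same regex, same texts) takes group(0), the entire match, instead; phrases is an illustrative name.

```python
import re

texts = [
    "Shawn Dookhit took annual leave in March 2016 - Recorded The report",
    "Soondren Armon took medical leave in February 2017 - Recorded It was in",
    "David Padachi took annual leave in May 2016 - Recorded It says",
    "Jack Jagoo",
    "Devendradutt Ramgolam took medical leave in August 2016 - Recorded Day back",
    "Kate Dudhee",
    "Vinaye Ramjuttun took annual leave in  - Recorded Answering",
]

regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s* - Recorded')

# map() applies regex.search to every string; non-matches are None and get filtered out
phrases = [m.group(0) for m in map(regex.search, texts) if m]
print(phrases)
# ['March 2016 - Recorded', 'February 2017 - Recorded', 'May 2016 - Recorded', 'August 2016 - Recorded']
```

Unlike the \s in the first option, the \s* here also tolerates extra whitespace between the year and “ - Recorded”.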

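Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.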

 
© 2020-2024 STACKOOM.COM