Python regex to find phrases containing exact words
I have a list of strings and want to find exact phrases.
So far my code finds only the month and year, but I need the whole phrase including "- Recorded", like "March 2016 - Recorded".
How can I add "- Recorded" to the regex?
import re

texts = [
    "Shawn Dookhit took annual leave in March 2016 - Recorded The report",
    "Soondren Armon took medical leave in February 2017 - Recorded It was in",
    "David Padachi took annual leave in May 2016 - Recorded It says",
    "Jack Jagoo",
    "Devendradutt Ramgolam took medical leave in August 2016 - Recorded Day back",
    "Kate Dudhee",
    "Vinaye Ramjuttun took annual leave in - Recorded Answering"
]

# Matches a word followed by a four-digit year and one whitespace character
regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s')

for t in texts:
    try:
        m = regex.search(t)
        print(m.group())
    except AttributeError:  # search() returned None, so there is no match
        print("keyword's not found")
You have two named groups here, month and year, which capture the month and year from your strings. To capture "- Recorded" in a named group called recorded, you can do this:
regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s(?P<recorded>- Recorded)')
Or you can simply add "- Recorded" to your regex without a named group:
regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s- Recorded')
Or you can add a named group called other that matches a hyphen followed by one capitalized word:
regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s(?P<other>- [A-Z][a-z]+)')
I think the first or third option is preferable because you already use named groups. I also recommend the website http://pythex.org/ , which really helps when constructing a regex :).
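As a quick check, here is a minimal sketch that runs the first option against the sample strings from the question. The names pattern and found are only illustrative and do not come from the original post.

```python
import re

texts = [
    "Shawn Dookhit took annual leave in March 2016 - Recorded The report",
    "Soondren Armon took medical leave in February 2017 - Recorded It was in",
    "David Padachi took annual leave in May 2016 - Recorded It says",
    "Jack Jagoo",
    "Devendradutt Ramgolam took medical leave in August 2016 - Recorded Day back",
    "Kate Dudhee",
    "Vinaye Ramjuttun took annual leave in - Recorded Answering",
]

# Option 1: month, year and the literal "- Recorded" as named groups.
pattern = re.compile(
    r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s(?P<recorded>- Recorded)'
)

found = []  # illustrative name: collects the full matched phrases
for t in texts:
    m = pattern.search(t)
    if m:
        # m.group() is the whole match; m.groupdict() maps group names to text
        found.append(m.group())
        print(m.group(), m.groupdict())
    else:
        print("keyword's not found")
```

Here m.group() returns the whole phrase, while m.group('month'), m.group('year') and m.group('recorded') return the individual parts.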
Use a list comprehension with the corrected regex:
regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s* - Recorded')
matches = [match.groups() for text in texts for match in [regex.search(text)] if match]
print(matches)
# [('March', '2016'), ('February', '2017'), ('May', '2016'), ('August', '2016')]
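Note that match.groups() returns only the captured month and year. If the whole phrase is what you need, a small variation (a sketch, assuming the same sample texts as in the question; the name phrases is illustrative) is to take match.group(0), which is the full matched text:

```python
import re

texts = [
    "Shawn Dookhit took annual leave in March 2016 - Recorded The report",
    "Soondren Armon took medical leave in February 2017 - Recorded It was in",
    "David Padachi took annual leave in May 2016 - Recorded It says",
    "Jack Jagoo",
    "Devendradutt Ramgolam took medical leave in August 2016 - Recorded Day back",
    "Kate Dudhee",
    "Vinaye Ramjuttun took annual leave in - Recorded Answering",
]

regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s* - Recorded')

# group(0) is the entire match, including the trailing "- Recorded";
# strings without a match yield None from search() and are filtered out
phrases = [m.group(0) for m in map(regex.search, texts) if m]
print(phrases)
```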