
Why does my regex not return group(0) properly?

I want to find the dates in a large number of files. The date is on a line of its own, in the format "21 September 2010". There is only one such date in each file.

The following code returns the month only, for example "September". Why does group(0) not give me the whole thing, like "21 September 2010"? What is missing here? Thank you!

months = ("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")

pattern = r"^\d{2} +" + "|".join(months) + r" +\d{4}$"
match = re.search(pattern, text)
if match:
    fdate = match.group(0)

When you print your regex, you will see it looks like ^\d{2} +January|February|March|April|May|June|July|August|September|October|November|December +\d{4}$ . Because | has the lowest precedence of the regex operators, this is an alternation of twelve separate branches: ^\d{2} +January, then February, and so on up to December +\d{4}$. The ^\d{2} + prefix belongs only to the January branch and the  +\d{4}$ suffix only to the December branch, since the month alternatives are not grouped. When you apply it to 21 September 2010, the only branch that can match is the bare September, so that is all group(0) returns.
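A minimal sketch that reproduces the behaviour described above, using the pattern from the question unchanged. The names `ungrouped` and `m` are illustrative only.

```python
import re

months = ("January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December")
text = "21 September 2010"

# Same construction as in the question: the alternation is NOT grouped.
ungrouped = r"^\d{2} +" + "|".join(months) + r" +\d{4}$"
print(ungrouped)

m = re.search(ungrouped, text)
# Only the bare "September" branch can match this text.
print(m.group(0))  # September
```

The day and year are silently dropped because they are part of the first and last branches only, not of the branch that actually matched.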

You need to group the month alternatives:

pattern = r"^\d{{2}} +(?:{}) +\d{{4}}$".format("|".join(months))

See the Python demo:

import re
text = "21 September 2010"
months = ("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December")
pattern = r"^\d{{2}} +(?:{}) +\d{{4}}$".format("|".join(months))
match = re.search(pattern, text)
if match:
    fdate = match.group(0)
    print(fdate) # => 21 September 2010
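Since the question mentions that the date sits on its own line inside a larger file, one more point may matter: without a flag, ^ and $ anchor to the start and end of the whole string, not of each line. The sketch below assumes each file's contents are read into a single string and adds re.MULTILINE so the anchors apply per line. The names `DATE_RE`, `find_date`, and `sample` are hypothetical and not from the original post.

```python
import re

MONTHS = ("January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December")

# Grouped alternation as in the answer; re.MULTILINE makes ^ and $
# match at the start and end of every line in the text.
DATE_RE = re.compile(
    r"^\d{{2}} +(?:{}) +\d{{4}}$".format("|".join(MONTHS)),
    re.MULTILINE,
)

def find_date(text):
    """Return the first line that is exactly a date, or None."""
    match = DATE_RE.search(text)
    return match.group(0) if match else None

sample = "Report\n21 September 2010\nBody text\n"
print(find_date(sample))  # 21 September 2010
```

If the whole string being searched is already just the date line, the flag is unnecessary and the answer's pattern works as written.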


 