Regex with lookahead does not match in Python

Question

I have written a regex pattern intended to capture one date and one number from a sentence, but it does not work.

My code is:

import re

txt = 'Την 02/12/2013 καταχωρήθηκε στο Γενικό Εμπορικό Μητρώο της Υπηρεσίας Γ.Ε.ΜΗ. του Επιμελητηρίου Βοιωτίας, με κωδικόαριθμό καταχώρισης Κ.Α.Κ.: 110035'

p = re.compile(r'''Την\s? # matches Την with a possible space afterwards

               (?P<KEK_date>\d{2}/\d{2}/\d{4}) #matches a date of the given format and captures it with a named group
               
               \.+ # Allow for an arbitrary sequence of characters 
               
               (?=(κωδικ.\s?αριθμ.\s?καταχ.ριση.)|(κ\.?α\.?κ\.?:?\s*)) # defines two lookaheads, either of which suffices
               
               (?P<KEK_number>\d+) # captures a sequence of numbers''', re.I|re.VERBOSE)

p.findall(txt)

I would expect it to return a list with two elements, '02/12/2013' and '110035', but instead it returns an empty list.

Answer 1

Issues:

\.+ matches one or more literal dots. To allow an arbitrary sequence of characters, use .+ (unescaped).
(?=(κωδικ.\s?αριθμ.\s?καταχ.ριση.)|(κ\.?α\.?κ\.?:?\s*))(?P<KEK_number>\d+) can never match. A lookahead does not consume any text, so the label inside the lookahead and \d+ would both have to match starting at the same position, and neither label alternative starts with a digit. You need to turn the lookahead into a consuming pattern.
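A minimal sketch of both issues, using a simplified ASCII sample text and a hypothetical "code:" label chosen only to isolate the behavior (they are not part of the original question):

```python
import re

text = "Date 01/02/2003 code: 42"  # hypothetical sample text

# Issue 1: \.+ only matches literal dots, and a space follows the date,
# so the pattern fails right after the date.
escaped = re.search(r"Date\s?(\d{2}/\d{2}/\d{4})\.+", text)

# Issue 2: the lookahead checks for "code:" without consuming it, so "code:"
# and \d+ would both have to start at the same position, which is impossible.
lookahead = re.search(r"(?=code:\s*)(\d+)", text)

# Fix: make the label a consuming (non-capturing) group instead.
consuming = re.search(r"(?:code:\s*)(\d+)", text)

print(escaped)             # None
print(lookahead)           # None
print(consuming.group(1))  # 42
```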

I suggest fixing your pattern as follows:

p = re.compile(r'''Την\s? # matches Την with a possible space afterwards
(?P<KEK_date>\d{2}/\d{2}/\d{4}) #matches a date of the given format and captures it with a named group
.+ # Allow for an arbitrary sequence of characters 
(?:κωδικ.\s?αριθμ.\s?καταχ.ριση.|κ\.?α\.κ\.:?)\s+ # matches either of the two label variants, then whitespace
(?P<KEK_number>\d+) # captures a sequence of numbers''', re.I | re.X)

See the regex demo.

Details:

Την\s? - the string Την and an optional whitespace character
(?P<KEK_date>\d{2}/\d{2}/\d{4}) - Group "KEK_date": a date pattern of 2 digits, /, 2 digits, / and 4 digits
.+ - 1 or more characters other than line break characters, as many as possible (it is greedy, so the engine backtracks until the rest of the pattern can match)
(?:κωδικ.\s?αριθμ.\s?καταχ.ριση.|κ\.?α\.κ\.:?) - either of:
- κωδικ.\s?αριθμ.\s?καταχ.ριση. - κωδικ, any one char, an optional whitespace, αριθμ, any one char, an optional whitespace, καταχ, any one char, ριση and any one char (each . matches any char except a line break)
- | - or
- κ\.?α\.κ\.:? - κ, an optional ., α, a ., κ, a . and then an optional :
\s+ - 1 or more whitespace characters
(?P<KEK_number>\d+) - Group "KEK_number": 1 or more digits
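To check that either label variant is enough on its own, the label part of the pattern can be tested in isolation. This is a sketch; the three sample strings are hypothetical inputs, each containing at most one of the two label forms:

```python
import re

# The label part of the suggested pattern, followed by the number.
label_then_number = re.compile(
    r"""(?:κωδικ.\s?αριθμ.\s?καταχ.ριση.|κ\.?α\.κ\.:?)\s+  # either label variant
        (?P<KEK_number>\d+)                                # the number itself""",
    re.I | re.X,
)

# Hypothetical inputs: spelled-out label, abbreviated label, no label.
samples = [
    "κωδικόαριθμό καταχώρισης 110035",
    "Κ.Α.Κ.: 110035",
    "Γ.Ε.ΜΗ. 2013",
]

results = []
for s in samples:
    m = label_then_number.search(s)
    results.append(m.group("KEK_number") if m else None)

print(results)  # ['110035', '110035', None]
```

With re.I, the uppercase Greek Κ and Α in the abbreviation match the lowercase κ and α in the pattern.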

See a Python demo:

import re
txt = 'Την 02/12/2013 καταχωρήθηκε στο Γενικό Εμπορικό Μητρώο της Υπηρεσίας Γ.Ε.ΜΗ. του Επιμελητηρίου Βοιωτίας, με κωδικόαριθμό καταχώρισης Κ.Α.Κ.: 110035'
p = re.compile(r'''Την\s? # matches Την with a possible space afterwards
(?P<KEK_date>\d{2}/\d{2}/\d{4}) #matches a date of the given format and captures it with a named group
.+ # Allow for an arbitrary sequence of characters 
(?:κωδικ.\s?αριθμ.\s?καταχ.ριση.|κ\.?α\.κ\.:?)\s+ # matches either of the two label variants, then whitespace
(?P<KEK_number>\d+) # captures a sequence of numbers''', re.I | re.X)
print(p.findall(txt)) # => [('02/12/2013', '110035')]
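Note that when a pattern has more than one group, findall returns a list of tuples. That is why the output is [('02/12/2013', '110035')] rather than the flat two-element list the question expected. If you only need the first match, one option is to call search and read the groups from the match object. This is a sketch that repeats the same pattern without the inline comments:

```python
import re

txt = 'Την 02/12/2013 καταχωρήθηκε στο Γενικό Εμπορικό Μητρώο της Υπηρεσίας Γ.Ε.ΜΗ. του Επιμελητηρίου Βοιωτίας, με κωδικόαριθμό καταχώρισης Κ.Α.Κ.: 110035'

# Same pattern as in the demo above, without the inline comments.
p = re.compile(r'''Την\s?
(?P<KEK_date>\d{2}/\d{2}/\d{4})
.+
(?:κωδικ.\s?αριθμ.\s?καταχ.ριση.|κ\.?α\.κ\.:?)\s+
(?P<KEK_number>\d+)''', re.I | re.X)

m = p.search(txt)  # first match only, or None
if m:
    print(list(m.groups()))  # ['02/12/2013', '110035']
    print(m.groupdict())     # {'KEK_date': '02/12/2013', 'KEK_number': '110035'}
```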

Answer 1: accepted, 2020-06-24 15:37:48