Capturing portions of text with non-greedy regex

I would like to use re.findall to extract the values assigned to each PCR.

>>> z
'PCR-09: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-11: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-12: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-13: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-14: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-15: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-16: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\n

>>> print z
PCR-09: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
PCR-10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
PCR-11: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
PCR-12: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
PCR-13: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
PCR-14: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
PCR-15: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
PCR-16: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Initially I tried the following. Can someone point out what is wrong with the regex I used?

>>> re.search('PCR-09:(.*?)', z).groups()
('',)

Shouldn't the non-greedy expression (.*?) match all characters until a newline is found?

With a slightly modified regex, I get the expected result:

>>> re.search('PCR-09:(.*?)\s\r\n', z).groups()
(' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00',)

Similarly, this does not work:

>>> re.findall(r'(PCR-\d+):(.*?)', z)
[('PCR-09', ''), ('PCR-10', ''), ('PCR-11', ''), ('PCR-12', ''), ('PCR-13', ''), ('PCR-14', ''), ('PCR-15', ''), ('PCR-16', ''), 

But this does:

>>> re.findall(r'(PCR-\d+):(.*?)\s\r\n', z,re.DOTALL)
[('PCR-09', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-10', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-11', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-12', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-13', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-14', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-15', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-16', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'),

I hope someone can explain what is wrong with my approach.

Thanks

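The reason r'PCR-09:(.*?)' does not do what you expect is that a non-greedy quantifier stops as soon as the overall pattern can succeed.

Nothing follows (.*?) in that pattern, so it can match the empty string '' and the regex stops immediately.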

By contrast, r'(PCR-\d+):(.*?)\s\r\n' is also non-greedy, but because it still has to find \s\r\n, the group is forced to expand until that terminator matches.
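The difference can be seen side by side in a minimal sketch. It is written for Python 3, and the string z here is a shortened stand-in for the data in the question (an assumption made for brevity), not the original value.

```python
import re

# Shortened stand-in for the question's data (assumption: 3 bytes per PCR).
z = "PCR-09: 00 00 00 \r\nPCR-10: 00 00 00 \r\n"

# Nothing follows the lazy group, so matching zero characters already succeeds.
bare = re.search(r"PCR-09:(.*?)", z).group(1)
print(repr(bare))        # ''

# A required terminator forces the lazy group to grow until it can match.
terminated = re.search(r"PCR-09:(.*?)\s\r\n", z).group(1)
print(repr(terminated))  # ' 00 00 00'
```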

I suggest using a greedy regex that contains only the characters you expect to find: r'(PCR-\d+):([0-9 ]*)'
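A short sketch of that suggestion with re.findall, again on a shortened stand-in for z (an assumption). The character class [0-9 ] only covers decimal digits and spaces, so if the values can contain hex digits such as a-f it would need widening, for example to [0-9A-Fa-f ]. The dictionary step is an illustrative addition, not part of the answer.

```python
import re

# Shortened stand-in for the question's data (assumption).
z = "PCR-09: 00 00 00 \r\nPCR-10: 00 00 00 \r\n"

# Greedy, but limited to digits and spaces, so it stops at '\r' on its own.
pairs = re.findall(r"(PCR-\d+):([0-9 ]*)", z)
print(pairs)   # [('PCR-09', ' 00 00 00 '), ('PCR-10', ' 00 00 00 ')]

# Optional: split each captured run into its individual byte strings.
values = {name: run.split() for name, run in pairs}
print(values)  # {'PCR-09': ['00', '00', '00'], 'PCR-10': ['00', '00', '00']}
```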

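The pattern PCR-09:(.*?) tells Python to non-greedily match zero or more characters after PCR-09:. So it does exactly that and matches zero characters.

You need to keep the regex greedy so that it matches everything up to the newline: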

>>> re.search('PCR-09:(.*)', z).groups()
(' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r',)
>>>

Note that your PCR-09:(.*?)\s\r\n pattern works because it tells Python to take zero or more characters after PCR-09: up until \s\r\n. In other words, it takes everything between them.
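The greedy result above still ends in a space and '\r', because . matches every character except '\n'. A minimal sketch of one way to tidy that up, assuming Python 3 and a shortened stand-in for z; calling strip() afterwards is an illustrative choice, not something the answer prescribes.

```python
import re

# Shortened stand-in for the question's data (assumption).
z = "PCR-09: 00 00 00 \r\nPCR-10: 00 00 00 \r\n"

# '.' stops at '\n' but happily consumes the '\r' before it.
greedy = re.search(r"PCR-09:(.*)", z).group(1)
print(repr(greedy))   # ' 00 00 00 \r'

# Strip the surrounding whitespace, including the stray carriage return.
cleaned = greedy.strip()
print(repr(cleaned))  # '00 00 00'
```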

Try using split:

[ x.split(':') for x in z.split('\r\n')]

Output:

[['PCR-09', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '], ['PCR-10', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '], ['PCR-11', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '], ['PCR-12', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '], ['PCR-13', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '], ['PCR-14', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '], ['PCR-15', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '], ['PCR-16', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '], ['']]
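The output above ends with an empty [''] entry and keeps the padding around each value. A hedged variation, assuming Python 3 and a shortened stand-in for z: str.splitlines() handles '\r\n' and produces no trailing empty item, and split(':', 1) limits the split to the first colon.

```python
# Shortened stand-in for the question's data (assumption).
z = "PCR-09: 00 00 00 \r\nPCR-10: 00 00 00 \r\n"

# Split into lines, skip blanks, then split each line at the first colon only.
rows = [line.split(":", 1) for line in z.splitlines() if line]

# Trim the padding around each value.
pairs = [(name, value.strip()) for name, value in rows]
print(pairs)  # [('PCR-09', '00 00 00'), ('PCR-10', '00 00 00')]
```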

Using a regex:

re.findall('(PCR-\d+)(.*)',z)
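Note that this pattern does not consume the colon, so the second group starts with ':' and ends with '\r'. A small sketch of what it returns and one possible cleanup, assuming Python 3 and a shortened stand-in for z; the post-processing step is an illustrative addition.

```python
import re

# Shortened stand-in for the question's data (assumption).
z = "PCR-09: 00 00 00 \r\nPCR-10: 00 00 00 \r\n"

# The second group keeps the leading ':' and the trailing ' \r'.
raw = re.findall(r"(PCR-\d+)(.*)", z)
print(raw)    # [('PCR-09', ': 00 00 00 \r'), ('PCR-10', ': 00 00 00 \r')]

# Drop the colon first, then the surrounding whitespace.
pairs = [(name, rest.lstrip(":").strip()) for name, rest in raw]
print(pairs)  # [('PCR-09', '00 00 00'), ('PCR-10', '00 00 00')]
```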
