
Issues with greediness in regex

In PHP, I'm matching the text at http://siba.thenetworksolution.it/allegati/H3018500D7FDDE9ACA05671F49F4F3746A69DAF96.1329514.pdf.txt against the following regex:

preg_match('#(.*(?s))(particella |particelle |p\.|part\.|p |part |mappale |mapp\.|mapp |n\.|\*)\s*(\d+[\d /\p{Pd}]*)($|.{0,20}(?s)(graffati|particella |particelle |p\.|.*part\.|p |part |mappale |mapp\.|mapp |n\.|subalterno |subalterni |sub\.|s\.|sub |s |\bcat\b|\bcategoria\b|\brendita\b|\bvani\b|\bconsistenza\b|\bR\.C\.\b))#i', $txt, $matches, PREG_OFFSET_CAPTURE, $offset)

with $offset = 1155 (the offset of the word "foglio" in the text).

I expected it to match 454 (which comes just after the offset), but it matches 57/1998 instead (which is many lines later).

After some tests on regex101.com I discovered that the issue is the carriage return between the prefix particella and 454, but I expected \s to match line feeds.

How can I correct the greediness so that the regex matches 454?

Solved. There was a space after particella in the second group.
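
For anyone hitting the same thing, here is a minimal sketch of why that trailing space matters. The sample text and the shortened patterns are made up for illustration (they are not the real document or the full regex above); the only assumption is that the prefix and the number are separated by a line break rather than a space.

```php
<?php
// Hypothetical sample text: "particella" is followed by a line break, not a space.
$txt = "foglio 12\nparticella\n454 sub 3\n";

// Shortened, illustrative patterns (not the full regex from the question).
// With a trailing space in the alternative, the literal space cannot match "\n",
// so this prefix never matches here and the engine moves on to later candidates.
$withSpace = '#(particella |p\.)\s*(\d+)#i';

// Without the trailing space, \s* absorbs the line break before the digits.
$withoutSpace = '#(particella|p\.)\s*(\d+)#i';

$foundWithSpace = preg_match($withSpace, $txt);
$foundWithoutSpace = preg_match($withoutSpace, $txt, $m);

var_dump($foundWithSpace);    // no match for this sample
var_dump($foundWithoutSpace); // one match
echo $m[2], "\n";             // the captured number
```

So \s does match line feeds and carriage returns; the problem was the literal space placed in front of it. Dropping the trailing spaces from the prefix alternatives and leaving the whitespace to \s* avoids it.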


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM