Issues with regex for overlapping matches

In short, I'm trying to match the longest, right-most substring of a string that fits this pattern:

[0-9][0-9\s]*(\.|,)\s*[0-9]\s*[0-9]
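Here is the pattern broken into its parts to make it easier to read. This sketch uses Python's `re` module with `re.VERBOSE` as a stand-in for PHP's PCRE functions, because these constructs behave the same way in both. `PRICE_LIKE` is a hypothetical name, and `(\.|,)` is made non-capturing.

```python
import re

# The question's pattern, one piece per line. re.VERBOSE ignores the
# unescaped whitespace and the comments inside the pattern string.
PRICE_LIKE = re.compile(r"""
    [0-9]        # a leading digit
    [0-9\s]*     # then any run of digits and whitespace (greedy)
    (?:\.|,)     # a dot or a comma (non-capturing here)
    \s*[0-9]     # first digit after the separator, optionally preceded by whitespace
    \s*[0-9]     # second digit after the separator, optionally preceded by whitespace
""", re.VERBOSE)

print(bool(PRICE_LIKE.fullmatch("5 28.00")))  # True
print(bool(PRICE_LIKE.fullmatch("1.5")))      # False: only one digit after the dot
```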

Consider, for example, the string "abc 1.5 28.00". I want to match "5 28.00".

Using the pattern as-is, like so:

preg_match_all('/[0-9][0-9\s]*(\.|,)\s*[0-9]\s*[0-9]/', 'abc 1.5 28.00', $result);

we instead get the following matches:

[0] => 1.5 2
[1] => 8.00

There is no "5 28.00", or even "28.00", because preg_match_all returns non-overlapping matches: "1.5 2" has already consumed the "5" and the "2", so the next search starts at "8".
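Python's `re.finditer` also returns non-overlapping matches, so it reproduces the same two results and shows where each search restarts. This is an illustrative port, not PHP, and `(\.|,)` is written as the equivalent `[.,]` so no group is captured.

```python
import re

PATTERN = r"[0-9][0-9\s]*[.,]\s*[0-9]\s*[0-9]"
subject = "abc 1.5 28.00"

# finditer scans left to right and starts each new search at the end of the
# previous match, so the matches never overlap.
for m in re.finditer(PATTERN, subject):
    print(repr(m.group(0)), "spans", m.span())
# '1.5 2' spans (4, 9)
# '8.00' spans (9, 13)
```

The second match begins at index 9, exactly where the first one ended, so no match can ever start at the "5" (index 6).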

I did some research, and people suggested using a positive lookahead for problems like this, so I tried the following:

preg_match_all('/(?=([0-9][0-9\s]*(\.|,)\s*[0-9]\s*[0-9]))/', 'abc 1.5 28.00', $result);

giving us these matches:

[0] => 1.5 2
[1] => 5 28.00
[2] => 28.00
[3] => 8.00
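The lookahead version works because `(?=...)` itself matches an empty string. The scan therefore advances only one character after each attempt, and the text is recovered from the capture group inside the lookahead. The Python equivalent below gives the same four candidates. It is again an illustrative port, with the separator written as `[.,]` so that `findall` returns just the one group.

```python
import re

# Zero-width lookahead: each overall match is empty, so the engine retries at
# every position, and group 1 records what the pattern would match from there.
OVERLAPPING = r"(?=([0-9][0-9\s]*[.,]\s*[0-9]\s*[0-9]))"

print(re.findall(OVERLAPPING, "abc 1.5 28.00"))
# ['1.5 2', '5 28.00', '28.00', '8.00']
```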

Now, "5 28.00" is in there, which is good, but it can't be reliably identified as the correct match. For example, you can't just traverse from the end looking for the longest match, because a longer match could appear earlier in the string. Ideally, I'd want the sub-matches at the end (indexes 2 and 3) not to be there, so that I could just grab the last index.
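One way to get that "ideal" list is to record each candidate's position and discard any candidate whose span lies entirely inside an earlier candidate's span. What remains are the maximal matches, and the last of them is the right-most. Below is a sketch in Python (standing in for PHP). The helper name `rightmost_maximal` is hypothetical. For the question's edit example "abc 500000.05.00" it returns "05.00" rather than the requested "5.00", so it does not cover that case as stated.

```python
import re

# Same overlapping-candidate trick as the lookahead pattern in the question.
CANDIDATE = re.compile(r"(?=([0-9][0-9\s]*[.,]\s*[0-9]\s*[0-9]))")

def rightmost_maximal(subject):
    """Return the last overlapping candidate that isn't nested inside an earlier one."""
    kept = []  # (start, end, text) for candidates not contained in an earlier one
    for m in CANDIDATE.finditer(subject):
        start, end = m.start(1), m.end(1)
        # Candidates arrive in order of increasing start, so an earlier kept
        # span contains this one exactly when it ends at or after this one's end.
        if any(end <= k_end for _, k_end, _ in kept):
            continue
        kept.append((start, end, m.group(1)))
    return kept[-1][2] if kept else None

print(rightmost_maximal("abc 1.5 28.00"))     # 5 28.00
print(rightmost_maximal("abc 1.5 28.00999"))  # 5 28.00
print(rightmost_maximal("abc 500000.05.00"))  # 05.00
```

Checking only the kept spans is enough. A candidate nested inside a discarded one is also nested inside whichever kept span contained that discarded one.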

Does anyone have ideas for how to accomplish exactly this in the simplest/best way possible? Let me know if I need to clarify anything, as I know this stuff can get confusing. Many thanks in advance.

**Edit: some additional input/match examples**

"abc 1.5 28.00999" => "5 28.00" (i.e. the pattern can't simply be anchored to the end of the string with $)

"abc 500000.05.00" => "5.00"

Your problem is easily fixed by anchoring the match to the end of the input string with a dollar sign:

preg_match_all('/[0-9][0-9\s]*(\.|,)\s*[0-9]\s*[0-9]$/', 
               'abc 1.5 28.00', $result);

Returns:

array (size=2)
  0 => 
    array (size=1)
      0 => string '5 28.00' (length=7)
  1 => 
    array (size=1)
      0 => string '.' (length=1)

Now, I'm not entirely sure why you wrapped the `\.|,` alternation in parentheses (it is a capturing group, which is why the '.' shows up as a second array in the output), but as far as I can see this output is correct for your question and implements the "farthest to the right" requirement.
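Running the anchored pattern against the inputs from the question's edit shows its limit. With `$`, the two digits after the separator must be the very last characters of the string, so "abc 1.5 28.00999" produces no match at all. Below is a Python sketch of that check, where `re.search` stands in for PHP's `preg_match_all` and `[.,]` replaces `(\.|,)`.

```python
import re

ANCHORED = r"[0-9][0-9\s]*[.,]\s*[0-9]\s*[0-9]$"

for subject in ["abc 1.5 28.00", "abc 1.5 28.00999", "abc 500000.05.00"]:
    m = re.search(ANCHORED, subject)
    print(repr(subject), "->", repr(m.group(0)) if m else None)
# 'abc 1.5 28.00' -> '5 28.00'
# 'abc 1.5 28.00999' -> None
# 'abc 500000.05.00' -> '05.00'
```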

The closest I can get for you is the following pattern:

((?:\d\s*)+[.,](?:\s*\d){2})(?:(?![.,](?:\s*\d){2}).)*$

It produces the following output (look at index 1, the first capture group, in each case):

'abc 1.5 28.00999' => array (
  0 => '5 28.00999',
  1 => '5 28.00',
)
'abc 500000.05.00' => array (
  0 => '05.00',
  1 => '05.00',
)
'abc 111.5 8.0c 6' => array (
  0 => '111.5 8.0c 6',
  1 => '111.5 8',
)
'abc 500000.05.0a0' => array (
  0 => '500000.05.0a0',
  1 => '500000.05',
)
'abc 1.5 28.00999 6  0 0.6 6' => array (
  0 => '00999 6  0 0.6 6',
  1 => '00999 6  0 0.6 6',
)
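The pattern has two parts. Group 1 is the number-like run itself, with `\d` in place of `[0-9]`. The trailing `(?:(?![.,](?:\s*\d){2}).)*$` consumes the rest of the string one character at a time, but its negative lookahead refuses to pass over another dot or comma followed by two digits, so group 1 has to be the last such run in the string. The Python port below checks the listing above. It is a stand-in for PHP; Python's `re` treats these constructs the same way as PCRE for these inputs.

```python
import re

# The answer's pattern, copied verbatim into a Python raw string.
NEAREST = re.compile(r"((?:\d\s*)+[.,](?:\s*\d){2})(?:(?![.,](?:\s*\d){2}).)*$")

samples = [
    "abc 1.5 28.00999",
    "abc 500000.05.00",
    "abc 111.5 8.0c 6",
    "abc 500000.05.0a0",
    "abc 1.5 28.00999 6  0 0.6 6",
]

for s in samples:
    m = NEAREST.search(s)
    # group(0) is the whole match (group 1 plus the trailing part); group(1) is the candidate
    print(repr(s), "=>", [m.group(0), m.group(1)] if m else None)
```

It prints the same index 0 and index 1 pairs as the listing above, including "05.00" rather than "5.00" for the second input.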
