PHP Regex nongreedy match doesn't work correctly on some servers

I have the following minimal example that behaves differently on a local server (WAMP, PHP 7.3.7) and on a production server running PHP 7.3.27. The result on the local server seems wrong to me because the lazy quantifier appears to be ignored. It also conflicts with every regex tester I have tried.

Example code:

<?php
header('Content-Type: text/plain; charset=utf-8');

$input = <<<EOT
John Smith
John Smith (123)
John Smith (123) (456)
EOT;


preg_match_all('/(.+?)(?:\s\((\d+)\))?$/m', $input, $matches_defines);
print_r($matches_defines);

?>

Result in the local environment:

Array
(
    [0] => Array
        (
            [0] => John Smith
            [1] => John Smith (123)
            [2] => John Smith (123) (456)
        )

    [1] => Array
        (
            [0] => John Smith
            [1] => John Smith (123)
            [2] => John Smith (123)
        )

    [2] => Array
        (
            [0] => 
            [1] => 
            [2] => 456
        )
)

Result in the production environment:

Array
(
    [0] => Array
        (
            [0] => John Smith
            [1] => John Smith (123)
            [2] => John Smith (123) (456)
        )

    [1] => Array
        (
            [0] => John Smith
            [1] => John Smith
            [2] => John Smith (123)
        )

    [2] => Array
        (
            [0] => 
            [1] => 123
            [2] => 456
        )
)

Can somebody tell me where this difference comes from and what adjustments can be made in the local environment to correct it?

Okay, answering my own question here: this is a Windows/Linux line-ending issue.

The regex on my local server fails because the text there uses Windows line endings (\r\n), so there is a \r after every closing bracket. The closing bracket is therefore not the last character on the line: an additional character, \r, sits before the end of the line ( $ ). Since . also matches \r, the lazy (.+?) has to expand across the whole line, including the bracketed number, before $ can match.
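To make the invisible \r visible, the local behaviour can be reproduced on any machine by writing the line endings explicitly. This is a minimal sketch, assuming the local file was saved with \r\n line endings; the variable names are only illustrative.

```php
<?php
// Same text as in the question, but with explicit Windows (CRLF) line endings.
$input = "John Smith\r\nJohn Smith (123)\r\nJohn Smith (123) (456)";

// Original pattern from the question.
preg_match_all('/(.+?)(?:\s\((\d+)\))?$/m', $input, $matches);

// json_encode() escapes control characters, so a trailing \r shows up in the output.
foreach ($matches[1] as $i => $name) {
    echo json_encode($name), ' => ', json_encode($matches[2][$i]), "\n";
}
```

The first two names should be printed with a trailing \r, and the second line's number ends up inside the name rather than in the second capture group, which is the local result shown in the question.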

This can be fixed with a slightly modified regex: /(.+?)(?:\s\((\d+)\))?\s*$/m . Note the \s* at the end. It matches the \r at the end of the line if one is present.
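A short sketch of the fix against input with \r\n line endings follows. The second variant, normalizing the line endings with str_replace() before matching, is not part of the original answer; it is shown as an assumption about what may also work when changing the pattern is undesirable.

```php
<?php
// Input with explicit Windows (CRLF) line endings.
$input = "John Smith\r\nJohn Smith (123)\r\nJohn Smith (123) (456)";

// Variant 1: the modified pattern; \s* absorbs an optional \r before the line end.
preg_match_all('/(.+?)(?:\s\((\d+)\))?\s*$/m', $input, $fixed);

// Variant 2 (assumption, not from the answer): convert CRLF to LF first,
// then use the original pattern unchanged.
$normalized = str_replace("\r\n", "\n", $input);
preg_match_all('/(.+?)(?:\s\((\d+)\))?$/m', $normalized, $plain);

print_r($fixed[1]); // names
print_r($fixed[2]); // trailing numbers

// Both variants should yield the same capture groups.
var_dump($fixed[1] === $plain[1] && $fixed[2] === $plain[2]);
```

Both variants should give the capture groups shown for the production environment. Note that \s* also matches spaces and line breaks, so the full match can run across blank lines; if that matters, \r?$ is a narrower alternative.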
