Problem with multiple matches on same line in simple regular expression

I have a very basic question about regular expressions. I am trying to match and replace URLs like these:

http://mydomain.com/image/13/imagetitle.html

For this I use the following expression:

/mydomain.com(.*)image\/(\d+)\/(.*).html/

This pattern mostly works fine, but it fails when multiple occurrences appear on the same line. So this works:

This is my own image: http://mydomain.com/image/13/imagetitle.html

It also works when multiple occurrences are spread across lines:

This is my own image: http://mydomain.com/image/13/imagetitle.html
Yet I recommend this one as well: image: http://mydomain.com/image/15/imagetitle2.html

Both occurrences match and are replaced correctly. However, when there are two occurrences on the same line, only the first match is replaced:

This is my own image: http://mydomain.com/image/13/imagetitle.html, yet I recommend this one as well: image: http://mydomain.com/image/15/imagetitle2.html

How can I make sure all matches are replaced, regardless of newlines?

I didn't quite get the problem either. But judging from the regex alone, your issue is likely to be greediness.

(.*) matches as much as it can, so it will capture two URLs at once if they are on the same line. Typically you therefore want to use (.*?) instead, or apply the ungreedy /U flag.
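The difference between the greedy and non-greedy forms can be checked with a small sketch like the one below. It assumes PHP's preg functions (as used elsewhere in this thread) and reuses the question's sample line; the variable names are only illustrative.

```php
<?php
// Sample input from the question: two URLs on the same line.
$line = 'This is my own image: http://mydomain.com/image/13/imagetitle.html, '
      . 'yet I recommend this one as well: image: http://mydomain.com/image/15/imagetitle2.html';

// The original greedy pattern, a lazy variant, and the original with the /U modifier.
$greedy   = '/mydomain.com(.*)image\/(\d+)\/(.*).html/';
$lazy     = '/mydomain.com(.*?)image\/(\d+)\/(.*?).html/';
$ungreedy = '/mydomain.com(.*)image\/(\d+)\/(.*).html/U';

// preg_match_all returns the number of full pattern matches found.
$greedyCount   = preg_match_all($greedy, $line, $greedyMatches);
$lazyCount     = preg_match_all($lazy, $line, $lazyMatches);
$ungreedyCount = preg_match_all($ungreedy, $line, $ungreedyMatches);

echo "greedy:   $greedyCount match(es)\n";   // one match spanning both URLs
echo "lazy:     $lazyCount match(es)\n";     // one match per URL
echo "ungreedy: $ungreedyCount match(es)\n"; // /U behaves like the lazy variant here
```

With the greedy pattern the single match starts at the first mydomain.com and ends at the last .html on the line, which is why a replacement appears to handle only one URL.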

But in your case I'd advise simply making the match more specific:

/mydomain.com(\S*)image\/(\d+)\/(\S*).html/

Here \S matches only non-whitespace characters, because whitespace is almost certainly where URLs should be broken up. Alternatively, you could use a more specific character class such as ([\w/.?&#%=-]*) instead of .*? there.
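As a rough illustration of the \S approach, the sketch below runs preg_replace on the question's sample line. Two things here are assumptions and not part of the answer above: the dots are escaped so they match only a literal ".", and the replacement format (mydomain.com/img/ followed by the image number) is purely hypothetical.

```php
<?php
// Sample input from the question: two URLs on the same line.
$line = 'This is my own image: http://mydomain.com/image/13/imagetitle.html, '
      . 'yet I recommend this one as well: image: http://mydomain.com/image/15/imagetitle2.html';

// \S cannot cross whitespace, so each match stays inside a single URL.
// Assumption: dots are escaped so they match a literal "." only.
$pattern = '/mydomain\.com(\S*)image\/(\d+)\/(\S*)\.html/';

// Hypothetical replacement: keep only the numeric image id (capture group 2).
// The fifth argument receives the number of replacements performed.
$result = preg_replace($pattern, 'mydomain.com/img/$2', $line, -1, $count);

echo $result, "\n";
echo "replacements: $count\n";
```

Because \S* stops at the space after the first URL, both URLs on the line are matched and replaced separately even though the quantifier is still greedy.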

Your pattern is working. I tested it with the following code:

$data = "This1 is my own image: http://mydomain.com/image/13/imagetitle.html, yet I recommend this one as well: image: http://mydomain.com/image/15/imagetitle2.html
This2 is my own image: http://mydomain.com/image/13/imagetitle.html, yet I recommend this one as well: image: http://mydomain.com/image/15/imagetitle2.html
This3 is my own image: http://mydomain.com/image/13/imagetitle.html, yet I recommend this one as well: image: http://mydomain.com/image/15/imagetitle2.html
This4 is my own image: http://mydomain.com/image/13/imagetitle.html, yet I recommend this one as well: image: http://mydomain.com/image/15/imagetitle2.html
";
echo preg_replace('/mydomain.com(.*)image\/(\d+)\/(.*).html/', 'replaced one', $data);
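One way to check a test like this is to compare the number of replacements against the number of URLs in the input. The sketch below is a shortened variant of the test data (two lines instead of four) and uses preg_replace's optional count argument; the comparison with substr_count is an addition for illustration only.

```php
<?php
// Shortened test data: two lines, each containing two URLs.
$data = "This1 is my own image: http://mydomain.com/image/13/imagetitle.html, yet I recommend this one as well: image: http://mydomain.com/image/15/imagetitle2.html\n"
      . "This2 is my own image: http://mydomain.com/image/13/imagetitle.html, yet I recommend this one as well: image: http://mydomain.com/image/15/imagetitle2.html\n";

// The pattern from the question, unchanged.
$pattern = '/mydomain.com(.*)image\/(\d+)\/(.*).html/';

// -1 means "no limit"; $count receives how many replacements were made.
$output = preg_replace($pattern, 'replaced one', $data, -1, $count);

$urlCount = substr_count($data, 'http://');

echo $output;
echo "URLs in input:     $urlCount\n";
echo "replacements made: $count\n";
```

If the replacement count is lower than the URL count, a single greedy match has swallowed more than one URL on a line, which matches the behaviour described in the question.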
