Regex doesn't match, greediness
I'm trying to match two parts of a string with a regex in PHP. I think there is a problem with greediness. I would like the first regex (see the comments) to give me the first two captures, as the second regex does, but still match both strings. What am I doing wrong?
I'm trying to get +123 (if cd: exists, as in the first string) and 456.
<?php
$data[] = 'longstring start waste cd:+123yz456z longstring';
$data[] = 'longstring start waste +yz456z longstring';
$regexs[] = '/start[^z]*?(cd:([^y]+)y)?[^z]*z([^z]*)z/'; // first
$regexs[] = '/start[^z]*?(cd:([^y]+)y)[^z]*z([^z]*)z/'; // second
foreach ($regexs as $regex) {
    foreach ($data as $string) {
        if (preg_match($regex, $string, $match)) {
            // array_slice drops $match[0] (the full match) and keeps only the captures
            echo "Tried '$regex' on '$string' and got " . implode(',', array_slice($match, 1));
            echo "\n";
        }
    }
}
?>
Output is:
Tried '/start[^z]*?(cd:([^y]+)y)?[^z]*z([^z]*)z/' on 'longstring start waste cd:+123yz456z longstring' and got ,,456
Tried '/start[^z]*?(cd:([^y]+)y)?[^z]*z([^z]*)z/' on 'longstring start waste +yz456z longstring' and got ,,456
Tried '/start[^z]*?(cd:([^y]+)y)[^z]*z([^z]*)z/' on 'longstring start waste cd:+123yz456z longstring' and got cd:+123y,+123,456
There is no fourth line since cd: is not present in the second string.
Expected output (bear in mind I'm no expert), where the first line differs from the actual output:
Tried '/start[^z]*?(cd:([^y]+)y)?[^z]*z([^z]*)z/' on 'longstring start waste cd:+123yz456z longstring' and got cd:+123y,+123,456
Tried '/start[^z]*?(cd:([^y]+)y)?[^z]*z([^z]*)z/' on 'longstring start waste +yz456z longstring' and got ,,456
Tried '/start[^z]*?(cd:([^y]+)y)[^z]*z([^z]*)z/' on 'longstring start waste cd:+123yz456z longstring' and got cd:+123y,+123,456
Okay, so you want to capture +123 if there is a cd:, and always 456? Here's how I would do it:
$data[] = 'longstring start waste cd:+123yz456z longstring';
$data[] = 'longstring start waste +yz456z longstring';
$regexs[] = '/start.+?(?:cd:(.+?)y)?.*?z(.+?)z/';
With liberal use of non-greedy ( ? ) quantifiers you can get it to do exactly what you want.
Also note the (?:) non-capturing group. They are very useful.
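To make the difference concrete, here is a minimal sketch comparing a capturing and a non-capturing outer group. The shortened $subject string and the variable names are made up for illustration; only preg_match itself is standard PHP.

<?php
$subject = 'cd:+123y'; // shortened sample, assumed for this illustration

// Capturing outer group: the whole "cd:+123y" is stored as group 1,
// and the inner value "+123" becomes group 2.
preg_match('/(cd:([^y]+)y)/', $subject, $withCapture);

// Non-capturing outer group: (?:...) still groups, but stores nothing,
// so the inner value "+123" becomes group 1.
preg_match('/(?:cd:([^y]+)y)/', $subject, $withoutCapture);

print_r($withCapture);    // [0] => cd:+123y, [1] => cd:+123y, [2] => +123
print_r($withoutCapture); // [0] => cd:+123y, [1] => +123

The non-capturing form keeps the group numbering short when the outer group is only there so that ? can apply to it.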
EDIT: Apparently that doesn't work. Let's try a different approach, with an "either/or" group:
$regexs[] = '/start.+?(?:cd:(.+?)yz(.+?)z|\+yz(.+?)z)/';
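One thing to watch with the either/or pattern is that 456 ends up in a different group depending on which branch matched. The sketch below runs it on both strings and, as an alternative that is not part of the original answer, tries a negative lookahead. The reasoning is that a lazy prefix first tries to match as little as possible, the optional group fails at that position and is skipped, and the rest of the pattern then swallows cd: without capturing it. Forbidding the prefix from crossing cd: avoids that. The helper name captures() and the variable names are hypothetical.

<?php
$data = [
    'longstring start waste cd:+123yz456z longstring',
    'longstring start waste +yz456z longstring',
];

// Either/or pattern from the edit above: 456 is group 2 or group 3.
$alternation = '/start.+?(?:cd:(.+?)yz(.+?)z|\+yz(.+?)z)/';

// Assumed alternative: (?!cd:) stops the prefix right before "cd:",
// so the optional group is tried at the position where "cd:" begins.
$lookahead = '/start(?:(?!cd:)[^z])*(?:cd:([^y]+)y)?[^z]*z([^z]*)z/';

// Hypothetical helper: returns the captures without the full match.
function captures(string $regex, string $string): array
{
    return preg_match($regex, $string, $match) ? array_slice($match, 1) : [];
}

foreach ([$alternation, $lookahead] as $regex) {
    foreach ($data as $string) {
        echo implode(',', captures($regex, $string)), "\n";
    }
}

With the lookahead version, +123 (or an empty string when cd: is absent) is always group 1 and 456 is always group 2, which may be easier to consume than checking two different groups.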