Regex: there's a regex inside
I'm falling deeper into the dark side of regex. I need to parse this:
{{word(a|b|c)|word$1}}
{{word(s?)|word$1}}
{{w(a|b|c)ord(s?)|w$1ord$2}}
As you may have noticed, it is a search-and-replace scheme containing regular expressions. The Wikimedia engine does it very well, but I couldn't find how it does it: right here.
I just need to get the first part and the second part into two separate variables. For instance:
preg_match(REGEX, '{{word(a|b|c)|word$1}}', $result); // Applying REGEX on this
echo $result[1]; // word(a|b|c)
echo $result[2]; // word$1
How would you do it? It's like a regex inside a regex, and I'm completely lost...
You could match the parts using something like:
{{((?:(?!}}).)+)\|([^|]+?)}}
Note that if you allow arbitrary PCRE regexes, some very complex and slow patterns can be constructed, possibly allowing simple DoS attacks on your site.
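As a minimal sketch of how the pattern above might be used from PHP, the snippet below wraps it in / delimiters and runs preg_match_all over a sample string. The function name extract_rules and the sample text are assumptions made for illustration; they are not part of the original answer.

```php
<?php
// Hypothetical helper: collect every {{search|replacement}} rule found in $text.
function extract_rules(string $text): array
{
    // Same pattern as above, wrapped in "/" delimiters. Single quotes keep
    // the backslash in front of the pipe intact.
    $pattern = '/{{((?:(?!}}).)+)\|([^|]+?)}}/';

    $rules = [];
    if (preg_match_all($pattern, $text, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // $m[1] is the search regex, $m[2] the replacement string.
            $rules[] = [$m[1], $m[2]];
        }
    }
    return $rules;
}

// Made-up sample text containing two of the rules from the question.
$text = 'foo {{word(s?)|word$1}} bar {{w(a|b|c)ord(s?)|w$1ord$2}}';
foreach (extract_rules($text) as [$search, $replacement]) {
    echo $search, ' => ', $replacement, PHP_EOL;
}
```

Because the first group refuses to cross a }}, each rule in the text is matched separately rather than being merged into one long match.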
It really depends on how deep the nesting can be, but you can just split it by |, taking care not to split on any | within parentheses. Here's the easy way, I suppose:
$str = 'word(a|b|c)|word$1'; // The rule with the leading {{ and trailing }} trimmed off
$items = explode('|', $str);
$realItems = array();
$count = count($items);
for ($i = 0; $i < $count; $i++) {
    $realItem = $items[$i];
    // While a "(" is still unclosed, the pipe we split on was inside parentheses.
    while ($i + 1 < $count && substr_count($realItem, '(') > substr_count($realItem, ')')) {
        // Glue them together and skip one!
        $realItem .= '|' . $items[++$i];
    }
    $realItems[] = $realItem;
}
Now $realItems contains your 2-4 key values, which you can simply pass into preg_replace; it'll do all the work for you.
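To make that last step concrete, here is a minimal sketch of handing the two pieces to preg_replace. The helper names split_rule and apply_rule are hypothetical, the colo(u?)r rule is a made-up example, and the sketch assumes the search pattern never contains a ~ character, since ~ is used as the regex delimiter.

```php
<?php
// Hypothetical helper: split "{{search|replacement}}" on pipes that are not
// inside parentheses, using the same counting idea as the loop above.
function split_rule(string $rule): array
{
    $str = substr($rule, 2, -2); // drop the leading {{ and trailing }}
    $items = explode('|', $str);
    $parts = [];
    for ($i = 0, $n = count($items); $i < $n; $i++) {
        $part = $items[$i];
        // Keep gluing pieces back together while a "(" is still open.
        while ($i + 1 < $n && substr_count($part, '(') > substr_count($part, ')')) {
            $part .= '|' . $items[++$i];
        }
        $parts[] = $part;
    }
    return $parts;
}

// Hypothetical helper: apply one rule to a whole word.
// Assumption: the search pattern contains no "~", which serves as delimiter.
function apply_rule(string $rule, string $word): string
{
    [$search, $replacement] = split_rule($rule);
    // Anchor the pattern so it has to match the entire word.
    $result = preg_replace('~^(?:' . $search . ')$~', $replacement, $word);
    return $result ?? $word; // preg_replace returns null on error
}

// Made-up rule for illustration: normalise a British spelling.
echo apply_rule('{{colo(u?)r(s?)|color$2}}', 'colours'), PHP_EOL; // prints "colors"
```

Running arbitrary user-supplied patterns this way is only reasonable if you trust where the rules come from.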
It is actually not that hard.
The thing is, the replacement string will only ever contain an escaped |, i.e. \\|.
And on one of these very few occasions, .* will actually be useful here.
Do: preg_match("/^{{(.*)\\|([^|]+(?:\\\\\\|[^|]*)*)}}$/", $subject, $result); this should do what you want. (The pattern needs delimiters, here /, and $subject stands for the string being parsed.)
The trick here is the second group: it is, again, the normal* (special normal*)* pattern, where normal is [^|] (anything but a pipe), and special is \\\\\\| (a backslash followed by a pipe).
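A short sketch of that call in context follows. It assumes the pattern is wrapped in / delimiters, and the helper name parse_rule is hypothetical. One caveat worth checking against your own data: because .* is greedy, the first group extends to the last pipe even when that pipe is escaped. The second pattern below adds a negative lookbehind as one possible way around that; it is a variation for illustration and not part of the answer above.

```php
<?php
// Pattern from the answer above, wrapped in "/" delimiters (an addition).
$pattern = "/^{{(.*)\\|([^|]+(?:\\\\\\|[^|]*)*)}}$/";

// Variation (an assumption, not from the answer): the lookbehind refuses to
// split on a pipe that is preceded by a backslash.
$strict = "/^{{(.*)(?<!\\\\)\\|([^|]+(?:\\\\\\|[^|]*)*)}}$/";

// Hypothetical helper: return [search, replacement], or null if no match.
function parse_rule(string $pattern, string $rule): ?array
{
    return preg_match($pattern, $rule, $m) === 1 ? [$m[1], $m[2]] : null;
}

// The three rules from the question; single quotes keep $1 and $2 literal.
$rules = [
    '{{word(a|b|c)|word$1}}',
    '{{word(s?)|word$1}}',
    '{{w(a|b|c)ord(s?)|w$1ord$2}}',
];
foreach ($rules as $rule) {
    [$search, $replacement] = parse_rule($pattern, $rule);
    echo $search, ' => ', $replacement, PHP_EOL;
}

// A made-up rule whose replacement contains an escaped pipe.
$escaped = '{{a|b\|c}}';
var_dump(parse_rule($pattern, $escaped)); // splits at the escaped pipe: "a|b\" and "c"
var_dump(parse_rule($strict, $escaped));  // splits at the first pipe: "a" and "b\|c"
```

On the three rules from the question both patterns give the same split; they only differ once an escaped pipe appears.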