
Regex: there's a regex inside

I'm falling deeper into the regex's dark side. I need to parse this:

{{word(a|b|c)|word$1}}
{{word(s?)|word$1}}
{{w(a|b|c)ord(s?)|w$1ord$2}}

As you may have noticed, it is a search & replace scheme containing regular expressions. The Wikimedia engine does this very well, but I couldn't find out how it does it: right here.

I just need to get the first part and the second part into two separate variables. For instance:

preg_match(REGEX, '{{word(a|b|c)|word$1}}', $result); // Applying REGEX to this
echo $result[1]; // word(a|b|c)
echo $result[2]; // word$1

How would you do it? It's like a regex within a regex, and I'm completely lost...

You could match the parts using something like:

{{((?:(?!}}).)+)\|([^|]+?)}}
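As a quick check, the pattern above can be run against the three samples from the question. This is only a sketch: the `/` delimiters and the variable names `$regex`, `$samples` and `$parsed` are choices made here, not part of the original answer.

```php
<?php
// Pattern from the answer above, wrapped in "/" delimiters for preg_match.
$regex = '/{{((?:(?!}}).)+)\|([^|]+?)}}/';

$samples = array(
    '{{word(a|b|c)|word$1}}',
    '{{word(s?)|word$1}}',
    '{{w(a|b|c)ord(s?)|w$1ord$2}}',
);

$parsed = array();
foreach ($samples as $sample) {
    if (preg_match($regex, $sample, $result) === 1) {
        // $result[1] is the search pattern, $result[2] the replacement
        $parsed[] = array($result[1], $result[2]);
        echo $result[1], ' => ', $result[2], "\n";
    }
}
```

The first group is greedy, so the split lands on the last | before the closing }}, which is why the pipes inside the parentheses end up in the first part.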

Note that if you allow arbitrary PCRE patterns, some very complex and slow ones can be constructed, possibly allowing simple DoS attacks on your site.
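One way to soften that risk is to cap PCRE's backtracking and treat any failure as "no result". The sketch below is only a starting point: `tryUserPattern` is a hypothetical helper, the limit of 100000 is an arbitrary illustrative value, and a cap like this reduces the risk rather than eliminating it.

```php
<?php
// Assumption: 100000 is an arbitrary illustrative limit, not a recommendation.
ini_set('pcre.backtrack_limit', '100000');

// Hypothetical helper: run an untrusted pattern, returning null on any PCRE failure.
function tryUserPattern(string $pattern, string $replacement, string $subject): ?string
{
    // Assumption: "~" does not occur in the pattern, so it can serve as the delimiter.
    // "@" silences the warning PHP emits for a pattern that does not compile.
    $result = @preg_replace('~' . $pattern . '~', $replacement, $subject);
    if ($result === null || preg_last_error() !== PREG_NO_ERROR) {
        return null; // invalid pattern, or a PCRE limit was hit
    }
    return $result;
}
```

A caller would then skip or log any rule for which the helper returns null instead of letting the request fail.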

It really depends on how deep the nesting can be, but you can just split it by |, taking care not to split it by any | within parentheses. Here's the easy way, I suppose:

$str = 'word(a|b|c)|word$1'; // Input with the leading {{ and trailing }} already trimmed off
$items = explode('|', $str);
$count = count($items);
$realItems = array();

for ($i = 0; $i < $count; $i++) {
    $realItem = $items[$i];
    // While a "(" is still open, the "|" we split on was inside a group
    while ($i + 1 < $count && substr_count($realItem, '(') > substr_count($realItem, ')')) {
        // Glue the next piece back on and skip it in the outer loop
        $realItem .= '|' . $items[++$i];
    }

    $realItems[] = $realItem;
}

Now $realItems contains your 2-4 key values, which you can simply pass into preg_replace; it'll do all the work for you.
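Passing the pieces on could look roughly like this. It is a sketch under stated assumptions: `applyRule` is a hypothetical helper, the `~` delimiter assumes the pattern never contains that character, and the `term$1` replacement is made up so that the effect is visible (the replacements in the question would reproduce the input unchanged).

```php
<?php
// Hypothetical helper (not from the original answer): apply one split
// "pattern|replacement" pair to a subject string.
function applyRule(array $realItems, string $subject): string
{
    // Assumption: the pattern never contains "~", so it is safe as a delimiter.
    $pattern = '~' . $realItems[0] . '~';
    $result = preg_replace($pattern, $realItems[1], $subject);
    // preg_replace returns null on error; fall back to the untouched subject
    return $result ?? $subject;
}

$realItems = array('word(a|b|c)', 'term$1'); // 'term$1' is an illustrative replacement
echo applyRule($realItems, 'see worda and wordc'), "\n";
```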

It is actually not that hard.

The thing is, the replacement string will only ever contain a | in escaped form, i.e. \\|.

And this is one of the very few occasions where .* is actually useful.

Do: preg_match('/^{{(.*)\\|([^|]+(?:\\\\\\|[^|]*)*)}}$/', $input, $result); (with $input holding the string to parse), and this should do what you want.

The trick here is the second group: it is, again, the normal* (special normal*)* pattern, where normal is [^|] (anything but a pipe), and special is \\\\\\| (a backslash followed by a pipe).
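Here is a sketch of that pattern in use. One caveat: because (.*) is greedy, the pattern as written seems to split at the last pipe even when that pipe is escaped. The version below therefore adds a negative lookbehind, (?<!\\\\), in front of the separating pipe. That lookbehind and the helper name `splitTemplate` are additions for illustration and are not part of the original answer.

```php
<?php
// Sketch: split "{{pattern|replacement}}" where a literal pipe in the
// replacement is written as \| .
function splitTemplate(string $template): ?array
{
    // In a single-quoted PHP string, "\\\\" yields the regex token \\ (one
    // literal backslash) and "\\|" yields \| (one literal pipe).
    // (?<!\\\\) is the added lookbehind: the separating pipe must not be escaped.
    $regex = '/^{{(.*)(?<!\\\\)\\|([^|]+(?:\\\\\\|[^|]*)*)}}$/';
    if (preg_match($regex, $template, $m) !== 1) {
        return null; // not a well-formed template
    }
    return array($m[1], $m[2]);
}

var_dump(splitTemplate('{{word(a|b|c)|word$1}}'));
var_dump(splitTemplate('{{a(b|c)|x\|y}}'));
```

For the second call the replacement part keeps its escaped pipe, so unescaping \| is left to the caller.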
