
Regex: there's a regex inside

I'm falling deeper into the regex's dark side. I need to parse this:

{{word(a|b|c)|word$1}}
{{word(s?)|word$1}}
{{w(a|b|c)ord(s?)|w$1ord$2}}

As you may have noticed, this is a search-and-replace scheme whose rules contain regular expressions. The Wikimedia engine handles it very well, but I couldn't find out how it does it.

I just need to get the first part and the second part into two separate variables. For instance:

preg_match(REGEX, '{{word(a|b|c)|word$1}}', $result); // Applying REGEX to this
echo $result[1]; // word(a|b|c)
echo $result[2]; // word$1

How would you do it? It's like a regex within a regex, and I'm completely lost...

You could match the parts using something like:

{{((?:(?!}}).)+)\|([^|]+?)}}
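As a quick check, here is a minimal sketch that applies this pattern to the three sample rules from the question. The function name parseRule is made up for illustration, ~ is an arbitrary choice of delimiter, and the braces are escaped purely for readability.

```php
<?php
// Hypothetical helper: splits "{{pattern|replacement}}" into its two parts.
function parseRule(string $rule): ?array
{
    // Same expression as above, with the braces escaped and ~ as delimiter.
    $regex = '~\{\{((?:(?!\}\}).)+)\|([^|]+?)\}\}~';
    if (preg_match($regex, $rule, $m) !== 1) {
        return null; // not a well-formed rule
    }
    return [$m[1], $m[2]];
}

$rules = [
    '{{word(a|b|c)|word$1}}',
    '{{word(s?)|word$1}}',
    '{{w(a|b|c)ord(s?)|w$1ord$2}}',
];
foreach ($rules as $rule) {
    [$pattern, $replacement] = parseRule($rule);
    echo $pattern, ' => ', $replacement, PHP_EOL;
}
```

Because the first group is greedy and the second group excludes pipes, the split lands on the last | in the rule, so pipes inside the search pattern stay in the first part.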

Note that if you are allowing arbitrary PCRE regex then some very complex and slow patterns can be constructed, possibly allowing simple DoS attacks on your site.
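One way to soften that risk is to treat every user-supplied pattern as something that may fail. The sketch below is only an assumption about how you might wrap the call: safeReplace is a hypothetical name, and it relies on preg_replace() returning null on error, which covers patterns that do not compile as well as matches aborted by limits such as PHP's pcre.backtrack_limit setting.

```php
<?php
// Hypothetical wrapper: applies a user-supplied pattern and reports failure
// instead of letting a broken or runaway pattern go unnoticed.
function safeReplace(string $pattern, string $replacement, string $subject): ?string
{
    // Wrap the pattern in ~ delimiters, escaping any bare ~ it contains.
    // Assumption: the pattern does not already contain an escaped ~.
    $regex = '~' . str_replace('~', '\~', $pattern) . '~';

    // @ silences the warning PHP raises when a pattern fails to compile.
    $result = @preg_replace($regex, $replacement, $subject);

    // preg_replace() returns null when an error occurred.
    if ($result === null || preg_last_error() !== PREG_NO_ERROR) {
        return null;
    }
    return $result;
}

// Made-up rule, not from the question.
var_dump(safeReplace('colou?r(s?)', 'color$1', 'colours')); // string(6) "colors"
var_dump(safeReplace('(', 'x', 'abc'));                     // NULL (pattern does not compile)
```

This does not make arbitrary patterns safe by itself; it only ensures a failed match is noticed rather than silently producing an empty result.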

It really depends on how deep the nesting can be, but you can just split it by | , taking care not to split it by any | within parentheses. Here's the easy way, I suppose:

$str = 'word(a|b|c)|word$1'; // Trim off the leading and trailing {{ and }}
$items = explode('|', $str);
$realItems = array();

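$count = count($items);
for ($i = 0; $i < $count; $i++) {
    $realItem = $items[$i];
    // Unbalanced parentheses mean the split cut through a group
    while ($i + 1 < $count && substr_count($realItem, '(') > substr_count($realItem, ')')) {
        // Glue the next piece back on and skip it in the outer loop
        $realItem .= '|' . $items[++$i];
    }

    $realItems[] = $realItem;
}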

Now $realItems contains your 2-4 key values, which you can simply pass to preg_replace; it will do all the work for you.
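Put together, the split and the final preg_replace call might look like the sketch below. The function name splitRule and the colour rule are invented for illustration, and the delimiter handling assumes the pattern contains no ~ character.

```php
<?php
// Hypothetical helper: split on | but keep pipes inside parentheses together.
function splitRule(string $str): array
{
    $items = explode('|', $str);
    $count = count($items);
    $realItems = [];

    for ($i = 0; $i < $count; $i++) {
        $realItem = $items[$i];
        // Re-join pieces until the parentheses balance (or the pieces run out).
        while ($i + 1 < $count && substr_count($realItem, '(') > substr_count($realItem, ')')) {
            $realItem .= '|' . $items[++$i];
        }
        $realItems[] = $realItem;
    }
    return $realItems;
}

// Made-up rule, not from the question.
$rule = '{{colou?r(s?)|color$1}}';
[$pattern, $replacement] = splitRule(substr($rule, 2, -2)); // strip {{ and }}

// Assumes the pattern contains no ~, which serves as the delimiter here.
echo preg_replace('~' . $pattern . '~', $replacement, 'two colours'), PHP_EOL; // two colors
```

Counting parentheses is a shortcut: an escaped parenthesis such as \( in the pattern would throw the count off, so this only suits rules as simple as the ones in the question.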

It is actually not that hard.

The thing is, any pipe in the replacement string will always be escaped, i.e. written as \\| .

And this is one of the very few occasions where .* is actually useful.

Do: preg_match("/^{{(.*)\\|([^|]+(?:\\\\\\|[^|]*)*)}}$/", $subject, $result); , where $subject holds the string to parse; this should do what you want.

The trick here is the second group: it is, again, the normal* (special normal*)* pattern, where normal is [^|] (anything but a pipe), and special is \\\\\\| (a backslash followed by a pipe).
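To show the normal* (special normal*)* shape in isolation, here is a sketch of a variant rather than the exact expression above. It assumes that normal also excludes the backslash and that special is any backslash-escaped character, and it applies the same shape to the first group so the split cannot land in the middle of an escape. The name parseEscapedRule is hypothetical, and a nowdoc is used so the regex can be written without doubling every backslash.

```php
<?php
// Hypothetical helper: splits a rule at the last unescaped pipe.
function parseEscapedRule(string $rule): ?array
{
    // Group 1: normal = [^\\],  special = \\.  (any escaped character)
    // Group 2: normal = [^|\\], special = \\.  (so \| stays in the replacement)
    $regex = <<<'RE'
~^\{\{([^\\]*(?:\\.[^\\]*)*)\|([^|\\]*(?:\\.[^|\\]*)*)\}\}$~
RE;
    if (preg_match($regex, $rule, $m) !== 1) {
        return null; // not a well-formed rule
    }
    return [$m[1], $m[2]];
}

// The second rule is made up to exercise an escaped pipe in the replacement.
print_r(parseEscapedRule('{{word(a|b|c)|word$1}}'));
print_r(parseEscapedRule('{{a(b|c)|x\|y}}'));
```

With these assumptions the first call yields word(a|b|c) and word$1, and the second yields a(b|c) and x\|y, leaving the escaped pipe inside the replacement part.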
