Regex find pattern between blocks

I'm building a templating engine in PHP (Django-like) that replaces everything between {{ }} with its related data. Right now I'm able to do that, but I'm facing a situation that requires replacing only inside blocks, such as {% for y in x %} loop blocks, while ignoring all brackets that are not inside them.

I was able to get some results in this regex101 example, but only the first {{ }} of each block is matched. What I want is to match all {{ }} in each block, excluding the ones outside.

For learning purposes (very good!), you have several possibilities:

  1. A multi-step approach (easier to comprehend and to maintain)

  2. An overall regex solution (more complicated and possibly more "fancy")


Ad 1)

Match the blocks with the following expression (see a demo on regex101.com):

{%\ for.*?%}
(?s:.+?)
{%\ endfor.*?%}

And look for pairs of {{...}} in each block with:

{{\s*(.+?)\s*}}

In PHP, this could be:

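<?php
$data = <<<DATA
{% for user in users %}
   Hello, {{ user.name }}, you are {{ user.age }} {{ user.name }}
ssssssssssssssssssssss {{ user.name }}
sdsddddddddddddddddddddddddddddd
{% endfor %}

{% for dog in dogs %}
   Your dog is {{ dog.age }} and likes {{ dog.food }}.
{% endfor %}
wwww
{{ user.name }}
DATA;

// Step 1: match each {% for ... %} ... {% endfor %} block
// (x = verbose mode, so literal spaces must be escaped as "\ ").
$block = '~
            {%\ for.*?%}
            (?s:.+?)
            {%\ endfor.*?%}
            ~x';

// Step 2: match each {{ ... }} pair; group 1 holds the trimmed variable name.
$variable = '~{{\s*(.+?)\s*}}~';

if (preg_match_all($block, $data, $matches)) {
    // $matches[0] holds the full text of every block found
    // (iterating $matches itself would only ever look at the first block).
    foreach ($matches[0] as $blockText) {
        if (preg_match_all($variable, $blockText, $variables, PREG_SET_ORDER)) {
            print_r($variables);
        }
    }
}
?>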
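Since the end goal is to replace the placeholders rather than print them, the two steps can be chained with two nested preg_replace_callback() calls. This is only a sketch: $values is a hypothetical, flat lookup table from placeholder names to output, and resolving names like dog.age against real loop data is left out. The arrow function needs PHP 7.4+.

```php
<?php
// Hypothetical lookup table: a real engine would resolve these names
// against the template context instead.
$values = [
    'dog.age'  => '3',
    'dog.food' => 'bones',
];

$template = <<<'TPL'
{% for dog in dogs %}
   Your dog is {{ dog.age }} and likes {{ dog.food }}.
{% endfor %}
{{ dog.age }}
TPL;

// Step 1: find each {% for ... %} ... {% endfor %} block.
$blockRegex = '~{%\ for.*?%}(?s:.+?){%\ endfor.*?%}~x';
// Step 2: inside one block, find each {{ name }} placeholder.
$variableRegex = '~{{\s*(.+?)\s*}}~';

$result = preg_replace_callback(
    $blockRegex,
    function (array $block) use ($variableRegex, $values): string {
        // Only the text of the current block is searched, so placeholders
        // outside any block are never touched.
        return preg_replace_callback(
            $variableRegex,
            // Names without an entry in $values are left as they were.
            fn (array $var): string => $values[$var[1]] ?? $var[0],
            $block[0]
        );
    },
    $template
);

echo $result;
```

The {{ dog.age }} after the block stays unchanged because the inner search only ever sees the text of a matched block.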



Ad 2)

Match all of the variables in question with an overall expression. Here, you'll need \G (which matches at the position where the last match ended) and some lookaheads (see a demo for this one at regex101.com as well):

(?:{%\ for.+?%} | \G(?!\A) )
(?s:(?!{%).)*?\K
{{\s*(?P<variable>.+?)\s*}}

Now let's demystify this expression:

(?:{%\ for.+?%} | \G(?!\A) )

Here, we want to match either {%\ for.+?%} (the backslash escapes the space because we are in verbose mode) or, with \G, at the position where the last match ended. Note that \G matches either at the end of the previous match or, when there is no previous match yet, at the very beginning of the string. We do not want the latter, hence the negative lookahead (?!\A).
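The effect of the (?!\A) guard is easiest to see on a toy input. In this made-up example, a single x stands in for the {% for %} tag and digits stand in for the variables:

```php
<?php
// Toy subject: digits before the "x" should be ignored; digits directly
// after it (and directly after each other) should be matched.
$subject = '12x34 56';

// Without (?!\A), \G also matches at offset 0, so "1" and "2" are found too.
preg_match_all('~(?:x|\G)\d~', $subject, $plain);

// With (?!\A), \G can only continue from the end of a previous match.
preg_match_all('~(?:x|\G(?!\A))\d~', $subject, $anchored);

echo implode(',', $plain[0]), "\n";    // 1,2,x3,4
echo implode(',', $anchored[0]), "\n"; // x3,4
```

With the guard, matching can only start at the x and then chain directly from the end of each previous match; the space before 56 breaks the chain.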

The next part

(?s:(?!{%).)*?\K

does a "fast forward" to the next {{ ... }} pair, without running past a {% tag.

Broken down, this says:

(?s:          # open a non-capturing group, enabling DOTALL mode
    (?!{%).   # neg. lookahead: consume one character, but never run past {% (e.g. the closing {% endfor %})
)*?           # lazy quantifier for the non-capturing group
\K            # make the engine "forget" everything to the left
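Both pieces can be tried in isolation. A small sketch with illustrative input strings: the first call uses a greedy * (instead of the lazy *? above) to show exactly where the tempered token stops, and the second shows that \K drops the text matched before it from the reported match:

```php
<?php
// Greedy tempered token: eats any character (newlines included, thanks to
// the s flag) as long as that character does not start "{%".
$text = "Hello\n{{ a }} {% endfor %} {{ b }}";
preg_match('~^(?s:(?!{%).)*~', $text, $m);
echo json_encode($m[0]), "\n"; // "Hello\n{{ a }} "

// \K: "Hello, " must still match, but it is left out of $k[0].
preg_match('~Hello,\s*\K\w+~', 'Hello, world', $k);
echo $k[0], "\n"; // world
```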

Now, the rest is easy:

{{\s*(?P<variable>.+?)\s*}}

It's basically the same construct as in Ad 1), except that the variable name is captured in a named group, variable.

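Again, in PHP, this could be:

<?php
// $data is the same template string as in Ad 1).
$regex = '~
            (?:{%\ for.+?%} | \G(?!\A) )
            (?s:(?!{%).)*?\K
            {{\s*(?P<variable>.+?)\s*}}
            ~x';

if (preg_match_all($regex, $data, $variables)) {
    // Group 1 is the named group, so $variables['variable'] holds the same list.
    print_r($variables[1]);
}
?>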
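Because \K removes the {% for %} tag and the skipped text from each match, the same expression can also drive an in-place replacement with preg_replace_callback(). Again only a sketch, assuming a hypothetical flat $values lookup table and PHP 7.4+:

```php
<?php
// Hypothetical lookup table: a real engine would resolve these names
// against the template context instead.
$values = [
    'user.name' => 'Ann',
    'user.age'  => '42',
];

$template = <<<'TPL'
{% for user in users %}
   Hello, {{ user.name }}, you are {{ user.age }}
{% endfor %}
{{ user.name }}
TPL;

$regex = '~
            (?:{%\ for.+?%} | \G(?!\A) )
            (?s:(?!{%).)*?\K
            {{\s*(?P<variable>.+?)\s*}}
            ~x';

// Thanks to \K, only the {{ ... }} part of each match is replaced; the
// for tag and the skipped text are copied through unchanged.
$result = preg_replace_callback(
    $regex,
    fn (array $m): string => $values[$m['variable']] ?? $m[0],
    $template
);

echo $result;
```

The {{ user.name }} after {% endfor %} is left alone: \G cannot continue past the {% of the closing tag, and no new {% for %} tag follows it.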


With all that said, it's a good idea to learn more complex patterns, but on the other hand you shouldn't reinvent the wheel: there's always someone smarter than you and me who has probably taken several edge cases into account.
