Regular expression for template engine?

I'm learning about regular expressions and want to write a templating engine in PHP.

Consider the following "template":

<!DOCTYPE html>
<html lang="{{print("{hey}")}}" dir="{{$dir}}">
<head>
    <meta charset="{{$charset}}">
</head>
<body>
    {{$body}}
    {{}}
</body>
</html>

I managed to create a regex that finds every tag except {{}}.

Here's my regex:

{{[^}]+([^{])*}}

There's just one problem: how do I allow literal { and } characters inside {{}} tags?

It will not find {{print("{hey}")}}.

Thanks in advance.

You can just use "." instead of the character classes, but then you have to use a non-greedy quantifier:

\{\{(.+?)\}\}

The quantifier "+?" means it consumes as few characters as possible while still letting the rest of the pattern match.
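This is a minimal PHP sketch that applies the lazy pattern with preg_match_all to the tags from the question's template. It is my own illustration and not part of the answer. The variable names are arbitrary.

```php
<?php
// The tag-bearing lines of the question's template.
$template = <<<'HTML'
<html lang="{{print("{hey}")}}" dir="{{$dir}}">
    <meta charset="{{$charset}}">
    {{$body}}
    {{}}
HTML;

// Lazy quantifier: .+? grows one character at a time and stops
// at the first "}}" that lets the whole pattern succeed.
preg_match_all('~\{\{(.+?)\}\}~', $template, $matches);

print_r($matches[1]); // the captured tag contents
```

The empty tag {{}} is not reported. `.+?` must consume at least one character, and after the first } of {{}} there is no further }} for the match to end on.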

Consider this example:

<table>
  <tr>
    <td>{{print("{first name}")}}</td><td>{{print("{last name}")}}</td>
  </tr>
</table>

With a greedy quantifier (+ or *) you'd get only one result. The match starts at the first {{, and .+ then consumes as many characters as it can while the rest of the pattern can still match, so it runs to the last }}:

{{print("{first name}")}}</td><td>{{print("{last name}")}}

With a non-greedy one (+? or *?) you'll get the two as separate results:

{{print("{first name}")}}
{{print("{last name}")}}
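The same comparison can be reproduced in PHP. This is a sketch, and the variable names are arbitrary:

```php
<?php
$html = '<td>{{print("{first name}")}}</td><td>{{print("{last name}")}}</td>';

// Greedy: .+ runs to the end of the string, then backtracks only
// as far as the LAST "}}".
preg_match_all('~\{\{(.+)\}\}~', $html, $greedy);

// Lazy: .+? stops at the FIRST "}}" that fits.
preg_match_all('~\{\{(.+?)\}\}~', $html, $lazy);

echo count($greedy[0]), "\n"; // 1
echo count($lazy[0]), "\n";   // 2
print_r($lazy[0]);
```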

This is a pattern to match the content inside double curly brackets:

$pattern = <<<'LOD'
~
(?(DEFINE)
    (?<quoted>
        ' (?: [^'\\]+ | (?:\\.)+ )*+ ' |
        " (?: [^"\\]+ | (?:\\.)+ )*+ "
    )
    (?<nested>
        { (?: [^"'{}]+ | \g<quoted> | \g<nested> )*+ }
    )
)

{{
    (?<content>
        (?: 
            [^"'{}]+
          | \g<quoted>  
          | \g<nested>

        )*+
    )
}}
~xs
LOD;

Compact version:

$pattern = '~{{((?>[^"\'{}]+|((["\'])(?:[^"\'\\\]+|(?:\\\\.)+|(?:(?!\3)["\'])+)*+\3)|({(?:[^"\'{}]+|\g<2>|(?4))*+}))*+)}}~s';

In the compact version the content is in the first capturing group. With the detailed version you can use the named capture 'content'.

Although this pattern is longer, it allows anything inside quoted parts, including escaped quotes, and in many cases it is faster than a simple lazy quantifier. Nested curly brackets are allowed too: you can write {{ doThat(){ doThis(){ }}}} without problems.
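Here is a usage sketch for the detailed pattern with preg_match_all. It reads the named group content and checks the cases mentioned above: braces inside a quoted string, nested braces, and an escaped quote. In this copy the quoted alternatives use *+ instead of ++, so an empty string such as "" is also accepted. The sample template is made up for illustration.

```php
<?php
// Detailed pattern; quoted parts may be empty ("" or '').
$pattern = <<<'LOD'
~
(?(DEFINE)
    (?<quoted>
        ' (?: [^'\\]+ | (?:\\.)+ )*+ ' |
        " (?: [^"\\]+ | (?:\\.)+ )*+ "
    )
    (?<nested>
        { (?: [^"'{}]+ | \g<quoted> | \g<nested> )*+ }
    )
)
{{
    (?<content>
        (?: [^"'{}]+ | \g<quoted> | \g<nested> )*+
    )
}}
~xs
LOD;

// Hypothetical template exercising the tricky cases.
$template = <<<'TPL'
<p>{{print("{hey}")}}</p>
<p>{{ doThat(){ doThis(){ }}}}</p>
<p>{{print("say \"}}\" twice")}}</p>
<p>{{print("")}}</p>
TPL;

preg_match_all($pattern, $template, $matches);

// The named group holds what is between the outer {{ and }}.
print_r($matches['content']);
```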

The subpattern for quotes can also be written like this, which avoids repeating the same thing for single and double quotes (I use it in the compact version):

(["'])             # the quote type is captured (single or double)
(?:                # open a group (for the various alternatives)
    [^"'\\]+       # all characters that are not a quote or a backslash
  |                # OR
    (?:\\.)+       # escaped characters (with the s modifier, . also matches a newline)
  |                # OR
    (?!\g{-1})["'] # a quote that is not the captured quote
)*+                # repeat zero or more times (possessive), so "" is allowed
\g{-1}             # the captured quote (-1 refers to the most recently opened capturing group)
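The quote subpattern can be tried on its own. This sketch anchors it with ^ and \z so that it must match a whole string. It uses *+ so that an empty string "" is accepted as well. The test strings are made up.

```php
<?php
$quoted = <<<'LOD'
~^
(["'])              # opening quote, captured as group 1
(?:
    [^"'\\]+        # anything that is not a quote or a backslash
  | (?:\\.)+        # escaped characters
  | (?!\g{-1})["']  # a quote of the other kind
)*+                 # zero or more times, possessive
\g{-1}              # the same quote that opened the string
\z~xs
LOD;

$tests = ['"it\'s"', '\'say "hi"\'', '"a \" b"', '"bad\'', '""'];
$results = [];
foreach ($tests as $s) {
    $results[$s] = preg_match($quoted, $s) === 1;
    echo $s, ' => ', $results[$s] ? 'match' : 'no match', "\n";
}
```

Only "bad' is rejected, because the string opens with " but closes with '.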

Note: a backslash that the regex should treat literally (regex \\) is written \\ in nowdoc syntax, but \\\ or \\\\ inside single quotes (\\\ only works when the next character is not a quote or a backslash).
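The following check shows which string the regex engine actually receives for each PHP spelling. The variable names are arbitrary. The nowdoc spelling and both single-quoted spellings yield the same three characters. '\\.' yields only two characters, which is a regex for a literal dot.

```php
<?php
// Regex source \\. means "a literal backslash followed by any character".
$nowdoc = <<<'LOD'
\\.
LOD;
$single3 = '\\\.';  // "\\" becomes "\" and the unknown escape "\." is kept as-is
$single4 = '\\\\.'; // each "\\" becomes "\"

// The pitfall: only two characters, \. , i.e. a regex for a literal dot.
$tooFew = '\\.';

$subject = 'x\\y'; // the three characters x, \, y

var_dump(
    $nowdoc === $single3 && $single3 === $single4, // bool(true)
    strlen($tooFew),                               // int(2)
    preg_match('~' . $single4 . '~', $subject),    // int(1): matches "\y"
    preg_match('~' . $tooFew . '~', $subject)      // int(0): no "." in the subject
);
```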

Explanations for the detailed pattern:

The pattern is divided into two parts:

  • the definitions, where I define named subpatterns
  • the whole pattern itself

The definition section avoids repeating the same subpattern several times in the main pattern, and it makes the pattern clearer. You can define subpatterns that you will use later in this space:
(?(DEFINE)....)
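Here is a tiny illustration of DEFINE in PHP. The subpattern name num and the dotted-number format are hypothetical and were chosen only to keep the example short.

```php
<?php
// (?(DEFINE)...) declares "num" without matching anything;
// \g<num> then calls that subpattern like a function.
$pattern = '~^(?(DEFINE)(?<num>\d+))\g<num>\.\g<num>\.\g<num>$~';

$ok  = preg_match($pattern, '1.22.333'); // three numbers separated by dots
$bad = preg_match($pattern, '1.x.3');    // "x" is not a number

var_dump($ok, $bad);
```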

This section contains 2 named subpatterns:

  • quoted: describes quoted parts
  • nested: describes nested curly-bracket parts

Detail of nested:

(?<nested>           # open the named group "nested"
    {                # literal {
 ## what can contain curly brackets? ##
    (?>              # open an atomic* group
        [^"'{}]+     # all characters one or more times, except "'{}
      |              # OR
        \g<quoted>   # quoted content, to avoid curly brackets inside quoted parts
                     # (a call to the subpattern defined before, instead of rewriting it)
      | \g<nested>   # OR curly parts. This is a recursion
    )*+              # repeat the atomic group zero or more times (possessive *)
    }                # literal }
)                    # close the named group

(* more information about atomic groups and possessive quantifiers)
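This minimal demonstration shows the difference. A greedy quantifier may give characters back while backtracking. A possessive quantifier or an atomic group never does.

```php
<?php
$greedy     = preg_match('~^a+a$~', 'aaa');     // a+ gives back one "a" for the final a
$possessive = preg_match('~^a++a$~', 'aaa');    // a++ keeps all three "a" and never gives one back
$atomic     = preg_match('~^(?>a+)a$~', 'aaa'); // same behaviour with an atomic group

var_dump($greedy, $possessive, $atomic); // int(1), int(0), int(0)
```

In the template pattern this is safe. [^"'{}]+ stops only at a quote or a brace, and the characters it could give back are never quotes or braces. Backtracking into it could therefore never help another alternative match, and refusing to backtrack just avoids wasted work when a match fails.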

But all of this is only definitions; the pattern really begins with {{. Then I open a named capture group (content) and describe what can be found inside (nothing new here).

I use two modifiers, x and s. x activates verbose mode, which lets you put whitespace freely in the pattern (useful for indentation). s is single-line mode: in this mode the dot can match newlines (by default it can't). I use this mode because there is a dot in the quoted subpattern.
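Here is a quick check of both modifiers with preg_match. The subjects are made up.

```php
<?php
$subject = "a\nb";
$noS   = preg_match('~a.b~', $subject);  // by default . does not match "\n"
$withS = preg_match('~a.b~s', $subject); // with s, . matches "\n" as well

$noX   = preg_match('~a b c~', 'abc');   // the spaces are literal characters
$withX = preg_match('~a b c~x', 'abc');  // with x, unescaped whitespace is ignored

var_dump($noS, $withS, $noX, $withX); // int(0), int(1), int(0), int(1)
```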

Use {{(.*?)}} to make your regex less greedy.
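This sketch contrasts *? with +? on a made-up one-line template. The difference shows up on the empty tag {{}}, which *? accepts with an empty capture. The braces need no escaping here because PCRE treats a brace that does not form a valid quantifier as a literal character.

```php
<?php
$template = '<p dir="{{$dir}}">{{}}</p>';

// .*? may match zero characters, so {{}} is captured as "".
preg_match_all('~{{(.*?)}}~', $template, $star);

// .+? needs at least one character, so {{}} is skipped.
preg_match_all('~{{(.+?)}}~', $template, $plus);

var_dump($star[1], $plus[1]);
```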

I figured it out. Don't ask me how.

{{[^{}]*("[^"]*"\))?(}})

This will match pretty much anything, for example:

{{print("{{}}}{{{}}}}{}}{}{hey}}{}}}{}7")}}
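This sketch runs the pattern against a few tags, including the example above, and checks whether each match covers the whole tag. The last tag is hypothetical and was added to show where the pattern stops working.

```php
<?php
// {{, a run of non-brace characters, an optional "..." immediately
// followed by ), then }}.
$pattern = '~{{[^{}]*("[^"]*"\))?(}})~';

$tags = [
    '{{$dir}}',
    '{{}}',
    '{{print("{hey}")}}',
    '{{print("{{}}}{{{}}}}{}}{}{hey}}{}}}{}7")}}',
    '{{foo("{a}", 1)}}', // hypothetical: the quoted braces are not right before ")"
];

$whole = [];
foreach ($tags as $tag) {
    // true only if the match covers the entire tag
    $whole[$tag] = preg_match($pattern, $tag, $m) === 1 && $m[0] === $tag;
    echo $tag, ' => ', $whole[$tag] ? 'whole tag' : 'not matched as a whole', "\n";
}
```

The optional group only describes a double-quoted string that comes right before ). Braces anywhere else in the tag, as in the last one, prevent a match.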
