Nested PCRE Regex Issue
I have a custom template engine.
It catches this:
@function(argument1 argument2 ...)
@get(param:name)
@get(param:@get(sub:name))
And this:
@function(argument1 argument2 ...)
Some stuff @with(nested:tag)
@foreach(arguments as value)
More stuff : @get(value)
@/foreach
@function(other:args)
Same function name (nested)
@/function
@/function
With this pattern (PCRE / PHP):
#
@ ([\w]+) \(
( (?: [^@\)] | (?R) )+ )
\)
(?:
( (?> (?-2) ) )
@/\\1
)?
#xms
This regex catches almost all results. But when I have more tags, nested or not, it catches nothing. For example, when I nest two
@foreach(var:name) ... @/foreach
blocks, the regex fails depending on the spaces in the tag content.
Using named subpatterns is sometimes clearer. I suggest this:
~
@(?<com>\w+) # command name
\s* # possible white characters before args
(?: \( (?<args>[^)]*) \) )?+ # eventual parameters
(?:
(?<content>(?:[^@]+|(?R))*+) # content (maybe empty)
@/\g{com} # close the command
)?+ # optional
~x
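A minimal PHP sketch of how this pattern might be used with preg_match_all. It assumes the x modifier, since the spacing and # comments in the pattern only work in extended mode; the template string is a hypothetical sample modeled on the question. With PREG_SET_ORDER, each match is an array keyed by the named groups com, args and content.

```php
<?php
// The pattern from the answer, compiled with the x modifier so that
// whitespace and "#" comments inside it are ignored.
$pattern = '~
    @(?<com>\w+)                      # command name
    \s*                               # possible white characters before args
    (?: \( (?<args>[^)]*) \) )?+      # eventual parameters
    (?:
        (?<content>(?:[^@]+|(?R))*+)  # content (maybe empty)
        @/\g{com}                     # close the command
    )?+                               # optional
~x';

// Hypothetical template: one block command followed by one inline command.
$template = "@foreach(items as item)\n  Item: @get(item)\n@/foreach\n@get(title)";

preg_match_all($pattern, $template, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // Only top-level commands are returned; @get(item) stays inside the
    // "content" capture of the @foreach block.
    echo $m['com'], ' | ', $m['args'], "\n";
}
// prints:
// foreach | items as item
// get | title
```

Because the recursion (?R) re-enters the whole pattern, the inner @get(item) is consumed as part of the @foreach content, and the backreference \g{com} still refers to "foreach" once the recursion returns.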
If you need to allow commands inside arguments, you can replace (?<args>[^)]*)
with (?<args>(?:[^@)]+|(?=@)(?R))*+)
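To see the difference, here is a small PHP comparison on the question's @get(param:@get(sub:name)) example. Both patterns are compact, hypothetical rewrites of the first pattern using the x modifier; they differ only in the args group. The [^)]* version stops at the first closing parenthesis, while the recursive version lets the nested command consume its own parentheses.

```php
<?php
// Original args group: anything up to the first ")".
$flatArgs = '~
    @(?<com>\w+) \s*
    (?: \( (?<args>[^)]*) \) )?+
    (?: (?<content>(?:[^@]+|(?R))*+) @/\g{com} )?+
~x';

// Replacement args group: plain characters, or a nested command via (?R).
$nestedArgs = '~
    @(?<com>\w+) \s*
    (?: \( (?<args>(?:[^@)]+|(?=@)(?R))*+) \) )?+
    (?: (?<content>(?:[^@]+|(?R))*+) @/\g{com} )?+
~x';

$input = '@get(param:@get(sub:name))';

preg_match($flatArgs, $input, $flat);
preg_match($nestedArgs, $input, $nested);

echo $flat['args'], "\n";    // param:@get(sub:name   (cut at the first ")")
echo $nested['args'], "\n";  // param:@get(sub:name)  (nested command kept whole)
```

The (?=@) lookahead only lets the engine try the recursion where a command can actually start, so ordinary argument text is handled by the cheaper [^@)]+ branch.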
But when you are trying to describe a language, a better way is to use the (?(DEFINE)...) syntax to describe the elements first, before the main pattern. For example:
$pattern = <<<'EOD'
~
(?(DEFINE)
(?<command_name> \w+ )
(?<inline_command> @ \g<command_name> \s* \g<params>? )
(?<multil_command> @ (\g<command_name>) \s* \g<params>? \g<content> @/ \g{-1} )
(?<command> \g<multil_command> | \g<inline_command> )
(?<other> [^@()]+ )
(?<param> \g<other> | \g<command> )
(?<params> \( \s* \g<param> (?: \s+ \g<param> )* \s* \) )
(?<content> (?: \g<other> | \g<command> )* )
)
# main pattern
\g<command>
~x
EOD;
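A minimal usage sketch in PHP, assuming the pattern is compiled as written in the answer (indented here for readability, which the x modifier allows). The two templates are hypothetical, and the good one deliberately avoids literal parentheses in plain text, because other is defined as [^@()]+. Since the main pattern is not anchored, comparing the match with the input is one simple way to check that the whole template is a single, correctly closed command; with a wrong closing tag, the pattern falls back to matching only the opening tag as an inline command.

```php
<?php
$pattern = <<<'EOD'
~
(?(DEFINE)
    (?<command_name>   \w+ )
    (?<inline_command> @ \g<command_name> \s* \g<params>? )
    (?<multil_command> @ (\g<command_name>) \s* \g<params>? \g<content> @/ \g{-1} )
    (?<command>        \g<multil_command> | \g<inline_command> )
    (?<other>          [^@()]+ )
    (?<param>          \g<other> | \g<command> )
    (?<params>         \( \s* \g<param> (?: \s+ \g<param> )* \s* \) )
    (?<content>        (?: \g<other> | \g<command> )* )
)
# main pattern
\g<command>
~x
EOD;

// Hypothetical templates: one well-formed block, one with a wrong closing tag.
$good = "@foreach(arguments as value)\n  More stuff : @get(value)\n@/foreach";
$bad  = "@foreach(a)\nx\n@/if";

foreach ([$good, $bad] as $template) {
    preg_match($pattern, $template, $m);
    // $m[0] is the first command found; it equals the input only when the
    // whole template is one correctly closed command.
    echo $m[0] === $template ? "whole template matched\n" : "only matched: {$m[0]}\n";
}
```

The relative backreference \g{-1} points at the unnamed group (\g<command_name>) inside multil_command, so each block must be closed with its own name.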
With this kind of syntax, if you want to extract elements at the ground level, you only need to change the main pattern to:
@(?<com> \g<command_name> ) \s* (?<args>\g<params> )? (?: (?<con> \g<content> ) @/ \g{com} )?
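A PHP sketch of the ground-level extraction, assuming the answer's DEFINE block is kept unchanged (it is repeated so the snippet stands alone) and only the main pattern is replaced. The template is hypothetical. Note that args includes the surrounding parentheses, because params matches them.

```php
<?php
$pattern = <<<'EOD'
~
(?(DEFINE)
    (?<command_name>   \w+ )
    (?<inline_command> @ \g<command_name> \s* \g<params>? )
    (?<multil_command> @ (\g<command_name>) \s* \g<params>? \g<content> @/ \g{-1} )
    (?<command>        \g<multil_command> | \g<inline_command> )
    (?<other>          [^@()]+ )
    (?<param>          \g<other> | \g<command> )
    (?<params>         \( \s* \g<param> (?: \s+ \g<param> )* \s* \) )
    (?<content>        (?: \g<other> | \g<command> )* )
)
# main pattern: ground-level commands, with their parts captured
@(?<com> \g<command_name> ) \s* (?<args> \g<params> )? (?: (?<con> \g<content> ) @/ \g{com} )?
~x
EOD;

// Hypothetical template: a block containing a nested inline command,
// followed by a top-level inline command.
$template = "@foreach(items as item)\n  @get(item)\n@/foreach\n@get(title)";

preg_match_all($pattern, $template, $m);

print_r($m['com']);   // foreach, get (the nested @get(item) is not listed)
print_r($m['args']);  // (items as item), (title)
```

The nested @get(item) is only visible inside $m['con'][0]; it can be extracted by running the same pattern again on that content.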
(NB: To obtain other levels, put it inside a lookahead)
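One possible reading of this note, sketched in PHP: wrap the ground-level main pattern in a lookahead (?=...). Each match is then an empty string, so preg_match_all retries one character further after every match and also reports commands nested inside a block, while the captures made inside the lookahead are still returned. The DEFINE block is the answer's, repeated so the snippet stands alone; the template is hypothetical.

```php
<?php
$pattern = <<<'EOD'
~
(?(DEFINE)
    (?<command_name>   \w+ )
    (?<inline_command> @ \g<command_name> \s* \g<params>? )
    (?<multil_command> @ (\g<command_name>) \s* \g<params>? \g<content> @/ \g{-1} )
    (?<command>        \g<multil_command> | \g<inline_command> )
    (?<other>          [^@()]+ )
    (?<param>          \g<other> | \g<command> )
    (?<params>         \( \s* \g<param> (?: \s+ \g<param> )* \s* \) )
    (?<content>        (?: \g<other> | \g<command> )* )
)
# main pattern inside a lookahead: an empty match at every "@" that starts
# a command, so nested commands are found as well
(?=
    @(?<com> \g<command_name> ) \s* (?<args> \g<params> )?
    (?: (?<con> \g<content> ) @/ \g{com} )?
)
~x
EOD;

// Hypothetical template: a block containing a nested inline command,
// followed by a top-level inline command.
$template = "@foreach(items as item)\n  @get(item)\n@/foreach\n@get(title)";

preg_match_all($pattern, $template, $m);

print_r($m['com']);   // foreach, get, get (nested @get(item) included)
print_r($m['args']);  // (items as item), (item), (title)
```

The closing tag @/foreach produces no entry, because command_name (\w+) cannot match the "/" that follows its "@".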