
PHP regex matching recursively

I'm trying to match a certain set of tags in a template file. However, I want the tags to be able to nest inside one another.

My regex is the following (with the /s modifier):

<!-- START (.*?) -->(.*?)<!-- END \\1 -->

Tag example:

<!-- START yList -->
  y:{yList:NUM} | 
  <!-- START xList -->
    x:{xList:NUM} 
  <!-- END xList -->
  <!-- CARET xList -->
  <br>
<!-- END yList -->
<!-- CARET yList -->

Right now the match result is:

match 0:

group(0) (whole match)

<!-- START yList --> 
 y 
 <!-- START xList --> 
   x 
 <!-- END xList --> 
 <!-- CARET xList --> 
 <br> 
<!-- END yList -->

group(1)

yList

group(2)

y 
<!-- START xList --> 
  x 
<!-- END xList --> 
<!-- CARET xList --> 
<br>

I want 2 matches instead of 1; obviously, the nested tag set isn't matched. Is this possible with regex, or should I just keep applying the regex to the group(2) results until I find no new matches?

Regular expressions are not suited for parsing arbitrary-depth tree structures. It may be possible, depending on the regex flavor you are using, but it is not recommended: such patterns are difficult to read and difficult to debug.

I would suggest writing a simple parser instead. Decompose your text into a set of possible tokens, each of which can be defined by a simple regular expression, e.g.:

START_TOKEN = "<!-- START [A-Za-z]+ -->"
END_TOKEN = ...
HTML_TEXT = ...

Iterate over your string and, as long as you match these tokens, pull them out of the string and store them in a separate list. Be sure to save the text that was inside the token (if any) when you do this.
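As a rough illustration of this tokenizing step, here is a minimal PHP sketch. The END and TEXT patterns, the tag-name pattern, and the function name tokenize() are assumptions made for illustration, since the token list above leaves END_TOKEN and HTML_TEXT unspecified. Any other comment, such as a CARET tag, is simply treated as text here.

```php
<?php
// Hypothetical token patterns. \G anchors each match at the current offset.
const TOKEN_PATTERNS = [
    'START' => '/\G<!-- START ([A-Za-z][A-Za-z0-9]*) -->/',
    'END'   => '/\G<!-- END ([A-Za-z][A-Za-z0-9]*) -->/',
    // Text: one or more characters, stopping before the next START/END tag.
    'TEXT'  => '/\G(?:(?!<!-- (?:START|END) [A-Za-z][A-Za-z0-9]* -->).)+/s',
];

function tokenize(string $input): array
{
    $tokens = [];
    $offset = 0;
    $length = strlen($input);

    while ($offset < $length) {
        foreach (TOKEN_PATTERNS as $type => $pattern) {
            if (preg_match($pattern, $input, $m, 0, $offset) === 1) {
                // Keep the tag name for START/END, or the raw text for TEXT.
                $tokens[] = ['type' => $type, 'value' => $m[1] ?? $m[0]];
                $offset += strlen($m[0]);
                continue 2; // token consumed, resume the outer loop
            }
        }
        // Guard against input that none of the patterns can consume.
        throw new RuntimeException("Unexpected input at offset $offset");
    }

    return $tokens;
}

print_r(tokenize("a<!-- START x -->b<!-- END x -->"));
```

Trying the tag patterns before the text pattern means the text pattern only ever runs at a position that is not a tag, so it always consumes at least one character.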

Then you can iterate over your list of tokens and, based on the token types, create a nested tree structure of nodes, each containing 1) the text of the original token and 2) a list of child nodes.
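The tree-building step could look roughly like the sketch below. It assumes a token list in which every token is an array of the form ['type' => 'START'|'END'|'TEXT', 'value' => string]. That shape, the node layout, and the function name parseNodes() are illustrative assumptions, not something the description above prescribes.

```php
<?php
// Recursive descent over a flat token list.
// $pos is passed by reference so nested calls advance the shared cursor.
function parseNodes(array $tokens, int &$pos, ?string $openName = null): array
{
    $nodes = [];

    while ($pos < count($tokens)) {
        $token = $tokens[$pos++];

        switch ($token['type']) {
            case 'START':
                // A START tag opens a node; its children run up to the matching END.
                $nodes[] = [
                    'type'     => 'TAG',
                    'text'     => $token['value'],
                    'children' => parseNodes($tokens, $pos, $token['value']),
                ];
                break;

            case 'END':
                if ($token['value'] !== $openName) {
                    throw new UnexpectedValueException("Unexpected END {$token['value']}");
                }
                return $nodes; // closes the block opened by the caller

            default:
                // Plain text becomes a leaf node with no children.
                $nodes[] = ['type' => 'TEXT', 'text' => $token['value'], 'children' => []];
        }
    }

    if ($openName !== null) {
        throw new UnexpectedValueException("Missing END $openName");
    }
    return $nodes;
}

// Usage with a hand-written token list (assumed shape, see above).
$tokens = [
    ['type' => 'START', 'value' => 'yList'],
    ['type' => 'TEXT',  'value' => 'y'],
    ['type' => 'START', 'value' => 'xList'],
    ['type' => 'TEXT',  'value' => 'x'],
    ['type' => 'END',   'value' => 'xList'],
    ['type' => 'END',   'value' => 'yList'],
];
$pos  = 0;
$tree = parseNodes($tokens, $pos);
print_r($tree);
```

Recursion keeps the code short and lets PHP's call stack track the open tags. An explicit stack would work as well and avoids deep call nesting for heavily nested templates.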

You may want to look at some parser tutorials if this seems too complicated.

You could do something like this:

// Split on the tags; PREG_SPLIT_DELIM_CAPTURE keeps the tags themselves in the result.
$parts = preg_split('/(<!-- (?:START|END|CARET) [a-zA-Z][a-zA-Z0-9]* -->)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
$tokens = array();
// Text and tags alternate in $parts; check which kind comes first.
$isTag = isset($parts[0]) && preg_match('/^<!-- (?:START|END|CARET) [a-zA-Z][a-zA-Z0-9]* -->$/', $parts[0]);
foreach ($parts as $part) {
    if ($isTag) {
        // Store a tag as array(type, name).
        preg_match('/^<!-- (START|END|CARET) ([a-zA-Z][a-zA-Z0-9]*) -->$/', $part, $match);
        $tokens[] = array($match[1], $match[2]);
    } else {
        // Store non-empty text as a plain string.
        if ($part !== '') $tokens[] = $part;
    }
    $isTag = !$isTag;
}
var_dump($tokens);

That will give you a flat list of tokens (tags and text) that reflects the structure of your template.
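To get from such a token list to one entry per block, including nested ones, a stack of open tag names may be enough. The sketch below assumes the token shape the snippet above aims to produce: tags as array(type, name) pairs and text as plain strings. The function name listBlocks() and the returned name/depth layout are hypothetical.

```php
<?php
// Walk the tokens, pairing START and END tags with a stack of open names.
function listBlocks(array $tokens): array
{
    $open   = []; // names of the blocks that are currently open
    $blocks = []; // one entry per START tag, in document order

    foreach ($tokens as $token) {
        if (!is_array($token)) {
            continue; // plain text, not a tag
        }
        [$type, $name] = $token;

        if ($type === 'START') {
            $blocks[] = ['name' => $name, 'depth' => count($open)];
            $open[]   = $name;
        } elseif ($type === 'END') {
            if (array_pop($open) !== $name) {
                throw new UnexpectedValueException("Unbalanced END $name");
            }
        }
        // Other tag types (for example CARET) do not affect nesting.
    }

    if ($open !== []) {
        throw new UnexpectedValueException('Missing END ' . end($open));
    }
    return $blocks;
}

// Usage with a hand-written token list in the assumed shape.
$tokens = [
    ['START', 'yList'], 'y', ['START', 'xList'], 'x',
    ['END', 'xList'], ['CARET', 'xList'], ['END', 'yList'], ['CARET', 'yList'],
];
print_r(listBlocks($tokens));
```

For the template in the question this reports two blocks: yList at depth 0 and xList at depth 1.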
