PHP - I need some help with my Regex

I've created a simple template 'engine' in PHP to substitute PHP-generated data into the HTML page. Here's how it works:

In my main template file, I have variables like so:

<title><!-- %{title}% --></title>

I then assign data into those variables for the main page load:

$assign = array (
  'title' => 'my website - '
);

I then have separate template blocks that get loaded for the content pages. The above really just handles the header and the footer. In one of these 'content template files', I have variables like so:

<!-- %{title=content page}% -->

Once this gets executed, the main template data is edited to include the content page variables, resulting in:

<title>my website - content page</title>

It does this with the following code:

if (preg_match('/<!-- %{title=\s*(.*?)}% -->/s', $string, $matches)) {
   // Find variable names in the form of %{varName=new data to append}%
   // If found, append that new data to the existing data
   $string       = preg_replace('/<!-- %{title=\s*(.*?)}% -->/s', '', $string);
   $varData[$i] .= $matches[1];
}

This basically removes the template variables and then appends the variable data to the existing variable. Now, this all works fine. What I'm having issues with is nesting template variables. If I do something like:

<!-- %{title=content page (author: <!-- %{name}% -->) -->

The pattern, at times, messes up the opening and closing tags of each variable.

How can I fix my regular expression to prevent this?

Thank you.

The answer is you don't do this with regex. Regular expressions describe regular languages. When you start nesting things it is no longer a regular language. It is, at a minimum, a context-free language ("CFL"). CFLs can only be processed (assuming they're unambiguous) with a stack.

Specifically, regular languages can be represented with a finite state machine ("FSM"). CFLs require a pushdown automaton ("PDA").

An example of the difference is nested tags in HTML:

<div>
  <div>inner</div>
</div>

My advice is don't write your own template language. That's been done. Many times. Use Smarty or something in Zend, Kohana or whatever. If you do write your own, do it properly. Parse it.
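For what "parse it" could look like, here is a minimal sketch that keeps an explicit stack of tag bodies, so an inner tag is always finished before the tag that contains it. The function name renderTemplate and the rule that a body containing "=" is an assignment are assumptions made for illustration, not part of the original code.

```php
<?php
// Hypothetical renderer: resolves nested <!-- %{...}% --> tags with an explicit stack.
// $vars is passed by reference so "name=value" tags can append to existing data.
function renderTemplate(string $template, array &$vars): string
{
    $open  = '<!-- %{';
    $close = '}% -->';

    // Split on the two delimiters but keep them in the token list.
    $tokens = preg_split(
        '/(' . preg_quote($open, '/') . '|' . preg_quote($close, '/') . ')/',
        $template,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    $stack = [''];  // $stack[0] collects the final output
    foreach ($tokens as $token) {
        if ($token === $open) {
            $stack[] = '';  // start collecting the body of a (possibly nested) tag
        } elseif ($token === $close && count($stack) > 1) {
            $body = array_pop($stack);  // the innermost open tag is now complete
            if (strpos($body, '=') !== false) {
                // "name=value": append the value to the variable, emit nothing
                [$name, $value] = explode('=', $body, 2);
                $name = trim($name);
                $vars[$name] = ($vars[$name] ?? '') . ltrim($value);
                $result = '';
            } else {
                // "name": emit the variable's current value (empty if unknown)
                $result = $vars[trim($body)] ?? '';
            }
            $stack[count($stack) - 1] .= $result;
        } else {
            $stack[count($stack) - 1] .= $token;  // plain text, or a stray close
        }
    }

    // Unclosed tags are emitted literally rather than dropped.
    while (count($stack) > 1) {
        $body = array_pop($stack);
        $stack[count($stack) - 1] .= $open . $body;
    }
    return $stack[0];
}
```

With this sketch the content template would be rendered first (so its title=... tag appends to the variable) and the main template afterwards.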

Why are you rolling your own template engine? If you want this kind of complexity, there are a lot of places that have already come up with solutions for it. You should just plug in Smarty or something like that.

If you're asking what I think you're asking, it's literally impossible. If I read your question correctly, you want to match arbitrarily-nested <!-- ... --> sequences with particular things inside. Unfortunately, regular expressions can only match certain classes of strings; any regular expression can match only a regular language. One well-known language which is not regular is the language of balanced parentheses (also known as the Dyck language), which is exactly what you're trying to match. In order to match arbitrarily-nested comment strings, you need a more powerful tool. I'm fairly sure there are pre-existing PHP template engines; you might look into one of those.
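As a small illustration of the extra power needed, here is a sketch that checks whether <!-- and --> are balanced by keeping a depth counter, which is the simplest form of a stack. The function name isBalanced is hypothetical. The regex is only used to pick out the two delimiters; the nesting itself is tracked by the counter.

```php
<?php
// Hypothetical helper: returns true when every "<!--" has a matching "-->".
function isBalanced(string $text): bool
{
    $depth = 0;
    // Tokenize: find each opening or closing delimiter in document order.
    preg_match_all('/<!--|-->/', $text, $matches);
    foreach ($matches[0] as $token) {
        if ($token === '<!--') {
            $depth++;  // one level deeper
        } else {
            $depth--;  // one level back up
            if ($depth < 0) {
                return false;  // a close with no matching open
            }
        }
    }
    return $depth === 0;  // every open must have been closed
}
```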

To resolve your problem you should:

  • replace preg_match() with preg_match_all();
  • find all occurrences of the pattern and replace them from the last one to the first one;
  • use a more restrictive pattern like '/<!-- %{title=\\s*([^}]*?)}% -->/s'.
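The three steps above could be sketched roughly like this. The function name extractTitleParts is hypothetical, and the sketch assumes the captured data contains no } character, since that is what the restrictive pattern requires.

```php
<?php
// Hypothetical helper: removes every title=... tag from $string (by reference)
// and returns the concatenated data that should be appended to the variable.
function extractTitleParts(string &$string): string
{
    $pattern  = '/<!-- %{title=\s*([^}]*?)}% -->/s';
    $appended = '';

    $flags = PREG_SET_ORDER | PREG_OFFSET_CAPTURE;
    if (preg_match_all($pattern, $string, $matches, $flags)) {
        // Collect the captured data in document order.
        foreach ($matches as $m) {
            $appended .= $m[1][0];
        }
        // Remove the tags from last to first so earlier byte offsets stay valid.
        foreach (array_reverse($matches) as $m) {
            [$tag, $offset] = $m[0];
            $string = substr_replace($string, '', $offset, strlen($tag));
        }
    }
    return $appended;
}
```

Note that this does not handle a nested tag by itself: an inner <!-- %{name}% --> would have to be substituted before this runs, otherwise its }% --> ends the match early.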

I've done something similar in the past, and I have encountered the same nesting issue you did. In your case, what I would do is repeatedly search your text for matches (rather than searching once and looping through the matches) and extract the strings you want by searching for anything that doesn't include your closing string.

In your case, it would probably look like this:

/(<!--((?:(?!<!--|-->).)*)-->)/s

Regexes like this are a nightmare to explain, but basically, ((?:(?!<!--|-->).)*) will find any string that contains neither an opening nor a closing tag (let's call that AAA). A character class such as [^(-->)] would not do this, because it excludes individual characters rather than the sequence -->. AAA will be inside a matching group that is, itself, your template tag, (<!--AAA-->).
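A rough sketch of the "search repeatedly" idea, adapted to the <!-- %{...}% --> syntax from the question. The function name resolveInnermost is hypothetical, and the pattern uses a negative lookahead (rather than a character class) so that it only matches a tag with no other tag inside it. It deliberately skips bodies containing "=", so assignment tags are left in place for the existing title= code to handle afterwards.

```php
<?php
// Hypothetical helper: substitutes <!-- %{name}% --> references innermost-first.
// Assumes the values in $vars do not themselves contain template tags.
function resolveInnermost(string $text, array $vars): string
{
    // Body: any characters that do not start an opening or closing delimiter
    // and are not "=", so only innermost plain references match.
    $pattern = '/<!-- %\{((?:(?!<!-- %\{|\}% -->)[^=])*)\}% -->/';

    do {
        $text = preg_replace_callback(
            $pattern,
            function (array $m) use ($vars): string {
                // Unknown names are replaced with an empty string.
                return $vars[trim($m[1])] ?? '';
            },
            $text,
            -1,
            $count
        );
        // Each pass may expose new innermost tags, so repeat until none match.
    } while ($count > 0);

    return $text;
}
```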

I'm convinced this sort of templating method is the wrong way to do things, but I've never known enough to do it better. It's always bothered me in ASP and ColdFusion that you had to nest your scripting tags inside HTML and when I started to do it myself, I considered it a personal failure.

Most Regexes I do now are in JavaScript and so I may be missing some of the awesome nuances PHP has via Perl. I'd be happy if someone can write this more cleanly.

I too have run into this problem in the past, although I didn't use regular expressions.

If instead you search from right to left for the opening tag, <!-- %{ in your syntax, using strrpos (PHP5+), then search forwards for the first occurrence of the next closing tag, and then replace that chunk first, you will end up replacing the innermost nested variables first. This should resolve your problem.
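A minimal sketch of that right-to-left approach, without regular expressions. The function name resolveFromRight and the treatment of a body containing "=" as an assignment are assumptions for illustration.

```php
<?php
// Hypothetical helper: resolves tags starting from the last opening tag.
// Assumes the values in $vars do not themselves contain template tags.
function resolveFromRight(string $text, array &$vars): string
{
    $open  = '<!-- %{';
    $close = '}% -->';

    // The last opening tag cannot contain another opening tag, so it is innermost.
    while (($start = strrpos($text, $open)) !== false) {
        $bodyStart = $start + strlen($open);
        $end = strpos($text, $close, $bodyStart);
        if ($end === false) {
            break;  // unterminated tag: leave the text as it is
        }
        $body = substr($text, $bodyStart, $end - $bodyStart);

        if (strpos($body, '=') !== false) {
            // "name=value": append the value to the variable, emit nothing
            [$name, $value] = explode('=', $body, 2);
            $name = trim($name);
            $vars[$name] = ($vars[$name] ?? '') . ltrim($value);
            $replacement = '';
        } else {
            // "name": emit the variable's current value (empty if unknown)
            $replacement = $vars[trim($body)] ?? '';
        }

        // Replace the whole tag, delimiters included.
        $text = substr_replace($text, $replacement, $start, $end + strlen($close) - $start);
    }
    return $text;
}
```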

You can also do it the other way around and find the first occurrence of a closing tag, and work backwards to find its corresponding opening tag.
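The reverse direction could be sketched like this: take the first closing tag, then look backwards for the nearest opening tag before it. The function name resolveFromFirstClose is hypothetical, and this version only handles plain <!-- %{name}% --> references.

```php
<?php
// Hypothetical helper: resolves references by pairing the first closing tag
// with the nearest opening tag that precedes it.
// Assumes the values in $vars do not themselves contain template tags.
function resolveFromFirstClose(string $text, array $vars): string
{
    $open  = '<!-- %{';
    $close = '}% -->';
    $from  = 0;

    while (($end = strpos($text, $close, $from)) !== false) {
        // The last opening tag before this close is its partner.
        $start = strrpos(substr($text, 0, $end), $open);
        if ($start === false) {
            $from = $end + strlen($close);  // stray close: skip over it
            continue;
        }
        $bodyStart = $start + strlen($open);
        $body  = substr($text, $bodyStart, $end - $bodyStart);
        $value = $vars[trim($body)] ?? '';

        // Replace the whole tag and continue scanning after the inserted value.
        $text = substr_replace($text, $value, $start, $end + strlen($close) - $start);
        $from = $start + strlen($value);
    }
    return $text;
}
```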
