PHP - I need some help with my Regex

I've created a simple template 'engine' in PHP to substitute PHP-generated data into the HTML page. Here's how it works:

In my main template file, I have variables like so:

<title><!-- %{title}% --></title>

I then assign data to those variables for the main page load:

$assign = array (
  'title' => 'my website - '
);

I then have separate template blocks that get loaded for the content pages. The above really just handles the header and the footer. In one of these 'content template files', I have variables like so:

<!-- %{title=content page}% -->

Once this gets executed, the main template data is edited to include the content page variables resulting in:

<title>my website - content page</title>

It does this with the following code:

if (preg_match('/<!-- %{title=\s*(.*?)}% -->/s', $string, $matches)) {
   // Find variable names in the form of %{varName=new data to append}%
   // If found, append that new data to the existing data
   $string       = preg_replace('/<!-- %{title=\s*(.*?)}% -->/s', '', $string);
   $varData[$i] .= $matches[1];
}

This basically removes the template variables and then assigns the variable data to the existing variable. Now, this all works fine. What I'm having issues with is nesting template variables. If I do something like:

<!-- %{title=content page (author: <!-- %{name}% -->)}% -->

The pattern, at times, messes up the opening and closing tags of each variable.
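To make the failure concrete, here is a small reproduction using the pattern from the code above. It assumes the outer tag is closed with }% --> like every other tag; the variable names $pattern and $nested are only for illustration.

```php
<?php
$pattern = '/<!-- %{title=\s*(.*?)}% -->/s';
$nested  = '<!-- %{title=content page (author: <!-- %{name}% -->)}% -->';

preg_match($pattern, $nested, $matches);

// The lazy (.*?) stops at the first "}% -->" it meets, and that one
// belongs to the inner tag, so the captured value is cut short.
echo $matches[1], "\n"; // content page (author: <!-- %{name
```

The remainder of the outer tag, )}% -->, is then left behind in the output.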

How can I fix my regular expression to prevent this?

Thank you.

The answer is that you don't do this with regex. Regular expressions describe regular languages, and once you start nesting things the language is no longer regular. It is, at a minimum, a context-free language ("CFL"), and recognizing arbitrarily deep nesting in a CFL requires a stack.

Specifically, regular languages can be represented with a finite state machine ("FSM"). CFLs require a pushdown automaton ("PDA").

An example of the difference is nested tags in HTML:

<div>
  <div>inner</div>
</div>
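As a rough illustration of why some form of stack is needed, the sketch below checks whether <!-- and --> delimiters are properly nested by tracking the current depth. The function name isBalanced is made up for this example. A single counter is enough here only because there is one kind of bracket; with several tag types you would push onto a real stack.

```php
<?php
// Hypothetical helper: true when every "<!--" has a matching "-->".
function isBalanced(string $text): bool
{
    $depth = 0;
    // Split on the delimiters but keep them in the token list.
    $tokens = preg_split('/(<!--|-->)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($tokens as $token) {
        if ($token === '<!--') {
            $depth++;
        } elseif ($token === '-->') {
            $depth--;
            if ($depth < 0) {
                return false; // a closing delimiter with nothing open
            }
        }
    }
    return $depth === 0;
}

var_dump(isBalanced('<!-- a <!-- b --> -->')); // bool(true)
var_dump(isBalanced('<!-- a <!-- b -->'));     // bool(false)
```

The depth can grow without bound, which is exactly what a finite state machine cannot keep track of.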

My advice is don't write your own template language. That's been done. Many times. Use Smarty or something in Zend, Kohana or whatever. If you do write your own, do it properly. Parse it.

Why are you rolling your own template engine? If you want this kind of complexity, plenty of projects have already come up with solutions for it. You should just plug in Smarty or something like that.

If you're asking what I think you're asking, it's literally impossible. If I read your question correctly, you want to match arbitrarily-nested <!-- ... --> sequences with particular things inside. Unfortunately, regular expressions can only match certain classes of strings; any regular expression can match only a regular language. One well-known language which is not regular is the language of balanced parentheses (also known as the Dyck language), which is exactly what you're trying to match. In order to match arbitrarily-nested comment strings, you need a more powerful tool. I'm fairly sure there are pre-existing PHP template engines; you might look into one of those.

To resolve your problem you should

  • replace preg_match() with preg_match_all();
  • find all the matches, and replace them from the last one to the first one;
  • use a more restrictive pattern like '/<!-- %{title=\\s*([^}]*?)}% -->/s'.
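The three steps above could be sketched roughly as follows. The function name extractTitleParts is hypothetical, and the sketch assumes the appended values never contain a } character, so nested variables would have to be resolved before it runs.

```php
<?php
// Hypothetical helper: removes every title tag from $template (passed by
// reference) and returns the captured values in document order.
function extractTitleParts(string &$template): array
{
    $pattern = '/<!-- %\{title=\s*([^}]*?)\}% -->/s';
    preg_match_all($pattern, $template, $matches, PREG_OFFSET_CAPTURE);

    $parts = [];
    // Walk the matches backwards so the byte offsets of the earlier
    // matches stay valid while the string shrinks.
    for ($i = count($matches[0]) - 1; $i >= 0; $i--) {
        [$fullText, $offset] = $matches[0][$i];
        $template = substr_replace($template, '', $offset, strlen($fullText));
        array_unshift($parts, $matches[1][$i][0]);
    }
    return $parts;
}

$page  = '<!-- %{title=content page}% --><p>Hi</p><!-- %{title= (draft)}% -->';
$parts = extractTitleParts($page);
// $parts is now ['content page', '(draft)'] and $page is '<p>Hi</p>'
```

Replacing from the end is what makes the recorded offsets safe to reuse; going front to back would shift every later offset after the first removal.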

I've done something similar in the past, and I have encountered the same nesting issue you did. In your case, what I would do is repeatedly search your text for matches (rather than searching once and looping through the matches) and extract the strings you want by searching for anything that doesn't include your closing string.

In your case, it would probably look like this:

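/(<!--((?:(?!<!--|-->).)*?)-->)/s

Regexes like this are a nightmare to explain, but basically, ((?:(?!<!--|-->).)*?) will find any string that includes neither your opening nor your closing tag (let's call that AAA), so only an innermost tag can match. It sits inside a matching group that is, itself, your template tag, (<!--AAA-->). A character class such as [^(-->)] will not do here: it excludes the individual characters (, -, > and ) rather than the sequence -->.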
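A possible shape for the repeated search, as a sketch only: the function name resolveTags and the $vars array are hypothetical, and the pattern uses a negative lookahead so that only tags with no other tag inside them can match. Tags of the form name=value append to the variable and disappear; plain tags are replaced by the variable's value, and unknown names become an empty string (an assumption).

```php
<?php
// Hypothetical resolver: handles innermost <!-- %{...}% --> tags first and
// repeats until a pass finds nothing. $vars is updated by reference.
function resolveTags(string $text, array &$vars): string
{
    // (?:(?!<!--|-->).)* matches text that contains neither delimiter.
    $pattern = '/<!-- %\{((?:(?!<!--|-->).)*?)\}% -->/s';
    do {
        $text = preg_replace_callback(
            $pattern,
            function (array $m) use (&$vars): string {
                if (strpos($m[1], '=') !== false) {
                    // "name=value": append the value and drop the tag.
                    [$name, $value] = explode('=', $m[1], 2);
                    $vars[$name] = ($vars[$name] ?? '') . ltrim($value);
                    return '';
                }
                // Plain "name": substitute its value ('' if unknown).
                return $vars[$m[1]] ?? '';
            },
            $text,
            -1,
            $count
        );
    } while ($count > 0);
    return $text;
}

$vars = ['title' => 'my website - ', 'name' => 'Ann'];
$rest = resolveTags('<!-- %{title=content page (author: <!-- %{name}% -->)}% -->', $vars);
echo $vars['title'], "\n"; // my website - content page (author: Ann)
```

The first pass replaces the inner name tag, which leaves a flat title tag for the second pass to consume.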

I'm convinced this sort of templating method is the wrong way to do things, but I've never known enough to do it better. It's always bothered me in ASP and ColdFusion that you had to nest your scripting tags inside HTML and when I started to do it myself, I considered it a personal failure.

Most Regexes I do now are in JavaScript and so I may be missing some of the awesome nuances PHP has via Perl. I'd be happy if someone can write this more cleanly.

I too have run into this problem in the past, although I didn't use regular expressions.

Instead, search from right to left for the opening tag, <!-- %{ in your syntax, using strrpos (PHP5+), then search forwards from there for the first occurrence of the closing tag, and replace that chunk first. That way you end up replacing the innermost nested variables first, which should resolve your problem.

You can also do it the other way around and find the first occurrence of a closing tag, and work backwards to find its corresponding opening tag.
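A minimal sketch of the right-to-left variant, using only string functions. The function name resolveRightToLeft and the $vars array are invented for this example, and it only substitutes plain variables; tags of the form name=value would need an extra branch.

```php
<?php
// Hypothetical resolver: finds the last opening tag, pairs it with the next
// closing tag after it, substitutes, and repeats.
function resolveRightToLeft(string $text, array $vars): string
{
    $open  = '<!-- %{';
    $close = '}% -->';
    // The last opening tag in the string cannot contain another opening
    // tag, so it always belongs to an innermost variable.
    while (($start = strrpos($text, $open)) !== false) {
        $end = strpos($text, $close, $start);
        if ($end === false) {
            break; // unterminated tag: leave the text as it is
        }
        $name  = substr($text, $start + strlen($open), $end - $start - strlen($open));
        $value = $vars[$name] ?? ''; // unknown names become '' (an assumption)
        $text  = substr_replace($text, $value, $start, $end + strlen($close) - $start);
    }
    return $text;
}

$vars = ['lang' => 'en', 'greeting_en' => 'world'];
echo resolveRightToLeft('Hello <!-- %{greeting_<!-- %{lang}% -->}% -->!', $vars), "\n";
// Hello world!
```

Note that a substituted value which itself contains an opening tag will be picked up by a later iteration, so values that expand to themselves would loop forever.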
