Regex for multi-line HTML comments (preg_match_all)

I have an HTML document with multiple commented-out PHP arrays, e.g.:

<!-- Array
(
[key] => 0
)
-->

Using PHP, I need to parse the HTML for only these comments (there are other comments that must be ignored) and extract their contents. I've been trying to use preg_match_all, but my regex skills aren't up to much. Could anyone point me in the right direction?

Any help is much appreciated!

How about using an HTML parser that lets you access comments (for example, Simple HTML DOM) and then checking each comment for newlines using strpos?

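include 'simple_html_dom.php'; // assumes the Simple HTML DOM file is on the include path

$html = str_get_html('...HTML HERE...');
// 'comment' selects every HTML comment node in the document
foreach ($html->find('comment') as $comment) {
    // a multi-line comment contains at least one newline
    if (strpos($comment->outertext, "\n") !== false) {
        // process comment
    }
}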
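
If adding a third-party library is not an option, the same idea can be sketched with PHP's bundled DOM extension. This is only an illustration under a few assumptions: the sample markup and the variable names ($sample, $found) are made up, and the "starts with Array" test is one possible way to skip the unrelated comments mentioned in the question.

```php
<?php
// Hypothetical sample document: one array dump, one unrelated comment.
$sample = <<<'EOT'
<html><body>
<p>Some text</p>
<!-- Array
(
[key] => 0
)
-->
<!-- an unrelated comment -->
</body></html>
EOT;

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($sample);

$xpath = new DOMXPath($doc);
$found = [];
// The XPath node test comment() selects every comment node in the document.
foreach ($xpath->query('//comment()') as $node) {
    $text = trim($node->nodeValue);
    // Keep only comments whose text begins with "Array".
    if (strpos($text, 'Array') === 0) {
        $found[] = $text;
    }
}

print_r($found);
```

Each entry in $found holds the text of one matching comment without the surrounding <!-- and --> markers.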

Three facts come into play here:

  1. apart from raw-text elements such as script and style, there is no place in an HTML document where a literal "<!--" can show up and not mean a comment (everywhere else it would be escaped as "&lt;!--")
  2. you don't seem to want to change the document contents, only find bits in it (search-and-replace has a high probability of breaking the document; searching alone does not)
  3. comments cannot be nested in HTML (unlike normal HTML tags) - this makes all the difference

The above combination means that (lo and behold) regular expressions can be used to identify HTML comments.

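Try this regex: <!-- Array([\s\S]*?)--> . Match group one will contain everything after "Array" up to the closing sequence of the comment. Note that the lazy quantifier sits inside the group; if it were placed outside, as in ([\s\S])*?, group one would hold only the last character matched.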

You can apply further sanity checking to the found bits to make sure they are in fact what you are looking for.
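
A minimal sketch of how this could look with preg_match_all, including one possible sanity check. The sample input and the names $input, $pattern and $dumps are illustrative only, and the key/value parsing assumes flat print_r() output (nested arrays would need more work).

```php
<?php
// Hypothetical sample input: one array dump, one comment to ignore.
$input = <<<'EOT'
<p>Some text</p>
<!-- Array
(
[key] => 0
)
-->
<!-- an unrelated comment -->
EOT;

// Group 1 wraps the whole lazy repetition, so it captures the full body
// between "Array" and the first following "-->".
$pattern = '/<!-- Array([\s\S]*?)-->/';
preg_match_all($pattern, $input, $matches);

$dumps = [];
foreach ($matches[1] as $body) {
    $body = trim($body);
    // Sanity check: print_r() output for an array is wrapped in "(" ... ")".
    if (!preg_match('/^\(.*\)$/s', $body)) {
        continue;
    }
    // Pull out "[key] => value" lines; values come back as strings.
    preg_match_all('/^\s*\[([^\]]+)\] => (.*)$/m', $body, $pairs);
    $dumps[] = array_combine($pairs[1], $pairs[2]);
}

print_r($dumps);
```

Because the pattern requires the literal text "<!-- Array", ordinary comments never match, which covers the requirement to ignore the other comments in the page.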
