Regex for multi-line HTML comments (preg_match_all)

Question

I have an HTML document with multiple commented-out PHP arrays, e.g.:

<!-- Array
(
[key] => 0
)
-->

Using PHP, I need to somehow parse the HTML for only these comments (there are other comments that need to be ignored) and extract their contents. I've been trying to use preg_match_all, but my regex skills aren't up to much. Could anyone point me in the right direction?

Any help is much appreciated!

Answer 1

How about using an HTML parser that gives you access to comments (for example, Simple HTML DOM), and then checking each comment for newlines using strpos?

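// Simple HTML DOM is a third-party library; adjust the path to where you saved it.
require_once 'simple_html_dom.php';

$html = str_get_html('...HTML HERE...');
// The 'comment' selector returns every <!-- ... --> node in the document.
$comments = $html->find('comment');
foreach ($comments as $comment) {
    // Multi-line comments (like the print_r() dumps) contain a newline.
    if (strpos($comment->outertext, "\n") !== false) {
        // process comment
    }
}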
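If you would rather not depend on a third-party library, a roughly equivalent sketch can use PHP's built-in DOM extension (DOMDocument and DOMXPath) in place of Simple HTML DOM. Note that this swaps in a different parser than the answer suggests; the helper name findArrayComments and the "starts with Array" filter are illustrative assumptions, not part of the original answer.

```php
<?php
// Sketch using PHP's built-in DOM extension instead of Simple HTML DOM
// (an assumed substitution). findArrayComments is a hypothetical helper name.
function findArrayComments(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally; real-world HTML is rarely perfect.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $found = [];
    // comment() selects every comment node anywhere in the document.
    foreach ($xpath->query('//comment()') as $comment) {
        $text = $comment->nodeValue; // text between "<!--" and "-->"
        // Keep only multi-line comments whose content starts with "Array".
        if (strpos($text, "\n") !== false && strpos(ltrim($text), 'Array') === 0) {
            $found[] = trim($text);
        }
    }
    return $found;
}

$html = "<p>Hi</p><!-- Array\n(\n[key] => 0\n)\n--><!-- ordinary comment -->";
print_r(findArrayComments($html));
```

The ordinary single-line comment is skipped, so only the print_r() dump is returned.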

Answer 2

Three facts come into play here:

Outside of <script> and <style> content, there is no place in an HTML document where a literal "<!--" can show up and not mean a comment (anywhere it is meant as text, it would be escaped as "&lt;!--").
You don't seem to want to change the document contents, only find bits in it (search-and-replace has a high probability of breaking the document; searching alone does not).
Comments cannot be nested in HTML (unlike normal HTML tags), and this makes all the difference.

The above combination means that (lo and behold) regular expressions can be used to identify HTML comments.
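The third point is why a lazy quantifier is enough: the first "-->" after a "<!--" always closes that comment. The sketch below (with sample HTML invented for illustration) compares a greedy and a lazy pattern on two adjacent comments.

```php
<?php
// Two adjacent comments. A greedy ".*" runs from the first "<!--" to the
// LAST "-->", while a lazy ".*?" stops at the first "-->". Because HTML
// comments cannot nest, the first "-->" is always the correct end.
// The "s" modifier lets "." match newlines, which multi-line comments need.
$html = "<!-- one --><p>text</p><!-- two -->";

preg_match_all('/<!--(.*)-->/s', $html, $greedy);
preg_match_all('/<!--(.*?)-->/s', $html, $lazy);

var_dump($greedy[1]); // one match: " one --><p>text</p><!-- two "
var_dump($lazy[1]);   // two matches: " one " and " two "
```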

Try this regex:  . Match group one will contain everything after "Array" up to the closing sequence of the comment.

You can apply further sanity checking to the found bits to make sure they are in fact what you are looking for.
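The following is one possible pattern that fits the description above; it is an assumption, not necessarily the regex the answer originally gave. It uses preg_match_all to capture the body of each "<!-- Array ... -->" comment, applies a simple sanity check (the body must be wrapped in parentheses, as print_r() output for an array is), and then reads the "[key] => value" lines. The function name extractArrayComments is hypothetical, and the key/value parsing only handles flat arrays with single-line values.

```php
<?php
// Assumed pattern: "<!--", optional whitespace, the word "Array", then a
// lazy capture of everything up to the first "-->" ("s" lets "." span lines).
const ARRAY_COMMENT = '/<!--\s*Array\s*(.*?)-->/s';

// Hypothetical helper: returns one key => value array per matching comment.
function extractArrayComments(string $html): array
{
    preg_match_all(ARRAY_COMMENT, $html, $matches);
    $results = [];
    foreach ($matches[1] as $body) {
        $body = trim($body);
        // Sanity check: print_r() wraps array contents in "(" ... ")".
        if ($body === '' || $body[0] !== '(' || substr($body, -1) !== ')') {
            continue;
        }
        // Read "[key] => value" lines; [ \t]* avoids spilling onto the next line.
        preg_match_all('/^[ \t]*\[([^\]]*)\][ \t]*=>[ \t]*(.*)$/m', $body, $pairs, PREG_SET_ORDER);
        $entry = [];
        foreach ($pairs as $pair) {
            $entry[$pair[1]] = rtrim($pair[2]);
        }
        $results[] = $entry;
    }
    return $results;
}

$html = "<!-- Array\n(\n[key] => 0\n[name] => demo\n)\n-->\n<!-- not an array -->";
print_r(extractArrayComments($html));
```

For the sample input this yields a single entry with key => 0 and name => demo; the other comment is ignored because its content does not start with "Array".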

Answer 3

Don't parse HTML with regular expressions. Ever.

3 answers

Answer 1: score 2, 2010-04-06 12:22:35
Answer 2: score 2, accepted, 2010-04-06 13:25:09
Answer 3: score -2, 2010-04-06 12:23:07
