
Regex for multi-line HTML comments (preg_match_all)

I have an HTML document with multiple commented-out PHP arrays, e.g.:

<!-- Array
(
[key] => 0
)
-->

Using PHP, I need to parse the HTML for only these comments (there are other comments that need to be ignored) and extract their contents. I've been trying to use preg_match_all, but my regex skills aren't up to much. Could anyone point me in the right direction?

Any help is much appreciated!

How about using an HTML parser that allows you to access comments (for example, Simple HTML DOM) and then checking each comment for newlines using strpos?

$html = str_get_html('...HTML HERE...');
$comments = $html->find('comment');
foreach ( $comments as $comment ){
    if ( strpos($comment, "\n") !== false ){
        //process comment
    }
}

Three facts come into play here:

  1. there is no place in an HTML document where a literal "<!--" can show up and not mean a comment (everywhere else it would be escaped as "&lt;!--")
  2. you don't seem to want to change the document contents, only find bits in it (search-and-replace has a high probability of breaking the document, search alone does not)
  3. comments cannot be nested in HTML (unlike normal HTML tags), and this makes all the difference

The above combination means that (lo and behold) regular expressions can be used to identify HTML comments.

Try this regex: <!-- Array([\s\S]*?)--> . Match group one will contain everything after "Array" up to the closing sequence of the comment. Note that the lazy quantifier sits inside the group; with it outside, the group would hold only the last matched character.
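For illustration, here is a minimal sketch of how that pattern might be used with preg_match_all. The sample document in $html is made up for this example, and the "~" delimiter is just one convenient choice; treat it as a starting point rather than a definitive solution.

```php
<?php
// Hypothetical sample document: one array-dump comment and one unrelated comment.
$html = <<<'HTML'
<p>Before</p>
<!-- navigation starts here -->
<!-- Array
(
    [key] => 0
)
-->
<p>After</p>
HTML;

// "~" is used as the delimiter so nothing in the pattern needs escaping.
// [\s\S]*? lazily matches any character, newlines included, up to the first "-->".
$pattern = '~<!-- Array([\s\S]*?)-->~';

// $count is the number of matching comments; $matches[1] holds group one of each match.
$count = preg_match_all($pattern, $html, $matches);

// Strip the surrounding whitespace from each captured body.
$bodies = array_map('trim', $matches[1]);

foreach ($bodies as $body) {
    echo $body, PHP_EOL;
}
```

The unrelated "navigation" comment is skipped because the pattern requires the literal text "<!-- Array" at the start of the match.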

You can apply further sanity checking to the found bits to make sure they are in fact what you are looking for.
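One possible form of such a sanity check is sketched below. It assumes the comments hold flat print_r() style dumps like the one in the question; parseArrayDump is a hypothetical helper name, and nested arrays are deliberately not handled.

```php
<?php
// Hypothetical helper: takes the text captured from one comment and returns its
// key/value pairs, or null if the text does not look like a flat print_r() dump.
function parseArrayDump(string $body): ?array
{
    $body = trim($body);

    // Sanity check: the dump of an array is wrapped in parentheses.
    if (!preg_match('~^\((.*)\)$~s', $body, $m)) {
        return null;
    }

    // Collect every "[key] => value" line inside the parentheses.
    preg_match_all('~^\s*\[([^\]]*)\]\s*=>\s*(.*?)\s*$~m', $m[1], $pairs, PREG_SET_ORDER);

    $result = [];
    foreach ($pairs as $pair) {
        $result[$pair[1]] = $pair[2]; // values stay strings
    }
    return $result;
}

var_dump(parseArrayDump("\n(\n    [key] => 0\n)\n")); // array with 'key' => '0'
var_dump(parseArrayDump(" navigation starts here ")); // NULL
```

Returning null for anything that fails the check makes it easy to drop comments that happened to match the regex but are not array dumps.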
