Rogue element when parsing HTML with DOMDocument
Suppose my $html looks like this:
<!DOCTYPE html>
<html>
<head>
<script type="text/javascript">document.createElement("video");document.createElement("audio");document.createElement("track");</script>
<script type="text/javascript" src="/gui/default/tinymcecontent.js"></script>
<script type="text/javascript" src="/includes/js/video-js/video.min.js"></script>
<link rel="stylesheet" href="/includes/js/video-js/video-js.css" />
<script type="text/javascript">document.createElement("video");document.createElement("audio");document.createElement("track");</script>
<script type"text/javascript" src="/includes/js/video-js/video.js"></script/>
<link rel="stylesheet" href="/includes/js/video-js/video-js.css" />
</head>
<body style="font-family: arial;font-size: 12px;">
<p> </p>
<table width="100%">
</table>
</body>
</html>
When I try to parse only the elements inside the body tag with:
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));
libxml_use_internal_errors(false);
$full_dom = $dom->getElementsByTagName('body')->item(0);
the result of
$dom->saveHTML($full_dom)
is
<body>\n<p>\/><link rel=\"stylesheet\" href=\"\/includes\/js\/video-js\/video-js.css\"><\/p>\n<p>\u00a0<\/p>\n<table width=\"100%\"><\/table>\n<\/body>
Where does the element
<p>\/><link rel=\"stylesheet\" href=\"\/includes\/js\/video-js\/video-js.css\"><\/p>
come from? Everything else is fine, except that this element has moved from the head tag into the body tag.
It comes from this line:
<script type"text/javascript" src="/includes/js/video-js/video.js"></script/>
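It is malformed: the type attribute is missing its = and the closing tag has a stray / before the >. It should be: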
<script type="text/javascript" src="/includes/js/video-js/video.js"></script>
You have to check the errors right after $dom->loadHTML(), while internal error handling is still enabled, to see what happened:
foreach (libxml_get_errors() as $error) {
print_r($error);
}
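As a fuller illustration of that check, here is a minimal sketch that wraps the parse and the error collection in one place. The function name parseBodyWithErrors and the shortened $html string are made up for this example, and the mb_convert_encoding step from the question is left out because the sample is plain ASCII. The exact messages libxml reports will likely differ between libxml2 versions, so the sketch only prints whatever was collected.

```php
<?php
// Hypothetical helper (not part of the original post): parse $html and
// return the serialized <body> together with any messages libxml buffered.
function parseBodyWithErrors(string $html): array
{
    // Buffer parser errors instead of emitting PHP warnings,
    // and remember the previous setting so it can be restored.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors(); // drop anything left over from earlier parses

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Collect the errors now, before the buffer is cleared below.
    $errors = [];
    foreach (libxml_get_errors() as $error) {
        $errors[] = [
            'line'    => $error->line,
            'message' => trim($error->message),
        ];
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting

    $body = $dom->getElementsByTagName('body')->item(0);

    return [
        'body'   => $body !== null ? $dom->saveHTML($body) : '',
        'errors' => $errors,
    ];
}

// Shortened version of the markup from the question, keeping the malformed line.
$html = '<!DOCTYPE html><html><head>'
      . '<script type"text/javascript" src="/includes/js/video-js/video.js"></script/>'
      . '<link rel="stylesheet" href="/includes/js/video-js/video-js.css" />'
      . '</head><body><p>text</p><table width="100%"></table></body></html>';

$result = parseBodyWithErrors($html);

// Each entry points at the input line the parser complained about.
foreach ($result['errors'] as $e) {
    echo "line {$e['line']}: {$e['message']}\n";
}
echo $result['body'], "\n";
```

Reading the line numbers in the collected errors is usually the quickest way to find which tag in the input confused the parser.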