PHP DOMDocument parse HTML
I have the following HTML markup:
<div contenteditable="true" class="text"></div>
<div contenteditable="true" class="text"></div>
<div style="display: block;" class="ui-draggable">
<img class='avatar' src=""/>
<p style="">
<img class='pic' src=""/><br>
<span class='fulltext' style="display:none"></span>
</p>-<span class='create'></span>
<a class='permalink' href=""></a>
</div>
<div contenteditable="true" class="text"></div>
<div style="display: block;" class="ui-draggable">
<img class='avatar' src=""/>
<p style="">
<img class='pic' src=""/><br>
<span class='fulltext' style="display:none"></span>
</p><span class='create'></span><a class='permalink' href=""></a>
</div>
There can be more of these parent divs. To parse the information and insert it into the DB, I'm using the following code:
$dom = new DOMDocument();
$dom->loadHTML($xml);
$xpath = new DOMXPath($dom);
$div = $xpath->query('//div');
$i=0;
$q=1;
foreach($div as $book) {
$attr = $book->getAttribute('class');
//if div contenteditable
if($attr == 'text') {
echo '</br>'.$book->nodeValue."</br>";
}
else {
$new = new DOMDocument();
$newxpath = new DOMXPath($new);
$avatar = $xpath->query("(//img[@class='avatar']/@src)[$q]");
$picture = $xpath->query("(//p/img[@class='pic']/@src)[$q]");
$fulltext = $xpath->query("(//p/span[@class='fulltext'])[$q]");
$permalink = $xpath->query("(//a[@class='permalink'])[$q]");
echo $permalink->item(0)->nodeValue; //date
echo $permalink->item(0)->getAttribute('href');
echo $fulltext->item(0)->nodeValue;
echo $avatar->item(0)->value;
echo $picture->item(0)->value;
$q++;
}
$i++;
}
But I think there's a better way to parse the HTML. Is there? Thank you in advance.
Note that DOMXPath::query supports a second parameter, the context node (named $contextNode in the PHP manual). Also, you won't need a second DOMDocument and DOMXPath inside the loop. Use:
$avatar = $xpath->query("img[@class='avatar']/@src", $book);
to get the <img src=""> attribute nodes relative to the div nodes. If you follow this advice, your example should be fine.
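To illustrate what the context node changes, here is a minimal sketch on a hypothetical two-div sample (the markup and file names are made up for the example, not taken from the question). It also shows a common pitfall: a path that starts with // is evaluated from the document root even when a context node is passed, so only a relative path stays inside the current div.

```php
<?php
// Hypothetical sample markup, much smaller than the markup in the question.
$html = '<div class="a"><img class="avatar" src="one.png"></div>'
      . '<div class="b"><img class="avatar" src="two.png"></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$relative = [];
$absolute = [];
foreach ($xpath->query('//div') as $div) {
    // Relative path: evaluated from the current div only.
    $relative[] = $xpath->query("img[@class='avatar']/@src", $div)->item(0)->value;
    // Path starting with //: searches the whole document, whatever the context node is.
    $absolute[] = $xpath->query("//img[@class='avatar']/@src", $div)->item(0)->value;
}

print_r($relative); // one.png, two.png
print_r($absolute); // one.png, one.png
```

If descendants at any depth are needed, ".//img" keeps the search relative to the context node.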
Here is a version of your code that follows the advice above:
$dom = new DOMDocument();
$dom->loadHTML($xml);
$xpath = new DOMXPath($dom);
$divs = $xpath->query('//div');
foreach($divs as $book) {
$attr = $book->getAttribute('class');
if($attr == 'text') {
echo '<br>'.$book->nodeValue."<br>";
} else {
$avatar = $xpath->query("img[@class='avatar']/@src", $book);
$picture = $xpath->query("p/img[@class='pic']/@src", $book);
$fulltext = $xpath->query("p/span[@class='fulltext']", $book);
$permalink = $xpath->query("a[@class='permalink']", $book);
echo $permalink->item(0)->nodeValue; //date
echo $permalink->item(0)->getAttribute('href');
echo $fulltext->item(0)->nodeValue;
echo $avatar->item(0)->value;
echo $picture->item(0)->value;
}
}
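One thing the loop above assumes is that every query finds a match: item(0) returns null when it does not, and reading a property on null fails. Since the goal is to insert the values into a database, a possible variation is to collect each block into an array first. This is only a sketch: firstValue() is a hypothetical helper, the array keys are made up, and the sample markup is a shortened stand-in for the real input.

```php
<?php
// Hypothetical helper: text of the first match, or null when nothing matches.
function firstValue(DOMXPath $xpath, string $expr, DOMNode $context): ?string
{
    $node = $xpath->query($expr, $context)->item(0);
    return $node === null ? null : $node->nodeValue;
}

// Shortened sample; note that it has no <span class="create">.
$html = '<div class="text">note</div>'
      . '<div class="ui-draggable"><img class="avatar" src="a.png">'
      . '<p><img class="pic" src="p.png"><span class="fulltext">body</span></p>'
      . '<a class="permalink" href="/x">2012</a></div>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$rows = [];
foreach ($xpath->query("//div[@class='ui-draggable']") as $book) {
    // One array per block; the keys are illustrative column names.
    $rows[] = [
        'avatar'    => firstValue($xpath, "img[@class='avatar']/@src", $book),
        'picture'   => firstValue($xpath, "p/img[@class='pic']/@src", $book),
        'fulltext'  => firstValue($xpath, "p/span[@class='fulltext']", $book),
        'permalink' => firstValue($xpath, "a[@class='permalink']/@href", $book),
        'created'   => firstValue($xpath, "span[@class='create']", $book),
    ];
}

print_r($rows); // 'created' is null because the sample has no matching span
```

Each entry of $rows can then be bound to a prepared statement, with the missing fields arriving as null instead of causing an error.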
As a matter of fact, you are doing it the right way: HTML has to be parsed with a DOM object. Some optimisation is possible, though:
$div = $xpath->query('//div');
is quite greedy; getElementsByTagName would be more appropriate:
$div = $dom->getElementsByTagName('div');
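As a quick check that the two calls select the same elements, the following sketch runs both on a small hypothetical sample with a nested div. It makes no claim about relative speed; it only shows that getElementsByTagName, like //div, returns every div in the document, nested ones included, in document order.

```php
<?php
// Hypothetical sample with one nested div.
$html = '<div class="text">a</div>'
      . '<div class="ui-draggable"><div class="text">b</div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$viaXPath = $xpath->query('//div');
$viaTag   = $dom->getElementsByTagName('div');

echo $viaXPath->length, ' ', $viaTag->length, "\n"; // 3 3

$classes = [];
foreach ($viaTag as $div) {
    $classes[] = $div->getAttribute('class');
}
print_r($classes); // text, ui-draggable, text
```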