Retrieving categories from a website with DOMDocument and DOMXPath classes
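I have PHP code that should retrieve the categories from this website by looking for the class name "sub-title". However, the output shows nothing. What am I doing wrong?
PHP code: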
<?php
header('Content-Type: text/html; charset=utf-8');
$grep = new DoMDocument();
@$grep>loadHTMLFile("http://www.alibaba.com/Products",false,stream_context_create(array("http" => array("user_agent" => "any"))));
$finder = new DomXPath($grep);
$class = "sub-title";
$nodes = $finder->query("//*[contains(@class, '$class')]");
foreach ($nodes as $node) {
$span = $node->childNodes;
echo $span->item(0)->nodeValue;
}
?>
Desired output:
Agriculture
Food & Beverage
Apparel
and so on..
Thanks!
Just target that specific element. By the way, your current code has a typo at $grep>loadHTMLFile: the - is missing from the -> operator. I modified the code a little.
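$ch = curl_init('http://www.alibaba.com/Products');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
$html = curl_exec($ch);
if ($html === false) {
    // Stop early when the request fails instead of parsing an empty document
    die('Request failed: ' . curl_error($ch));
}
$dom = new DOMDocument();
@$dom->loadHTML($html); // @ silences warnings about malformed markup
$finder = new DOMXPath($dom);
// Match only <h4> elements whose class attribute is exactly "sub-title"
$nodes = $finder->query('//h4[@class="sub-title"]');
foreach ($nodes as $node) {
    // Keep only the first line of the node's text
    $sub_title = trim(explode("\n", trim($node->nodeValue))[0]) . '<br/>';
    echo $sub_title;
}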
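The question matches with contains(@class, ...) while the code above matches the class attribute exactly, and the two behave differently. The following is a minimal offline sketch of that difference, assuming PHP 7 or later. The markup is hypothetical and only loosely modelled on what the queries above expect (the real page may differ), and queryTitles is a helper name introduced here for illustration.

```php
<?php
// Hypothetical markup; the real page's structure is an assumption.
$html = <<<HTML
<html><body>
<h4 class="sub-title"><a href="#">Agriculture</a></h4>
<h4 class="sub-title highlighted"><a href="#">Food &amp; Beverage</a></h4>
<h4 class="sub-title-note"><a href="#">Not a category</a></h4>
</body></html>
HTML;

// Illustrative helper: run one XPath expression and return the trimmed text of each match.
function queryTitles(string $html, string $expression): array
{
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $titles = [];
    foreach ($xpath->query($expression) as $node) {
        $titles[] = trim($node->textContent);
    }
    return $titles;
}

// Exact attribute match: misses elements that carry additional classes.
$exact = queryTitles($html, '//h4[@class="sub-title"]');
// $exact is ['Agriculture']

// Substring match: also catches unrelated classes such as "sub-title-note".
$substring = queryTitles($html, "//h4[contains(@class, 'sub-title')]");
// $substring is ['Agriculture', 'Food & Beverage', 'Not a category']

// Token match: pads the class list with spaces so only the whole word matches.
$token = queryTitles($html, "//h4[contains(concat(' ', normalize-space(@class), ' '), ' sub-title ')]");
// $token is ['Agriculture', 'Food & Beverage']

echo implode("\n", $token) . "\n";
```

Which form fits best depends on the page: the exact match is the strictest, and the token match is the usual choice when elements may carry more than one class.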
To set a stream context when fetching the HTML with DOMDocument::loadHTMLFile, use libxml_set_streams_context:
<?php
$context = stream_context_create(array('http' => array('user_agent' => 'any')));
libxml_set_streams_context($context);
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTMLFile('http://www.alibaba.com/Products');
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//h4[@class="sub-title"]/a');
foreach ($nodes as $node) {
echo trim($node->textContent) . "\n";
}
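Because libxml_use_internal_errors(true) collects parser messages instead of printing them, they can be inspected afterwards with libxml_get_errors. The sketch below shows one way to do that. It parses an inline string rather than the live site so it runs offline; the markup, including the made-up nonsense tag, is hypothetical.

```php
<?php
// Hypothetical input: one category heading plus a made-up tag the parser may complain about.
$html = '<html><body><h4 class="sub-title"><a href="#">Apparel</a></h4>'
      . '<nonsense>x</nonsense></body></html>';

// Collect parser messages internally and remember the previous setting.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$loaded = $doc->loadHTML($html);

// Copy the collected messages, then clear the buffer and restore the old setting.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

if ($loaded === false) {
    exit("Could not parse the document\n");
}

// Each entry is a LibXMLError object with line and message properties.
foreach ($errors as $error) {
    echo 'line ' . $error->line . ': ' . trim($error->message) . "\n";
}

$xpath = new DOMXPath($doc);
$titles = [];
foreach ($xpath->query('//h4[@class="sub-title"]/a') as $node) {
    $titles[] = trim($node->textContent);
}
echo implode("\n", $titles) . "\n";
```

Which messages are reported, if any, depends on the installed libxml2 version, so the error list is best treated as diagnostic output rather than something to rely on.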