Retrieve data from an HTML page using XPath and PHP

I know there are similar questions, but while learning PHP I ran into this problem and I want to understand why it occurs.

<?php
    $url = 'http://aice.anie.it/quotazione-lme-rame/';
    echo "hello!\r\n";
    $html = new DOMDocument();
    @$html->loadHTML($url);
    $xpath = new DOMXPath($html);
    $nodelist = $xpath->query(".//*[@id='table33']/tbody/tr[2]/td[3]/b");

    foreach ($nodelist as $n) {
        echo $n->nodeValue . "\n";
    }
?>

This prints just "hello!". I want to print the value extracted with the XPath expression, but the echo inside the loop never outputs anything.

Your code has a few errors:

  1. You try to get the table from the URL http://aice.anie.it/quotazione-lme-rame/ , but it is actually in an iframe located at http://www.aiceweb.it/it/frame_rame.asp , so fetch the iframe URL directly.

  2. You use the function loadHTML(), which parses an HTML string, so the URL itself is treated as the markup. What you need is the loadHTMLFile() function, which takes the path or URL of an HTML document as a parameter (see http://www.php.net/manual/fr/domdocument.loadhtmlfile.php ).

  3. You assume there is a tbody element on the page, but there is none, so remove it from your XPath query.
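
Points 2 and 3 can be checked offline. The sketch below is only an illustration: the markup in $markup is a made-up stand-in for the real page (the actual table contents are not shown in the question), and the variable names are assumptions. It parses a table that has no tbody and runs the query both with and without the tbody step.

```php
<?php
// Hypothetical markup standing in for the real page: a table without <tbody>.
$markup = '<html><body><table id="table33">'
        . '<tr><td>Date</td><td>Metal</td><td>Price</td></tr>'
        . '<tr><td>01/01</td><td>Cu</td><td><b>1234.56</b></td></tr>'
        . '</table></body></html>';

$doc = new DOMDocument();
// loadHTML() expects the markup itself as a string, not a URL or a path.
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// Same query as in the question, once with the tbody step and once without.
$withTbody    = $xpath->query("//*[@id='table33']/tbody/tr[2]/td[3]/b");
$withoutTbody = $xpath->query("//*[@id='table33']/tr[2]/td[3]/b");

echo "matches with tbody: ", $withTbody->length, "\n";
echo "matches without tbody: ", $withoutTbody->length, "\n";

foreach ($withoutTbody as $node) {
    echo $node->nodeValue, "\n";
}
```

A browser's developer tools often show a tbody that is not in the source, because the browser adds it while building the DOM. An XPath copied from the browser may therefore need that step removed before it matches in DOMDocument.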

Working code:

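<?php
// The table lives in this iframe document, not in the outer page.
$url = 'http://www.aiceweb.it/it/frame_rame.asp';
echo "hello!\r\n";
$html = new DOMDocument();
// loadHTMLFile() fetches and parses the document at the URL;
// @ silences warnings about malformed markup.
@$html->loadHTMLFile($url);
$xpath = new DOMXPath($html);
// No tbody step: the source markup does not contain one.
$nodelist = $xpath->query(".//*[@id='table33']/tr[2]/td[3]/b");

foreach ($nodelist as $n) {
    echo $n->nodeValue . "\n";
}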
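
The working code hides every warning with @, so a failed download and an empty result look the same. The following is a minimal sketch of one way to tell them apart, not part of the original answer: firstNodeValue() is a hypothetical helper name, and the temporary file with sample markup is an assumption that lets the example run without network access. It relies on libxml_use_internal_errors() and on the boolean return value of loadHTMLFile().

```php
<?php
/**
 * Hypothetical helper: load an HTML document from a path or URL and return
 * the trimmed text of the first node matching $query, or null on failure.
 */
function firstNodeValue(string $source, string $query): ?string
{
    // Collect libxml parse warnings internally instead of hiding them with @.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $loaded = $doc->loadHTMLFile($source);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($loaded === false) {
        return null; // the document could not be loaded
    }

    $nodes = (new DOMXPath($doc))->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null; // invalid expression or no matching node
    }

    return trim($nodes->item(0)->nodeValue);
}

// Usage with a local temporary file holding made-up sample markup.
$path = tempnam(sys_get_temp_dir(), 'lme');
file_put_contents(
    $path,
    '<html><body><table id="table33">'
    . '<tr><td>Date</td><td>Metal</td><td>Price</td></tr>'
    . '<tr><td>01/01</td><td>Cu</td><td><b> 1234.56 </b></td></tr>'
    . '</table></body></html>'
);

$value   = firstNodeValue($path, "//*[@id='table33']/tr[2]/td[3]/b");
$missing = firstNodeValue($path, "//*[@id='table33']/tbody/tr[2]/td[3]/b");
unlink($path);

echo $value === null ? "no match\n" : $value . "\n";
echo $missing === null ? "no match\n" : $missing . "\n";
```

Returning null for both failure cases keeps the sketch short; a real script might prefer to throw or to log the collected libxml errors before clearing them.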
