Parsing HTML with XPath and PHP

Question

I'm currently trying to parse some data from a forum. Here is the code:

$xml = simplexml_load_file('https://forums.eveonline.com');

$names = $xml->xpath("html/body/div/div/form/div/div/div/div/div[*]/div/div/table//tr/td[@class='topicViews']");
foreach($names as $name) 
{
    echo $name . "<br/>";
}

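Anyway, the problem is that I'm using a Google XPath extension to help me get the path, and I'm guessing that Google is changing the HTML enough that nothing comes up when my own site runs this query. Is there some way I can make the host look at the site through Google Chrome so that it gets the right code? What would you suggest?

Thanks!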

Answer 1

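My suggestion is to always use DOMDocument rather than SimpleXML, since it offers a much nicer interface that makes the task far more intuitive.

The following example shows how to load the HTML into a DOMDocument object and query the DOM with XPath. All you really need to do is find every td element whose class is topicViews; the code then prints the nodeValue of each node in the DOMNodeList returned by the XPath query.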

/* Use internal libxml errors -- turn on in production, off for debugging */
libxml_use_internal_errors(true);
/* Create a new DOMDocument object */
$dom = new DOMDocument;
/* Load the HTML */
$dom->loadHTMLFile("https://forums.eveonline.com");
/* Create a new XPath object */
$xpath = new DOMXPath($dom);
/* Query all <td> nodes whose class attribute is exactly the specified name */
$nodes = $xpath->query("//td[@class='topicViews']");
/* Set HTTP response header to plain text for debugging output */
header("Content-type: text/plain");
/* Traverse the DOMNodeList object to output each DOMNode's nodeValue */
foreach ($nodes as $i => $node) {
    echo "Node($i): ", $node->nodeValue, "\n";
}
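
One caveat about the query above: @class='topicViews' compares the whole attribute string, so a cell marked up as class="topicViews hot" would be skipped. The sketch below shows a common XPath 1.0 idiom for matching a single class token instead. The inline markup and the extra "hot" and "topicTitle" class names are made up for illustration; they are assumptions, not the forum's real HTML.

```php
<?php
// Hypothetical markup standing in for the forum page (not the real HTML).
$html = <<<HTML
<table>
  <tr><td class="topicViews">120</td></tr>
  <tr><td class="topicViews hot">987</td></tr>
  <tr><td class="topicTitle">Hello</td></tr>
</table>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Exact comparison: matches only class="topicViews".
$exact = $xpath->query("//td[@class='topicViews']");

// Token match: pad the class list with spaces so that "topicViews"
// is matched as a whole word anywhere in the attribute.
$token = $xpath->query(
    "//td[contains(concat(' ', normalize-space(@class), ' '), ' topicViews ')]"
);

// Collect the text of every cell found by the token match.
$values = [];
foreach ($token as $node) {
    $values[] = trim($node->nodeValue);
}

echo "exact: ", $exact->length, ", token: ", $token->length, "\n";
echo implode(", ", $values), "\n";
```

If the real page only ever uses the bare class name, the simpler exact comparison is enough.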

Answer 2

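A double '/' makes XPath search through all descendants. So if you use the XPath '//table', you get every table. You can also use it deeper in the path, e.g. 'html/body/div/div/form//table', to get all tables under 'html/body/div/div/form'.

That way, your code becomes more resilient to changes in the HTML source.

If you want to use XPath, I suggest learning a bit about it. Copy-pasting only gets you so far.

A simple explanation of the syntax is available at w3schools.com/xml/xpath_syntax.asp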
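
To make the resilience point concrete, here is a minimal sketch that runs a fully spelled-out path and a '//' path against two versions of a page. Both HTML strings and the countMatches helper are hypothetical, invented for this illustration; the second version simply wraps the form in one extra div.

```php
<?php
// Two hypothetical versions of the same page; $after adds one wrapper <div>.
$before = '<html><body><div><form><table><tr><td>a</td></tr></table></form></div></body></html>';
$after  = '<html><body><div><div><form><table><tr><td>a</td></tr></table></form></div></div></body></html>';

// Hypothetical helper: number of nodes an XPath query matches in an HTML string.
function countMatches(string $html, string $query): int
{
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    return $xpath->query($query)->length;
}

$rigid = '/html/body/div/form/table'; // every step spelled out
$loose = '//form//table';             // any table below any form

$results = [
    'rigid_before' => countMatches($before, $rigid),
    'rigid_after'  => countMatches($after, $rigid),
    'loose_before' => countMatches($before, $loose),
    'loose_after'  => countMatches($after, $loose),
];

foreach ($results as $label => $count) {
    echo $label, ": ", $count, "\n";
}
```

The spelled-out path stops matching as soon as the nesting changes, while the '//' path still finds the table in both versions.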

Answer 1
39 2012-12-05 08:06:51

Answer 2
3 2012-12-05 07:57:29
