Scraping with DomXPath

Question

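I am scraping a website with PHP's DomXPath.

I am currently following this tutorial to learn how to traverse a document with XPath.

Right now I am scraping this site for character names and Steam IDs (the XPath mess below is what it takes to get a single Steam ID).

My problem is that the page has multiple Steam IDs and character names, and the XPath I painstakingly built matches only one of them.

How should I scrape all of the Steam IDs instead of just one?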

$xpath = new DomXPath($this->ourTeamHTML);

/* Set HTTP response header to plain text for debugging output */
header("Content-type: text/plain");

$steamName = $xpath->query('//*[@id="wrapper"]/section/div/div[1]/div[2]/div[2]/div[1]/div/div/div[1]/div/div[1]/h5/b');
/* Traverse the DOMNodeList object to output each DomNode's nodeValue */
foreach ($steamName as $node) {
    echo "Steam Name: " . $node->nodeValue . "\n";
}

Answer 1

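Your XPath is too verbose: it spells out the full path with element indexes, which makes it hard to read and likely to break on minor changes to the page source. Try this simpler XPath instead: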

//*[@id="wrapper"]//div[@class='col-md-12']//h5/b

It works for me and returns all the Steam IDs and character names from the linked page (32 elements in total). I tested it with the FirePath add-on for Firefox.
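For illustration, here is a minimal sketch of how the shorter expression could be used with DOMXPath. The HTML snippet, names, and Steam IDs are made-up stand-ins, because the real page markup is not shown in the question. The indexed path in the sketch is a shortened, hypothetical analogue of the original one.

```php
<?php
// Hypothetical stand-in for the team page; the real markup is an assumption.
$html = <<<HTML
<div id="wrapper">
  <section>
    <div class="col-md-12">
      <div class="member"><h5><b>Alice</b></h5><h5><b>STEAM_0:0:11111</b></h5></div>
      <div class="member"><h5><b>Bob</b></h5><h5><b>STEAM_0:1:22222</b></h5></div>
    </div>
  </section>
</div>
HTML;

$doc = new DOMDocument();
// Collect parser warnings (e.g. about HTML5 tags) instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// A fully indexed path pins every step to one position, so it matches one node.
$indexed = $xpath->query('//*[@id="wrapper"]/section/div/div[1]/h5[1]/b');

// The descendant axis (//) matches every h5/b under the container.
$nodes = $xpath->query('//*[@id="wrapper"]//div[@class="col-md-12"]//h5/b');

// Copy the text of each matched node into a plain array.
$values = [];
foreach ($nodes as $node) {
    $values[] = trim($node->nodeValue);
}

echo "Indexed path matches: " . $indexed->length . "\n";
echo "Relative path matches: " . count($values) . "\n";
foreach ($values as $value) {
    echo $value . "\n";
}
```

Note that `@class='col-md-12'` only matches when the attribute is exactly that string. If the real element carries additional classes, a `contains(@class, 'col-md-12')` predicate may be needed instead.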

Solution 1
0 Accepted 2015-06-13 06:00:07