Extracting data from a web page with XPath in PHP

Question

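I need to extract data from here. The page contains review comments, review titles, the number of people who found each review useful, and a rating (shown as stars), and I need to extract all of these.

The problem I'm facing is that I can only retrieve the review comment, and only the one that appears first on the page (it does not move on to the next review comment).
I can't retrieve the review title because each one carries a different object ID in the HTML.

For example (can I use a regular expression for the object ID in this case?):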

<a href="/review/www.currys.co.uk/5370859f00006400028963d9">Customer services what a load of cp</a>

I also don't know how to get the number of "useful" votes, or the rating from 1 to 5 that the icons show.

My code:

$url = "https://www.trustpilot.co.uk/review/www.currys.co.uk";
$html = file_get_contents( $url);
libxml_use_internal_errors( true);
$doc = new DOMDocument; $doc->loadHTML( $html);
$xpath = new DOMXpath( $doc);
$node = $xpath->query( '//div[@itemprop="reviewBody"][@class="review-body"]')->item( 0);
echo $node->textContent;

Answer 1

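Only the first review is shown because you select only ->item(0); you need to loop over all the matched nodes. To print the text inside an element you can use nodeValue. Note that textContent is also a valid DOMNode property and returns the same text for an element, so the property itself was not the cause.

The following code prints up to 10 reviews in a table, including the rating (stars), title and content: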
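As a minimal sketch of the looping point, assuming a small inline HTML fragment (invented here for illustration) instead of the live page, a foreach over the DOMNodeList visits every match rather than just item(0):

```php
<?php
// Hypothetical markup standing in for the real page; only the
// itemprop/class attributes are taken from the question.
$html = '<div itemprop="reviewBody" class="review-body">First review</div>'
      . '<div itemprop="reviewBody" class="review-body">Second review</div>';

libxml_use_internal_errors(true);
$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Collect the text of every matching node, not only the first one.
$texts = [];
foreach ($xpath->query('//div[@itemprop="reviewBody"][@class="review-body"]') as $node) {
    $texts[] = trim($node->nodeValue);
}
print_r($texts);
```

The same loop works unchanged on the document loaded from the real URL, as long as the page still uses those attributes.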

$url = "https://www.trustpilot.co.uk/review/www.currys.co.uk";
$html = file_get_contents($url);
// Suppress warnings caused by malformed HTML in the page
libxml_use_internal_errors(true);
$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
// get all ratings where <meta itemprop="ratingValue">
$ratings = $xpath->query('//meta[@itemprop="ratingValue"]');
// get all headings where <h3 class="review-title en h4">
$headings = $xpath->query('//h3[@class="review-title en h4"]');
// get all review bodies
$bodies = $xpath->query('//div[@itemprop="reviewBody"][@class="review-body"]');

// Print at most 10 rows, and never more than the shortest result list
$count = min(10, $ratings->length, $headings->length, $bodies->length);
$table = '<table border="1">';
for ($i = 0; $i < $count; $i++) {
    // One asterisk per star in the rating
    $stars = str_repeat("*", (int) $ratings->item($i)->getAttribute('content'));
    $table .= '<tr>
           <td>Star: ' . $stars . '</td>
           <td>' . htmlspecialchars($headings->item($i)->nodeValue) . '</td>
           <td>' . htmlspecialchars($bodies->item($i)->nodeValue) . '</td>
           </tr>';
}
$table .= '</table>';
echo $table;
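On the question of using a regular expression for the varying ID in the title link: XPath 1.0, which DOMXPath implements, has no regex matching, but starts-with() can match the fixed part of the href. The following is only a sketch under assumptions: the <a> element is copied from the question, while the wrapping <h3> and the second link are made up for the example.

```php
<?php
// Hypothetical fragment: the review link comes from the question,
// the surrounding <h3> and the "/about" link are assumed.
$html = '<h3 class="review-title en h4">'
      . '<a href="/review/www.currys.co.uk/5370859f00006400028963d9">Customer services what a load of cp</a>'
      . '</h3>'
      . '<a href="/about">About</a>';

libxml_use_internal_errors(true);
$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Match on the constant prefix of the href; the trailing ID can be anything.
$links = $xpath->query('//a[starts-with(@href, "/review/www.currys.co.uk/")]');

$titles = [];
foreach ($links as $link) {
    // The last path segment of the href is the per-review ID.
    $id = basename($link->getAttribute('href'));
    $titles[$id] = trim($link->nodeValue);
}
print_r($titles);
```

Because the match depends only on the href prefix, it does not break when the ID changes from one review to the next.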

Solution 1
1 Accepted 2014-05-15 10:43:19
