Parsing a table: DOMXPath cannot get more than 3 rows

Question

For some weird reason that I can't figure out, I can't fetch more than 3 rows from a table on a page.

This is the page:

http://www.reedmfgco.com/en/products/cutters-and-cutter-wheels/cutter-wheels/cutter-wheels-for-tubing-cutters-plastic/

I want to parse the table at the bottom.

Since there is only one table on the page, I kept my XPath really simple: $xpath->query('//tr')

If I do the following:

echo $xpath->query('//tr')->length;

I get 3.

Why am I getting 3? There are 9 rows in the table, so I should get 9.

Edit: this is the code I use:

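// Parse the raw HTML; @ suppresses libxml's warnings about malformed markup.
$Dom = new DOMDocument();
@$Dom->loadHTML($this->html);
$xpath = new DOMXPath($Dom);
// Count every <tr> element in the document.
echo $xpath->query('//tr')->length;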

Please note that $this->html is the raw HTML from the link earlier in my post.

Answer 1

The HTML source of this page is not well-formed (it is not valid as XML). If you open the page's source code and search for <tr> tags, you will find only 3 of them as well. The product rows of the table have a closing </tr> but no opening <tr> tag, so the parser does not see them as rows.

To work around this, you can use regular expressions to normalize the contents of the table before parsing it.
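One way to confirm this from PHP, rather than by reading the source by hand, is to let libxml collect its parse warnings instead of silencing them with @. The sketch below is only an illustration: $sample is a small hypothetical fragment that imitates the missing <tr> tags, not the page's real markup, and the exact warning text and row count you see may depend on the libxml2 version in use.

```php
<?php
// Hypothetical fragment imitating the problem: the second and third rows
// have a closing </tr> but no opening <tr>.
$sample = '<table><tbody>'
        . '<tr><td>A</td></tr>'
        . '<td>B</td></tr>'
        . '<td>C</td></tr>'
        . '</tbody></table>';

// Collect libxml warnings internally instead of hiding them with @.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($sample);

// Fetch the collected warnings, then restore the previous error handling.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Print each parser complaint with the line it refers to.
foreach ($errors as $error) {
    echo trim($error->message), ' (line ', $error->line, ')', PHP_EOL;
}

// Count the rows the parser actually recognised.
$xpath = new DOMXPath($dom);
$rowCount = $xpath->query('//tr')->length;
echo 'Rows found: ', $rowCount, PHP_EOL;
```

If the reported row count is lower than the number of rows you can see in a browser, the markup itself is the likely culprit rather than the XPath expression.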

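// Fetch the raw HTML of the page.
$html = file_get_contents('http://www.reedmfgco.com/en/products/cutters-and-cutter-wheels/cutter-wheels/cutter-wheels-for-tubing-cutters-plastic/');

// Capture the table body; the "s" modifier lets "." match newlines too.
preg_match('`<tbody>(.*?)</tbody>`s', $html, $matches);
if (!empty($matches)) {
    // Add the missing opening <tr> to each row that directly follows a </tr>.
    $tableBody = str_replace('</tr><td', '</tr><tr><td', $matches[1]);
}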
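The snippet above stops once $tableBody has been built. A minimal sketch of the remaining step might look like the following, assuming the repaired fragment is wrapped in a <table> again and handed back to DOMDocument. To stay runnable offline it uses a hypothetical inline $html instead of the live page, and the helper name repairRows is made up for illustration. It also swaps str_replace for a preg_replace with \s*, since str_replace only matches when </tr> is immediately followed by <td with no whitespace in between.

```php
<?php
// Hypothetical helper: extracts the <tbody> contents and adds the opening
// <tr> that is missing after each closing </tr>.
function repairRows(string $html): ?string
{
    // "s" lets "." span newlines; the lazy quantifier stops at the first </tbody>.
    if (!preg_match('`<tbody>(.*?)</tbody>`s', $html, $matches)) {
        return null;
    }
    // Unlike str_replace, \s* tolerates whitespace between </tr> and <td.
    return preg_replace('`</tr>\s*<td`', '</tr><tr><td', $matches[1]);
}

// Hypothetical stand-in for the downloaded page.
$html = "<table><tbody>\n<tr><td>A</td></tr>\n<td>B</td></tr>\n<td>C</td></tr>\n</tbody></table>";

$tableBody = repairRows($html);

$dom = new DOMDocument();
// Wrap the fragment so the parser sees a complete table again.
$dom->loadHTML('<table><tbody>' . $tableBody . '</tbody></table>');

// Every row now has both tags, so //tr finds all of them.
$xpath = new DOMXPath($dom);
$rowCount = $xpath->query('//tr')->length;
echo $rowCount, PHP_EOL;
```

Rows that already have an opening <tr> are left alone, because the pattern only matches a </tr> that is followed directly by <td.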

Solution 1
0 Accepted 2014-03-19 21:05:37