
Parsing a table: can't get more than 3 rows using DOMXPath

For some weird reason that I can't figure out, I can't fetch more than 3 rows from a table on a page.

This is the page:

http://www.reedmfgco.com/en/products/cutters-and-cutter-wheels/cutter-wheels/cutter-wheels-for-tubing-cutters-plastic/

I want to parse the table at the bottom.

Since there is only one table on the page, I kept my XPath really simple: $xpath->query('//tr')

If I do the following:

echo $xpath->query('//tr')->length;

I get 3.

Why am I getting 3? There are 9 rows in the table, so I should get 9.


Edit: This is the code I use:

$Dom = new DOMDocument();
@$Dom->loadHTML($this->html);
$xpath = new DOMXPath($Dom);
echo $xpath->query('//tr')->length;

Please note that $this->html is the raw HTML from the link given earlier in my post.

The HTML source of this page is not valid markup. If you open the page source and search for the <tr> tag, you will also find only 3 of them. The product rows in the table have no opening <tr> tag: each one starts directly with <td> after the previous row's </tr>.
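One way to see this kind of problem from PHP is to collect libxml's parse errors instead of silencing them with the @ operator, as the question's code does. The following is only a sketch: the HTML string is a made-up miniature table, not the real page, and the exact error messages depend on the libxml version bundled with PHP.

```php
<?php
// Made-up sample (not the real page): the second row lacks its opening <tr>.
$html = '<table><tbody>'
      . '<tr><td>A</td><td>1</td></tr>'
      . '<td>B</td><td>2</td></tr>'
      . '</tbody></table>';

// Collect parse errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Each entry is a LibXMLError object with line, column and message fields.
$errors = libxml_get_errors();
libxml_clear_errors();

foreach ($errors as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}
```

A stray closing tag such as the orphaned </tr> would normally be expected to show up in this list, which hints that the rows are malformed before any XPath query is run.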

To work around this, you can use a regular expression and a string replacement to normalize the table contents before parsing:

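$html = file_get_contents('http://www.reedmfgco.com/en/products/cutters-and-cutter-wheels/cutter-wheels/cutter-wheels-for-tubing-cutters-plastic/');

// Grab the table body; the "s" modifier lets "." match newlines too.
preg_match('`<tbody>(.*)<\/tbody>`s', $html, $matches);
if (!empty($matches)) {
    // Add the missing opening <tr> to every row that starts right after a </tr>.
    $tableBody = str_replace('</tr><td', '</tr><tr><td', $matches[1]);
}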
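To check that the normalization has the intended effect, the repaired markup can be parsed again and the rows counted. The sketch below uses a made-up four-row table rather than the real page. The helpers countRows and addMissingRowTags are hypothetical names introduced only for this illustration. addMissingRowTags uses preg_replace instead of the str_replace above so that whitespace between </tr> and <td is tolerated, which is an assumption about how the real markup might be laid out.

```php
<?php
// Hypothetical helper: parse an HTML string and count its <tr> elements.
function countRows(string $html): int
{
    $previous = libxml_use_internal_errors(true); // keep parse warnings quiet
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    return $xpath->query('//tr')->length;
}

// Hypothetical helper: insert an opening <tr> wherever a <td> directly
// follows a closing </tr> (optionally separated by whitespace).
function addMissingRowTags(string $html): string
{
    return preg_replace('`</tr>\s*<td`i', '</tr><tr><td', $html) ?? $html;
}

// Made-up sample: rows B and C lack their opening <tr>.
$sample = '<table><tbody>'
        . '<tr><th>Item</th><th>Code</th></tr>'
        . '<tr><td>A</td><td>1</td></tr>'
        . '<td>B</td><td>2</td></tr>'
        . '<td>C</td><td>3</td></tr>'
        . '</tbody></table>';

echo countRows($sample), "\n";                    // rows found in the broken markup
echo countRows(addMissingRowTags($sample)), "\n"; // prints 4 after the repair
```

With the libxml parser that PHP normally ships with, the first count should come out lower than the second, which mirrors the 3-versus-9 discrepancy in the question.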
