Parsing a table: can't get more than 3 rows using DOMXPath
For some weird reason that I can't figure out right now, I can't fetch more than 3 rows from a table on a page.
This is the page:
http://www.reedmfgco.com/en/products/cutters-and-cutter-wheels/cutter-wheels/cutter-wheels-for-tubing-cutters-plastic/
I want to parse the table at the bottom.
Since there is only one table on the page, I made my XPath really simple.
$xpath -> query('//tr')
If I do the following:
echo $xpath -> query('//tr')->length;
I get 3. Why am I getting 3? There are 9 rows there, so I should get 9.
Edit: this is the code I use:
$Dom = new DOMDocument();
@$Dom -> loadHTML($this->html);
$xpath = new DOMXPath($Dom);
echo $xpath -> query('//tr')->length;
Please note that $this->html is the raw HTML fetched from the link given earlier in my post.
The HTML source of this page is not valid. If you open the page source and search for the <tr> tag, you will find that it also occurs only 3 times. The product rows in the table have no opening <tr> tag.
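One way to confirm this kind of damage is to count opening and closing <tr> tags in the raw markup. The snippet below is only a sketch: $sample is a made-up three-row fragment standing in for the real page, not its actual source.

```php
<?php
// Hypothetical fragment (not the real page): rows 2 and 3 lack an opening <tr>.
$sample = '<table><tbody>'
        . '<tr><td>A</td></tr>'
        . '<td>B</td></tr>'
        . '<td>C</td></tr>'
        . '</tbody></table>';

// preg_match_all() returns the number of matches found.
$opening = preg_match_all('/<tr\b/i', $sample);   // opening <tr ...> tags
$closing = preg_match_all('/<\/tr>/i', $sample);  // closing </tr> tags

echo "opening: $opening, closing: $closing\n";    // prints "opening: 1, closing: 3"
```

If the two counts differ on the real page, the parser cannot be expected to see every row as a tr element.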
To work around this, you can use regular expressions to normalize the contents of the table.
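$html = file_get_contents('http://www.reedmfgco.com/en/products/cutters-and-cutter-wheels/cutter-wheels/cutter-wheels-for-tubing-cutters-plastic/');
// The s modifier lets . match newlines, in case <tbody> spans several lines.
preg_match('`<tbody>(.*)<\/tbody>`s', $html, $matches);
if (!empty($matches)) {
    // Re-insert the opening <tr> that is missing after each closing </tr>.
    $tableBody = str_replace('</tr><td', '</tr><tr><td', $matches[1]);
}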
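Once the markup is repaired, it can be fed back to DOMDocument and queried with the same XPath. The following is a minimal sketch on a made-up fragment ($sample is hypothetical and only mimics the missing <tr> tags); it also uses libxml_use_internal_errors() instead of the @ operator, which is a stylistic choice rather than part of the original answer.

```php
<?php
// Hypothetical fragment (not the real page): rows 2 and 3 lack an opening <tr>.
$sample = '<table><tbody>'
        . '<tr><td>A</td></tr>'
        . '<td>B</td></tr>'
        . '<td>C</td></tr>'
        . '</tbody></table>';

// Same repair as above: add the opening <tr> after each closing </tr>.
$fixed = str_replace('</tr><td', '</tr><tr><td', $sample);

// Collect libxml parse warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($fixed);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$rowCount = $xpath->query('//tr')->length;

echo $rowCount, "\n"; // prints 3 for this sample
```

Note that the str_replace() pattern only matches when </tr> is immediately followed by <td; if the real page has whitespace between the two tags, the pattern would need to account for it.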