
XPath PHP parsing HTML table <td></td> tags

I am trying to parse an HTML table in order to get the <td> ID HERE </td> tag content using XPath and PHP. Executing the line $doc->loadHTMLFile($file); gives me warnings like this:

PHP Warning: DOMDocument::loadHTMLFile(): Unexpected end tag : tr in...

That's why I am using the following block of code:

libxml_use_internal_errors(true);
$doc->loadHTMLFile($file);
libxml_clear_errors();
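These warnings are libxml reporting malformed markup; the document is still built. The sketch below shows the same technique on a small inline string (the markup with a stray </tr> is made up for illustration, standing in for the real page). It also reads the collected errors with libxml_get_errors() before clearing them, and restores the previous error mode afterwards:

```php
<?php
// Made-up malformed markup: the second </tr> has no matching open tag.
$html = '<table><tr><td>1</td></tr></tr></table>';

$doc = new DOMDocument();

// Collect libxml errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);
$loaded = $doc->loadHTML($html);

// Inspect what libxml complained about before discarding it.
$errors = libxml_get_errors();
foreach ($errors as $error) {
    printf("libxml error %d at line %d: %s\n", $error->code, $error->line, trim($error->message));
}
libxml_clear_errors();

// Restore whatever error-handling mode was active before.
libxml_use_internal_errors($previous);

// The parsed document is usable despite the reported errors.
echo $doc->getElementsByTagName('td')->item(0)->textContent, "\n";
```

Restoring the previous mode keeps the suppression local to this one load instead of silencing libxml for the rest of the script.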

I am trying to parse this (the entire page here):

<table class="object-table" cellpadding="0" cellspacing="0">
  <tbody>
    <tr>
      <th width="8%">something here</th>
      <th width="89%">something here</th>
      <th width="3%">something here</th>
    </tr>
    <tr class="normal-row">
      <td>ID number here</td>
      <td><a href="/catalog/view/id/4127">something here</a></td>
      <td align="center"><img src="/design/img/hasnt_photo_icon.gif"></td>
    </tr>
    <tr class="odd-row">
      <td>ID number here</td>
      <td><a href="/catalog/view/id/1865">something here</a></td>
      <td align="center"><img src="/design/img/hasnt_photo_icon.gif"></td>
    </tr>
  </tbody>
</table>

with the following code:

$file = "http://www.sportsporudy.gov.ua/catalog/#c[1]=1";
$doc = new DOMDocument();

// Keep the malformed-markup warnings out of the output
libxml_use_internal_errors(true);
$doc->loadHTMLFile($file);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$query = '//tr[@class="odd-row"]';

$elements = $xpath->query($query);
printf("Size of array: %d\n", sizeof($elements));
printElements($elements); // my own helper function (not shown)

I also tried different queries, like //table[@class="object-table"]/tbody/tr ..., but none of them seems to give me the td tags I need. Maybe that's because of the broken HTML.
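As a sketch of the querying side only, the question's sample table can be loaded from a string (an assumption so the example runs offline, without the live URL) and the ID cells selected as the first <td> of every row that has <td> cells. The query //table[@class="object-table"]//tr[td]/td[1] is one possible choice, not the only one:

```php
<?php
// The sample table from the question, inlined so the sketch runs offline.
$html = <<<'HTML'
<table class="object-table" cellpadding="0" cellspacing="0">
  <tbody>
    <tr>
      <th width="8%">something here</th>
      <th width="89%">something here</th>
      <th width="3%">something here</th>
    </tr>
    <tr class="normal-row">
      <td>ID number here</td>
      <td><a href="/catalog/view/id/4127">something here</a></td>
      <td align="center"><img src="/design/img/hasnt_photo_icon.gif"></td>
    </tr>
    <tr class="odd-row">
      <td>ID number here</td>
      <td><a href="/catalog/view/id/1865">something here</a></td>
      <td align="center"><img src="/design/img/hasnt_photo_icon.gif"></td>
    </tr>
  </tbody>
</table>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// First <td> of every row that contains <td> cells; the <th> header row is skipped.
$cells = $xpath->query('//table[@class="object-table"]//tr[td]/td[1]');

$ids = [];
foreach ($cells as $cell) {
    $ids[] = trim($cell->textContent);
}
print_r($ids);
```

Using //tr rather than /tbody/tr means the query does not depend on whether the served markup actually contains a <tbody> element.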

Thanks for your advice.

Essentially, your code is fine.

The only error I found is in how you print the length of $elements: $elements is not an array but a DOMNodeList object, so to retrieve its length you have to use this syntax:

printf( "Size of array: %d\n", $elements->length );
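A minimal sketch of working with the DOMNodeList returned by DOMXPath::query(), using a small inline table (an assumption standing in for the real page) that has two rows with class odd-row:

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML('<table><tr class="odd-row"><td>A</td></tr><tr class="odd-row"><td>B</td></tr></table>');

$xpath = new DOMXPath($doc);
$elements = $xpath->query('//tr[@class="odd-row"]');

// DOMXPath::query() returns a DOMNodeList object, not a PHP array.
var_dump($elements instanceof DOMNodeList);

// The number of matched nodes is exposed through the read-only "length" property.
printf("Size of array: %d\n", $elements->length);

// Individual nodes can be read with item() or by iterating with foreach.
echo $elements->item(0)->textContent, "\n";
foreach ($elements as $row) {
    echo trim($row->textContent), "\n";
}
```

On PHP 7.2 and later DOMNodeList also implements Countable, so count($elements) returns the same number there; ->length works on every version.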

But the main problem with your page is that its HTML contains only one table with a single row: the remaining data are filled in by JavaScript, so you can't retrieve them directly through DOMXPath.
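To illustrate why, here is a sketch with hypothetical markup modelled on that description: the server sends only the header row, and a script (the function name loadCatalogRows is made up) would add the data rows in a browser. DOMDocument only parses markup and never executes scripts, so the data cells simply do not exist in the parsed document:

```php
<?php
// Hypothetical page: only the header row is in the HTML sent by the server.
$html = <<<'HTML'
<table class="object-table">
  <tbody>
    <tr><th>ID</th><th>Name</th></tr>
  </tbody>
</table>
<script>
  /* In a browser, a script like this would fetch the data and append the rows. */
  loadCatalogRows();
</script>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// The <script> is parsed as text, never run, so no <td> rows are ever added.
$headerCells = $xpath->query('//table[@class="object-table"]//th');
$dataCells = $xpath->query('//table[@class="object-table"]//td');

printf("header cells: %d, data cells: %d\n", $headerCells->length, $dataCells->length);
```

Relatedly, the #c[1]=1 part of the question's URL is a fragment, which HTTP clients do not send to the server, so it cannot change what loadHTMLFile() downloads; this is consistent with the filtering happening client-side.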
