
Regular expression HTML table extraction using the PHP function preg_match_all

I want to extract the tables from an HTML page that contains nested <table> tags, and then extract the <tr> and <td> elements of those tables.

I am using the following code, which works fine for <b> and </b>:

$file = file_get_contents($url);
preg_match_all ("/<b>(.*)<\/b>/U", $file, $pat_array);
print $pat_array[0][0]." <br> ".$pat_array[0][1]."\n";
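For reference, the /U modifier in the pattern above is what makes .* ungreedy, so each <b>...</b> pair is matched separately. The minimal sketch below uses a hypothetical sample string to compare /U with an explicitly lazy .*? and with the greedy default.

```php
<?php
// Hypothetical sample input with two bold sections
$file = "<p><b>first</b> and <b>second</b></p>";

// With /U (ungreedy), ".*" is lazy, so each <b>...</b> is matched on its own
preg_match_all("/<b>(.*)<\/b>/U", $file, $ungreedy);

// Same result without /U, using an explicitly lazy quantifier ".*?"
preg_match_all("/<b>(.*?)<\/b>/", $file, $lazy);

// Without either, greedy ".*" runs from the first <b> to the last </b>
preg_match_all("/<b>(.*)<\/b>/", $file, $greedy);

print $ungreedy[1][0] . " | " . $ungreedy[1][1] . "\n"; // first | second
print $greedy[1][0] . "\n";                             // first</b> and <b>second
```

Many people prefer .*? over /U because the laziness is visible at the quantifier itself rather than hidden in the modifiers.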

Can anybody tell me a regular expression for nested tables of the form <table (some table properties)> some data using <tr> and <td> </table>? Please keep the href if one is present in the <tr> or <td> fields, and keep in mind the tables I need.

Example:

$file = "<html> <head> <title> asdf </title> </head> <body bgcolor=red> <table border=1> <table bgcolor=white> (some tr and td data) </table> </table> </body> </html>";

preg_match_all ("regular expression for table tag", $file, $pat_array);
print $pat_array[0][0]." <br> ".$pat_array[0][1]."\n";
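To see why a single regular expression struggles with nested tables, here is a minimal sketch that applies a lazy <table ...>...</table> pattern to a cleaned-up copy of the example markup. The pattern is an illustration chosen for this sketch, not a recommended solution.

```php
<?php
// Cleaned-up copy of the example markup: one table nested directly in another
$file = "<html> <head> <title> asdf </title> </head> <body bgcolor=red> <table border=1> <table bgcolor=white> (some tr and td data) </table> </table> </body> </html>";

// Lazy (/U) match from an opening <table ...> tag to the NEXT </table>;
// /i ignores case and /s lets "." also match newlines
$count = preg_match_all('/<table[^>]*>(.*)<\/table>/isU', $file, $matches);

print $count . "\n";         // 1
print $matches[0][0] . "\n"; // <table border=1> <table bgcolor=white> (some tr and td data) </table>
```

The only match pairs the outer opening tag with the inner closing tag: the inner table is never matched on its own, and the outer table is cut off before its own </table>. A greedy pattern would instead run to the last </table> on the page. A simple pattern like this cannot tell which </table> belongs to which <table>.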

Update 1:

When I try the code below, it shows this error:

Notice: Undefined offset: 0 in C:\xampp\htdocs\testphp\tabledata.php on line 27

Code:

$file = file_get_contents($url);
$pat_array = Array();
preg_match_all ("/<tr>(.*)<\/tr>/U", $file, $pat_array);
print $pat_array[1][0];

Can anybody help me with this error as well?
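One likely cause of the notice, assuming the page's rows look like <tr class="..."> or span several lines: the literal <tr> in the pattern then matches nothing, preg_match_all() returns 0, and $pat_array[1] is an empty array, so index 0 does not exist. The minimal sketch below uses hypothetical sample markup. It checks the match count before indexing and loosens the pattern. It is still regex-based, so it shares the nesting problem described elsewhere on this page.

```php
<?php
// Hypothetical sample markup: the <tr> has an attribute and its content spans lines
$file = "<table>\n<tr class=\"row\">\n<td><a href=\"a.html\">A</a></td>\n</tr>\n</table>";

// The original pattern finds nothing here, so $pat_array[1] is empty and
// $pat_array[1][0] triggers "Undefined offset: 0" ("Undefined array key 0" in PHP 8)
$count = preg_match_all("/<tr>(.*)<\/tr>/U", $file, $pat_array);

// Allow attributes ([^>]*), newlines (/s) and upper-case tags (/i)
$count2 = preg_match_all("/<tr[^>]*>(.*)<\/tr>/isU", $file, $rows);

// Always check the number of matches before indexing into the result
if ($count2 > 0) {
    print $rows[1][0];
} else {
    print "No <tr> rows found\n";
}
```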

Don't try to parse HTML with regular expressions; use DOMDocument and DOMXPath instead.

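// Real-world HTML is often malformed; collect libxml warnings instead of printing them
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($file);

$xpath = new DOMXPath($dom);
$tableNodes = $xpath->query('//table'); // select all table nodes, nested ones included

// do something, e.g. print the text content of each table
foreach ($tableNodes as $tableNode) {
    print $tableNode->nodeValue;
}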
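Since the question also asks for the <tr> and <td> contents and any href values, here is a minimal sketch of how the same DOMDocument/DOMXPath approach might collect them. The sample markup and the $tables array layout are hypothetical choices for illustration.

```php
<?php
// Hypothetical sample input: a table with a nested table inside one cell
$file = '<html><body>'
    . '<table id="outer"><tr><td><a href="a.html">A</a></td>'
    . '<td><table id="inner"><tr><td>B</td></tr></table></td></tr></table>'
    . '</body></html>';

libxml_use_internal_errors(true); // ignore warnings from malformed HTML
$dom = new DOMDocument();
$dom->loadHTML($file);
$xpath = new DOMXPath($dom);

$tables = array();
foreach ($xpath->query('//table') as $table) {
    $rows = array();
    // Only this table's own rows, not the rows of tables nested inside its cells
    foreach ($xpath->query('./tr | ./thead/tr | ./tbody/tr | ./tfoot/tr', $table) as $tr) {
        $cells = array();
        foreach ($xpath->query('./td | ./th', $tr) as $td) {
            // Keep every href found inside the cell
            $hrefs = array();
            foreach ($xpath->query('.//a[@href]', $td) as $a) {
                $hrefs[] = $a->getAttribute('href');
            }
            $cells[] = array('text' => trim($td->textContent), 'hrefs' => $hrefs);
        }
        $rows[] = $cells;
    }
    $tables[] = $rows;
}

print_r($tables);
```

The relative expressions (./tr rather than .//tr) are the design choice that keeps nested tables apart. Each table appears once in $tables with only its own rows. A cell that contains a nested table still reports that table's text, because textContent includes all descendants.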

There are many more queries you can perform with XPath; have a look here. Also, you probably want to do something with the selected nodes other than just printing their content. If you are looking for the sub-DOM of each table, try this:

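$tablesHtml = array();
foreach ($tableNodes as $tableNode) {
    // Create a separate document that will contain only this table
    $newDom = new DOMDocument();
    // importNode() with $deep = true copies the table and all its descendants
    // (including nested tables) into $newDom, so a prior cloneNode() is not needed
    $clone = $newDom->importNode($tableNode, true);
    $newDom->appendChild($clone);

    // Serialize the table's sub-DOM back to an HTML string
    $tablesHtml[] = $newDom->saveHTML();
}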
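If only the HTML string of each table is needed, a shorter variant is to pass the node to DOMDocument::saveHTML(), which accepts an optional node argument (PHP 5.3.6 and later) and serializes just that node and its descendants. The sample markup below is hypothetical.

```php
<?php
// Hypothetical sample input: a table nested inside a cell of another table
$file = '<table border="1"><tr><td><table><tr><td>x</td></tr></table></td></tr></table>';

libxml_use_internal_errors(true); // ignore warnings from malformed HTML
$dom = new DOMDocument();
$dom->loadHTML($file);
$xpath = new DOMXPath($dom);

$tablesHtml = array();
foreach ($xpath->query('//table') as $tableNode) {
    // Serialize only this table (and everything inside it) as HTML
    $tablesHtml[] = $dom->saveHTML($tableNode);
}

print_r($tablesHtml);
```

This skips creating a second document per table. The separate-document approach is still useful when each table should be processed as a standalone DOM.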

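Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.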

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM