Extracting HTML tables with a regular expression using the PHP function preg_match_all

Question

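I want to extract the tables from an HTML page that contains nested HTML table tags, and then extract the <tr> and <td> elements of each table.

This is what I am currently using for <b> and </b>: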

$file = file_get_contents($url);
preg_match_all ("/<b>(.*)<\/b>/U", $file, $pat_array);
print $pat_array[0][0]." <br> ".$pat_array[0][1]."\n";

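Can anyone tell me a regular expression that matches a nested <table (some table properties)> ... </table> containing <tr> and <td> data? If a <tr> or <td> field contains an href, the href should be kept, and only the required table should be captured.

Example: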

$file = "<html> <head> <title> asdf </title> </head> <body bgcolor = red >  <table border = 1> <table bgcolor = white> (some tr and td data > </table> </table></body> </body> </html>";

preg_match_all ("regular expression for table tag", $file, $pat_array);
print $pat_array[0][0]." <br> ".$pat_array[0][1]."\n";

Update 1:

When I try the code below, it shows an error:

Notice: Undefined offset: 0 in C:\xampp\htdocs\testphp\tabledata.php on line 27

Code:

$file = file_get_contents($url);
$pat_array = Array();
preg_match_all ("/<tr>(.*)<\/tr>/U", $file, $pat_array);
print $pat_array[1][0];

Can someone help with this error?

Answer 1

Don't try to parse HTML with regular expressions; use DOMDocument and DOMXPath instead.

$dom = new DOMDocument();
$dom->loadHTML($file);

$xpath = new DOMXPath($dom);
$tableNodes = $xpath->query('//table'); // select all table nodes

// do something, e.g. print node content
foreach ($tableNodes as $tableNode) {
    print $tableNode->nodeValue;
}
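
Because the question is about nested tables, note that '//table' returns both the outer and the nested tables, and the nodeValue of an outer table includes the text of any table inside it. The following is a minimal sketch of narrowing the selection with XPath predicates. The sample markup in $file is hypothetical, and the call to libxml_use_internal_errors is an addition that assumes real pages may be malformed and would otherwise trigger parser warnings.

```php
<?php
// Hypothetical sample markup: one outer table containing one nested table.
$file = '<html><body>'
      . '<table border="1"><tr><td>outer cell'
      . '<table bgcolor="white"><tr><td>inner cell</td></tr></table>'
      . '</td></tr></table>'
      . '</body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($file);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Tables that have no table ancestor (the outermost tables).
$outerTables = $xpath->query('//table[not(ancestor::table)]');
// Tables that contain no further table (the innermost tables).
$innerTables = $xpath->query('//table[not(.//table)]');

foreach ($innerTables as $tableNode) {
    print trim($tableNode->nodeValue) . "\n";
}
```

Which predicate is appropriate depends on whether the data you need lives in the wrapping table or in the tables nested inside it.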

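XPath supports many more query options than this. You will probably also want to do more with the selected nodes than just print their content. If you want the sub-DOM of each table, try the following: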

foreach ($tableNodes as $tableNode) {
    $newDom = new DOMDocument();
    $clone = $tableNode->cloneNode(true);
    $clone = $newDom->importNode($clone, true);
    $newDom->appendChild($clone);

    $html = $newDom->saveHTML();
}
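
To get at the <tr> and <td> elements and any href values the question asks about, the same approach can be extended with relative XPath queries. This is a sketch under assumptions rather than a definitive implementation: the sample markup and the $rows, $cells and $links names are hypothetical, and the result layout (an array of rows, each an array of cells with 'text' and 'links' keys) is just one possible choice.

```php
<?php
// Hypothetical sample markup with one link inside a cell.
$file = '<table border="1">'
      . '<tr><td>Name</td><td><a href="http://example.com/a">Link A</a></td></tr>'
      . '<tr><td>Other</td><td>plain text</td></tr>'
      . '</table>';

libxml_use_internal_errors(true); // collect warnings about sloppy markup internally
$dom = new DOMDocument();
$dom->loadHTML($file);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$rows = [];

foreach ($xpath->query('//table') as $tableNode) {
    // The second argument of query() is the context node for relative paths.
    foreach ($xpath->query('.//tr', $tableNode) as $rowNode) {
        $cells = [];
        foreach ($xpath->query('./td', $rowNode) as $cellNode) {
            $links = [];
            // Keep the href of every link found inside this cell.
            foreach ($xpath->query('.//a[@href]', $cellNode) as $linkNode) {
                $links[] = $linkNode->getAttribute('href');
            }
            $cells[] = ['text' => trim($cellNode->nodeValue), 'links' => $links];
        }
        $rows[] = $cells;
    }
}

print_r($rows);
```

With nested tables, './/tr' also returns the rows of inner tables, so rows would be collected more than once. A narrower path such as './tr | ./tbody/tr' limits the query to the rows that belong directly to the current table.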

Answer 1: accepted, 2014-11-17 08:43:13
