I want to extract the tables from an HTML page that contains nested <table> tags, and after that extract the <tr> and <td> elements of those tables.
I am using the following code, and it works fine for <b> and </b> tags:
$file = file_get_contents($url);
preg_match_all ("/<b>(.*)<\/b>/U", $file, $pat_array);
print $pat_array[0][0]." <br> ".$pat_array[0][1]."\n";
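For reference, this is how preg_match_all lays out its results with the default PREG_PATTERN_ORDER, shown as a small sketch on a made-up string: index 0 holds the full matches (tags included) and index 1 holds what the (.*) group captured. The U modifier makes .* ungreedy, so each match stops at the nearest </b>.

```php
<?php
// Hypothetical input with two bold runs.
$file = 'Some <b>bold</b> and <b>more bold</b> text.';

// U (ungreedy): (.*) stops at the first </b> instead of swallowing everything up to the last one.
$count = preg_match_all('/<b>(.*)<\/b>/U', $file, $pat_array);

print $count . "\n";           // number of matches found
print $pat_array[0][0] . "\n"; // first full match, tags included
print $pat_array[1][1] . "\n"; // second match, captured group only (no tags)
```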
Can anybody tell me a regular expression for nested <table (some table attributes)> ... </table> blocks that contain data in <tr> and <td> elements? Please keep the href if one is present in the <tr> or <td> fields, and keep in mind which tables are needed.
Example:
$file = "<html> <head> <title> asdf </title> </head> <body bgcolor = red > <table border = 1> <table bgcolor = white> (some tr and td data) </table> </table> </body> </html>";
preg_match_all ("regular expression for table tag", $file, $pat_array);
print $pat_array[0][0]." <br> ".$pat_array[0][1]."\n";
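To illustrate why a single regular expression is a poor fit for nested tables, here is a minimal sketch using a simplified, made-up version of the example markup and an illustrative pattern that does not come from the question. The lazy (.*?) stops at the first </table> it reaches, which belongs to the inner table, so the match that starts at the outer table is cut off and unbalanced.

```php
<?php
// Simplified, hypothetical markup: one table nested inside another.
$file = '<table border=1><tr><td><table bgcolor=white><tr><td>inner</td></tr></table></td></tr></table>';

// Illustrative pattern: an opening <table ...> tag, then lazily anything
// (the s modifier lets . cross newlines) up to the first </table>.
$count = preg_match_all('/<table[^>]*>(.*?)<\/table>/s', $file, $matches);

// The single match starts at the OUTER <table> but ends at the INNER </table>,
// so it contains two opening tags and only one closing tag.
print $count . "\n";
print $matches[0][0] . "\n";
```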
Update 1:
When I tried the code below, it showed this error:
Notice: Undefined offset: 0 in C:\xampp\htdocs\testphp\tabledata.php on line 27
Code:
$file = file_get_contents($url);
$pat_array = Array();
preg_match_all ("/<tr>(.*)<\/tr>/U", $file, $pat_array);
print $pat_array[1][0];
Can anybody help me with this error as well?
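The notice means preg_match_all found no matches, so $pat_array[1] is an empty array and index 0 does not exist. A likely cause (an assumption, since the page itself is not shown) is that the <tr> tags carry attributes or their content spans several lines: /<tr>(.*)<\/tr>/U only matches a bare <tr>, and . does not match newlines without the s modifier. A minimal sketch on hypothetical markup that checks the return value before indexing:

```php
<?php
// Hypothetical markup: a <tr> with an attribute and content spread over several lines.
$file = "<table>\n<tr class=\"row\">\n<td>a</td>\n</tr>\n</table>";

// The question's pattern: needs a bare <tr> and cannot cross newlines, so nothing matches.
$count = preg_match_all('/<tr>(.*)<\/tr>/U', $file, $pat_array);
if ($count === 0) {
    print "No <tr> rows matched, so \$pat_array[1][0] would be undefined.\n";
}

// Allow attributes on <tr> ([^>]*) and let . match newlines (s modifier).
$count = preg_match_all('/<tr[^>]*>(.*)<\/tr>/Us', $file, $pat_array);
if ($count > 0) {
    print trim($pat_array[1][0]) . "\n";
}
```

This only patches the symptom; with nested tables the regex still pairs tags incorrectly.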
Don't try to parse HTML with regular expressions; use DOMDocument and DOMXPath instead.
libxml_use_internal_errors(true); // suppress warnings about malformed real-world HTML
$dom = new DOMDocument();
$dom->loadHTML($file);
$xpath = new DOMXPath($dom);
$tableNodes = $xpath->query('//table'); // select all table nodes, including nested ones
// do something, e.g. print each table's text content
foreach ($tableNodes as $tableNode) {
    print $tableNode->nodeValue;
}
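Since the question asks for the <tr> and <td> data and any href, here is a sketch that extends the XPath approach. The sample markup is made up. Because //table also matches nested tables, the relative query ./tr | ./tbody/tr keeps each row with the table that directly owns it (the tbody branch covers markup that includes one). Note that .//a[@href] searches the whole cell, so an outer cell also picks up links from a table nested inside it.

```php
<?php
// Hypothetical markup modelled on the question: an outer table with a nested table.
$file = '<html><body><table border="1"><tr><td>outer '
      . '<table bgcolor="white"><tr><td><a href="page.html">link</a></td><td>plain</td></tr></table>'
      . '</td></tr></table></body></html>';

libxml_use_internal_errors(true); // real-world HTML often triggers parser warnings
$dom = new DOMDocument();
$dom->loadHTML($file);
$xpath = new DOMXPath($dom);

$rows = [];
foreach ($xpath->query('//table') as $table) {
    // Only rows belonging to THIS table, not rows of a table nested inside it.
    foreach ($xpath->query('./tr | ./tbody/tr', $table) as $tr) {
        $cells = [];
        foreach ($xpath->query('./td | ./th', $tr) as $td) {
            // First link anywhere inside the cell (nested tables included), if any.
            $link = $xpath->query('.//a[@href]', $td)->item(0);
            $cells[] = [
                'text' => trim($td->textContent),
                'href' => $link ? $link->getAttribute('href') : null,
            ];
        }
        $rows[] = $cells;
    }
}
print_r($rows);
```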
XPath supports many more kinds of queries; have a look here. You probably also want to do something with the selected nodes other than printing their content. If you are looking for the sub-DOM of each table, try this:
$tableHtml = array();
foreach ($tableNodes as $tableNode) {
    // Deep-copy the table (and everything inside it) into a fresh document
    $newDom = new DOMDocument();
    $newDom->appendChild($newDom->importNode($tableNode, true));
    $tableHtml[] = $newDom->saveHTML(); // one HTML string per table
}
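PHP 5.3.6 and later also accept a node argument to DOMDocument::saveHTML(), which serializes just that node (its outer HTML), so copying each table into a new document is not strictly necessary. A minimal sketch on made-up markup; for nested tables, the outer table's string will include the inner table as well.

```php
<?php
// Hypothetical single-table page.
$file = '<html><body><table border="1"><tr><td>a</td></tr></table></body></html>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($file);
$xpath = new DOMXPath($dom);

$tableHtml = [];
foreach ($xpath->query('//table') as $tableNode) {
    // saveHTML() with a node argument serializes only that node and its descendants.
    $tableHtml[] = $dom->saveHTML($tableNode);
}
print $tableHtml[0] . "\n";
```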