
Parsing an HTML table with missing <tr> tags

I need to parse an HTML table using PHP, but after the first record, the remaining records are missing their opening <tr> tag. The markup is below:

<tr class="odd">
    <td class="dragHandle"></td>
    <td class="checkbox"></td>
    <td>4228651391</td>
    <td>Payment</td>
    <td>01850147130</td>
    <td>01670808080</td>
    <td>10</td>
    <td>lcghs786</td>
    <td>1</td>
    <td>18-feb-16 21:37:52</td>
</tr>
    <td class="dragHandle"></td>
    <td class="checkbox"></td>
    <td>4226429613</td>
    <td>Payment</td>
    <td>01957814120</td>
    <td>01670808080</td>
    <td>5</td>
    <td>aims777</td>
    <td>1</td>
    <td>18-feb-16 17:44:12</td>
</tr>
    <td class="dragHandle"></td>
    <td class="checkbox"></td>
    <td>4226292073</td>
    <td>Payment</td>
    <td>01957814120</td>
    <td>01670808080</td>
    <td>10</td>
    <td>AIMS786</td>
    <td>1</td>
    <td>18-feb-16 17:28:02</td>
</tr>

I tried the simple_html_dom library, but it only returns an array for the first record. How can I parse all the records and put them into an array? Thanks.

First, add the missing <tr> tags to your HTML with this library:

http://htmlpurifier.org/

Then use the code below:

// $html holds the repaired markup as a string
$content = str_get_html($html);
$tr_array = $content->find('tr');
foreach ($tr_array as $tr) {
    // process your tr data
}
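
For markup shaped like the sample in the question, the missing tags can also be restored without an external repair library. What follows is only a minimal sketch under stated assumptions: every row still ends with </tr>, and each broken row starts directly with <td>. The function name parseRowsWithMissingTr is hypothetical, and the parsing step uses PHP's built-in DOMDocument instead of simple_html_dom.

```php
<?php
// Hypothetical helper for illustration; not part of simple_html_dom or HTML Purifier.
// Assumes each row ends with </tr> and a broken row begins directly with <td>.
function parseRowsWithMissingTr(string $html): array
{
    // Insert an opening <tr> wherever a closing </tr> is followed by a <td>.
    $fixed = preg_replace('~</tr>(\s*)(?=<td\b)~i', '</tr>$1<tr>', $html);

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // Wrap the fragment in <table> so the rows are parsed in table context.
    $doc->loadHTML('<table>' . $fixed . '</table>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Build one array of trimmed cell texts per row.
    $rows = [];
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->getElementsByTagName('td') as $td) {
            $cells[] = trim($td->textContent);
        }
        $rows[] = $cells;
    }
    return $rows;
}

// Usage with a shortened version of the markup from the question.
$sample = '<tr class="odd">
    <td>4228651391</td>
    <td>Payment</td>
</tr>
    <td>4226429613</td>
    <td>Payment</td>
</tr>';

print_r(parseRowsWithMissingTr($sample));
```

If the rows can be broken in other ways (for example a missing </tr>), the regular expression will not catch them, and a general-purpose repair tool is the safer choice.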

I finally solved the problem, thanks to the hint from @Kelvin.

I took the faulty static HTML page (output.html) and fed it to an HTML-repair tool called 'tidy'. To parse the data into a PHP array, I used the table2arr library by Wojtek Jarzecki from phpclasses.org.

The corrected, working code is below.


require_once 'table2arr.php';

// Repair the broken markup with the tidy command-line tool.
shell_exec("tidy.exe output.html > test.html");

$clean_html = file_get_contents('test.html');

$g = new table2arr($clean_html);

// Number of tables found in the repaired document.
$cnt = $g->tablecount;

for ($i = 0; $i < $cnt; $i++) {
    $g->getcells($i);
    var_dump($g->cells);
}
