
PHP - Parsing html tables via DOM

I am using the PHP Simple HTML DOM Parser and trying to get the list of Top Goalscorers (the top 5) from this webpage: http://www.transfermarkt.co.uk/en/chinese-super-league/startseite/wettbewerb_CSL.html

The Top Goal Scorers table has the ID "spieler". I want to read each table row and output the rows in my own format. The problem is that under the Name / Club heading each row contains a nested <table>, which the page uses to lay out the image, player name and club name.

I am trying to work out the DOM structure so I know what to select to get the right player name, club name and goals. Thanks.

Here's what I have so far:

<textarea id='txt_out'>
<?php
echo "Player | Team | Goals\n:--|:--|:--:\n";

$url = "http://www.transfermarkt.co.uk/en/chinese-super-league/startseite/wettbewerb_CSL.html";
$html = file_get_html($url);

foreach ($html->find('#spieler') as $row) {
    if ($i > 0) {
        $player = $row->find('table tr', 3)->plaintext;
        echo $player . "|TEST TEAM|0";
    }
    $i++;
}
?>
</textarea>

but only the header is echoed; no player rows appear:

<textarea id="txt_out">Player | Team | Goals
:--|:--|:--:
</textarea>

Here you go (you will have to adjust the selectors a bit to get the output you want). In this solution I take every td in each row and output its plaintext, after checking that the td does not contain the inner table.

$output = '<table border="1">
                <tr>
                    <td>#</td>
                    <td>Player</td>
                    <td>Team</td>
                    <td>goals-1</td>
                    <td>goals-2</td>
                    <td>goals-3</td>
                    <td>points</td>
                </tr>
            ';

$url = "http://www.transfermarkt.co.uk/en/chinese-super-league/startseite/wettbewerb_CSL.html";
$html = file_get_html($url);

$tbl = $html->find('#spieler',0);

$trs = $tbl->find('tr[class=dunkel],tr[class=hell]');

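foreach ($trs as $tr) {
    $output .= '<tr>';
    $tds = $tr->find('td');
    foreach ($tds as $td) {
        // Skip the cell that wraps the nested layout table; its own
        // inner tds are also returned by find('td') and handled below.
        $inner_table = $td->find('table', 0);
        if (!$inner_table) {
            $text = trim($td->plaintext);
            // Ignore empty cells (e.g. ones that only hold an image).
            if ($text != '') {
                $output .= '<td>' . $text . '</td>';
            }
        }
    }
    $output .= '</tr>';
}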

$output .= '</table>';

echo($output);

Use DOMNodeList->item() (item() expects an index as its argument, and the index is zero-based, so 1 returns the second table):

 $table = $dom->getElementsByTagName('table')->item(1);
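To make the DOMDocument route a bit more concrete, here is a minimal sketch that pulls player, club and goals out of each row and prints them in the pipe format from the question. It runs against a small hand-written sample, not the live page: the markup in $sampleHtml, the assumption that the first two links in the nested table are the player and the club, the assumption that the last cell holds the goals, and the helper name extractScorers are all hypothetical, so the XPath expressions will probably need adjusting for the real HTML.

```php
<?php
// Hypothetical stand-in for the real page; the actual markup may differ.
$sampleHtml = <<<'HTML'
<table id="spieler">
  <tr><th>#</th><th>Name / Club</th><th>Goals</th></tr>
  <tr class="hell">
    <td>1</td>
    <td><table>
      <tr><td rowspan="2"><img src="a.png" alt=""></td><td><a href="#">Player A</a></td></tr>
      <tr><td><a href="#">Club A</a></td></tr>
    </table></td>
    <td>12</td>
  </tr>
  <tr class="dunkel">
    <td>2</td>
    <td><table>
      <tr><td rowspan="2"><img src="b.png" alt=""></td><td><a href="#">Player B</a></td></tr>
      <tr><td><a href="#">Club B</a></td></tr>
    </table></td>
    <td>9</td>
  </tr>
</table>
HTML;

// Hypothetical helper: returns one array per scorer row.
function extractScorers(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // real-world HTML is rarely valid
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);

    // Select only the outer rows of #spieler, with or without a <tbody>,
    // so rows of the nested layout tables are not matched.
    $rows = $xpath->query(
        '//table[@id="spieler"]/tr[@class="hell" or @class="dunkel"]'
        . ' | //table[@id="spieler"]/tbody/tr[@class="hell" or @class="dunkel"]'
    );

    $scorers = [];
    foreach ($rows as $row) {
        // Assumption: first link in the nested table is the player, second the club.
        $links = $xpath->query('./td/table//a', $row);
        // Assumption: the last direct cell of the outer row holds the goals.
        $goals = $xpath->query('./td[last()]', $row);
        if ($links->length < 2 || $goals->length === 0) {
            continue; // row does not match the assumed layout
        }
        $scorers[] = [
            'player' => trim($links->item(0)->textContent),
            'team'   => trim($links->item(1)->textContent),
            'goals'  => trim($goals->item(0)->textContent),
        ];
    }
    return $scorers;
}

echo "Player | Team | Goals\n:--|:--|:--:\n";
foreach (extractScorers($sampleHtml) as $s) {
    echo "{$s['player']}|{$s['team']}|{$s['goals']}\n";
}
```

Querying relative to each outer row (the second argument of DOMXPath::query) is what keeps the nested table from being confused with the scorer rows themselves.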


 