Simple HTML DOM: spaces in the class attribute

Question

I'm using Simple HTML DOM to get elements from a website, but when the class attribute contains spaces, I don't get anything.

Source HTML from betexplorer.com

<table id="table-type-2" class="stats-table stats-main table-2">
    <tbody>
    <tr class="odd glib-participant-ppjDR086" data-def-order="0">
        <td class="rank col_rank no" title="">1.</td>
        <td class="participant_name col_participant_name col_name"><span class="team_name_span"><a onclick="javascript:getUrlByWinType('/soccer/england/premier-league/teaminfo.php?team_id=ppjDR086');">Manchester United</a></span></td>
        <td class="matches_played col_matches_played">4</td>
        <td class="wins col_wins">4</td>
        <td class="draws col_draws">0</td>
        <td class="losses col_losses">0</td>
        <td class="goals col_goals">14:0</td>
        <td class="goals col_goals">12</td>
    </tr>
    <tr class="even glib-participant-hA1Zm19f" data-def-order="1">
        <td class="rank col_rank no" title="">2.</td>
        <td class="participant_name col_participant_name col_name"><span class="team_name_span"><a onclick="javascript:getUrlByWinType('/soccer/england/premier-league/teaminfo.php?team_id=hA1Zm19f');">Arsenal</a></span></td>
        <td class="matches_played col_matches_played">4</td>
        <td class="wins col_wins">4</td>
        <td class="draws col_draws">0</td>
        <td class="losses col_losses">0</td>
        <td class="goals col_goals">11:3</td>
        <td class="goals col_goals">12</td>
    </tr>
    <tr class="odd glib-participant-Wtn9Stg0" data-def-order="2">
        <td class="rank col_rank no" title="">3.</td>
        <td class="participant_name col_participant_name col_name"><span class="team_name_span"><a onclick="javascript:getUrlByWinType('/soccer/england/premier-league/teaminfo.php?team_id=Wtn9Stg0');">Manchester City</a></span></td>
        <td class="matches_played col_matches_played">4</td>
        <td class="wins col_wins">3</td>
        <td class="draws col_draws">1</td>
        <td class="losses col_losses">0</td>
        <td class="goals col_goals">18:3</td>
        <td class="goals col_goals">10</td>
    </tr>
    </tbody>
</table>

My PHP code using SimpleHtmlDom

    <?php
include('../simple_html_dom.php');


function getHTML($url,$timeout)
{
       $ch = curl_init($url); // initialize curl with given url
       curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER["HTTP_USER_AGENT"]); // set  useragent
       curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // write the response to a variable
       curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects if any
       curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // max. seconds to execute
       curl_setopt($ch, CURLOPT_FAILONERROR, 1); // stop when it encounters an error
       curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
       curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
       return @curl_exec($ch);
}



$response=getHTML("http://www.betexplorer.com/soccer/england/premier-league/standings/?table=table&table_sub=home&ts=WOO1nDO2&dcheck=0",10);
$html = str_get_html($response);

$team = $html->find("span[class=team_name_span]/a"); 
$numbermatch = $html->find("td.matches_played.col_matches_played"); 
$wins = $html->find("td.wins.col_wins"); 
$draws = $html->find("td.draws.col_draws"); 
$losses = $html->find("td.losses.col_losses"); 
$goals = $html->find("td.goals.col_goals"); 

?>

<table border="1" width="100%">
    <thead>
        <tr>
            <th>Team</th>
            <th>MP</th>
            <th>W</th>
            <th>D</th>
            <th>L</th>
            <th>G</th>
        </tr>
    </thead>

<?php



foreach ($team as $match) {


echo  "<tr>".

            "<td class='first-cell'>".$match->innertext."</td> "  .
            "<td class='first-cell'>".$numbermatch->innertext."</td> "  .
            "<td class='first-cell'>".$wins->innertext."</td> "  .
            "<td class='first-cell'>".$draws->innertext."</td> "  .
            "<td class='first-cell'>".$losses->innertext."</td> "  .
            "<td class='first-cell'>".$goals->innertext."</td> "  .


            "</tr><br/>";



        }       



?>
</table>

So I only get the first value (because its class name has no spaces), but I can't get the rest of the values.

EDIT: I fixed a mistake in the PHP code. Please take another look.

EDIT2: It's not a duplicate. I tried that solution but it doesn't work.

EDIT3: I tried to use advanced_html_dom (it should fix the spaces problem), but I don't get anything (not even the one value I was getting before).

EDIT4: In the screenshots below you can see what I'd like to get and what I get right now:

EDIT5

team.php

    <?php

// START team.php 
class Team
{
    public $name, $matches, $wins, $draws, $losses, $goals;

    public static function parseRow($row): ?self
    {
        $result = new self();
        $result->name = $result->parseMatch($row, 'span.team_name_span a');
        if (null === $result->name) {
            return null; // couldn't even match the name, probably not a team row, skip it
        }

        $result->matches = $result->parseMatch($row, 'td.col_matches_played');
        $result->wins = $result->parseMatch($row, 'td.col_wins');
        $result->draws = $result->parseMatch($row, 'td.col_draws');
        $result->losses = $result->parseMatch($row, 'td.col_losses');
        $result->goals = $result->parseMatch($row, 'td.col_goals');

        return $result;
    }

    private function parseMatch($row, $selector)
    {
        if (!empty($match = $row->find($selector, 0))) {
            return $match->innertext;
        }

        return null;
    }
}

// END team.php

?>

clas.php

    <?php

include('../simple_html_dom.php');
include('../team.php');


function getHTML($url,$timeout)
{
       $ch = curl_init($url); // initialize curl with given url
       curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER["HTTP_USER_AGENT"]); // set  useragent
       curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // write the response to a variable
       curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects if any
       curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // max. seconds to execute
       curl_setopt($ch, CURLOPT_FAILONERROR, 1); // stop when it encounters an error
       curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
       curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
       return @curl_exec($ch);
}



$response=getHTML("http://www.betexplorer.com/soccer/england/premier-league/standings/?table=table&table_sub=home&ts=WOO1nDO2&dcheck=0",10);
$html = str_get_html($response);



// START DOM parsing block
$teams = [];

foreach($html->find('table.stats-table tr') as $row) {
    $team = Team::parseRow($row); // load the row into a Team object if possible

    // skip this entry if it couldn't match the row
    if (null !== $team) {
        // what we're actually doing here is just the OOP equivalent of:
        // $teams[] = ['name' => $row->find('span.team_name_span a',0)->innertext, ...];
        $teams[] = $team;
    }
}

foreach($teams as $team) {
    echo $team->name;
    echo $team->matches;
}

// END DOM Parsing Block

?>

Answer 1

Solution: http://phpfiddle.org/main/code/cq54-hta2

Class names don't have spaces, so don't try to match them

SimpleHtmlDom doesn't support attribute selectors written like this. On top of that, you're trying to match a class as though the class name itself contained a space. The class attribute is a space-separated list of class names, so "wins col_wins" is two classes, not one. So, instead of this:

$wins = $html->find("td[class=wins col_wins]"); 
$draws = $html->find("td[class=draws col_draws]"); 
$losses = $html->find("td[class=losses col_losses]");

Do the following to match td elements that have BOTH class names:

$wins = $html->find("td.wins.col_wins"); 
$draws = $html->find("td.draws.col_draws"); 
$losses = $html->find("td.losses.col_losses");

Additionally, this HTML markup doesn't require you to match both classes to get the data, so you could simply do:

$wins = $html->find("td.col_wins"); 
$draws = $html->find("td.col_draws"); 
$losses = $html->find("td.col_losses");
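
To make the point about class names concrete, here is a minimal sketch that uses PHP's built-in DOMDocument and DOMXPath instead of SimpleHtmlDom, so it runs without the library. The markup is a trimmed copy of two cells from the question, and hasClassToken() is a hypothetical helper introduced only for this illustration.

```php
<?php
// Sketch only: uses PHP's bundled DOM extension, not SimpleHtmlDom.
libxml_use_internal_errors(true); // keep parser notices out of the output

$markup = '<table><tr>'
    . '<td class="wins col_wins">4</td>'
    . '<td class="draws col_draws">0</td>'
    . '</tr></table>';

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// Hypothetical helper: builds an XPath test meaning
// "the class attribute contains this whole class name as one token".
function hasClassToken(string $name): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $name ')";
}

// Matching on a single class name is enough to find the cell.
$cell = $xpath->query('//td[' . hasClassToken('col_wins') . ']')->item(0);

// The attribute value is a list of two separate class names.
$classes = preg_split('/\s+/', trim($cell->getAttribute('class')));
echo implode(', ', $classes), PHP_EOL; // wins, col_wins

// Requiring both class names, the counterpart of "td.wins.col_wins".
$both = $xpath->query(
    '//td[' . hasClassToken('wins') . ' and ' . hasClassToken('col_wins') . ']'
);
echo $both->length, PHP_EOL; // 1
```

Either query finds the same cell here, which is why the shorter single-class selectors are sufficient for this table.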

Getting repeated selectors (looping through rows)

What you are trying to extract is an array of data from the rows of a table. More specifically, something that looks like this:

$teams = [
    ['Arsenal', matches, wins, ...],
    ['Liverpool', matches, wins, ...],
    ...
];

This means you'll need to run the same data extraction against each row of the table. SimpleHtmlDom makes this easy through jQuery-like find methods, which can be called from any matched element.
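
The per-row idea can be sketched on its own, independent of SimpleHtmlDom. The example below assumes PHP's built-in DOMDocument and DOMXPath, a trimmed two-row copy of the question's table, and a hypothetical helper named firstText(); it illustrates the looping pattern and is not the library's API.

```php
<?php
// Sketch only: PHP's bundled DOM extension stands in for SimpleHtmlDom.
libxml_use_internal_errors(true); // keep parser notices out of the output

$markup = <<<'HTML'
<table class="stats-table stats-main table-2">
  <tr class="odd">
    <td class="participant_name col_name"><span class="team_name_span"><a>Manchester United</a></span></td>
    <td class="wins col_wins">4</td>
  </tr>
  <tr class="even">
    <td class="participant_name col_name"><span class="team_name_span"><a>Arsenal</a></span></td>
    <td class="wins col_wins">4</td>
  </tr>
</table>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($markup);
$xpath = new DOMXPath($doc);

// Hypothetical helper: text of the first node matching $query
// relative to $row, or null when the row has no such node.
function firstText(DOMXPath $xpath, DOMNode $row, string $query): ?string
{
    $node = $xpath->query($query, $row)->item(0);
    return $node === null ? null : trim($node->textContent);
}

$teams = [];
// contains(@class, ...) is a loose substring test, good enough for this markup.
foreach ($xpath->query('//table[contains(@class, "stats-table")]//tr') as $row) {
    // Every lookup starts from the current row (".//"), not the whole document.
    $name = firstText($xpath, $row, './/span[@class="team_name_span"]/a');
    if ($name === null) {
        continue; // not a team row
    }
    $teams[] = [
        'name' => $name,
        'wins' => firstText($xpath, $row, './/td[contains(@class, "col_wins")]'),
    ];
}

print_r($teams);
```

Because each value is looked up inside its own row, the name and the numbers stay aligned even if some rows are missing a cell.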

Complete Solution

This solution defines a Team class to load each row's data into, which should make future adjustments much simpler.

The important point here is that we first loop through every table row as $row, then collect the team name and numbers from $row->find([selector]).

// START team.php 
class Team
{
    public $name, $matches, $wins, $draws, $losses, $goals;

    public function __construct($row)
    {
        $this->name = $this->parseMatch($row, 'span.team_name_span a');
        if (null === $this->name) {
            return; // couldn't even match the name, probably not a team row, skip it
        }

        $this->matches = $this->parseMatch($row, 'td.col_matches_played');
        $this->wins = $this->parseMatch($row, 'td.col_wins');
        $this->draws = $this->parseMatch($row, 'td.col_draws');
        $this->losses = $this->parseMatch($row, 'td.col_losses');
        $this->goals = $this->parseMatch($row, 'td.col_goals');
    }

    private function parseMatch($row, $selector)
    {
        if (!empty($match = $row->find($selector, 0))) {
            return $match->innertext;
        }

        return null;
    }

    public function isValid()
    {
        return null !== $this->name;
    }

    public function getMatchData() //example
    {
        return "<br><b>". $this->wins .' : '. $this->matches . "</b>";
    }
}

// END team.php

// START DOM parsing block
$teams = [];

foreach($html->find('table.stats-table tr') as $row) {
    $team = new Team($row); // load the row into a Team object if possible

    // skip this entry if it couldn't match the row
    if ($team->isValid()) {
        // what we're actually doing here is just the OOP equivalent of:
        // $teams[] = ['name' => $row->find('span.team_name_span a',0)->innertext, ...];
        $teams[] = $team;
    }
}

foreach($teams as $team) {
    echo "<h1>".$team->name."</h1>";
    echo $team->losses;
    echo $team->getMatchData();
}

// END DOM Parsing Block

Answer 1: 1 accepted, 2017-10-13 16:11:53