

Parse HTML Table with DOM and XPath

I'm trying to parse an HTML table with XPath. The URL is: click here.

I used Firebug to inspect the page's DOM, and I know which container I need:

<tbody>
<tr class="r1">
<td class="l rbrd">
<img class="spr2 sport sp1" align="absmiddle" src="/s.gif">
</td>
<td class="l rbrd">19/4 18:30</td>
<td class="l rbrd">
<a title="CHELSEA FC - SUNDERLAND" href="/chelsea-fc-vs-sunderland/e/4509648/" target="_blank">CHELSEA FC - SUNDERLAND</a>
</td>
<td class="c w40">
<span class="o">1,21</span>
<span class="p">92,8%</span>
</td>
<td class="c w10 rbrd">
<span class="o"></span>
<span class="p"></span>
</td>
<td class="c w40">
<span class="o">8,00</span>
<span class="p">4,7%</span>
</td>
<td class="c w10 rbrd">
<span class="o"></span>
<span class="p"></span>
</td>
<td class="c w40">
<span class="o">18,00</span>
<span class="p">2,5%</span>
</td>
<td class="c w10 rbrd">
<span class="o"></span>
<span class="p"></span>
</td>
<td class="c emph">
<span class="o">353.660 €</span>
</td>
<td class="c w10 emph rbrd">
<img class="imgdiff" width="10" height="10" src="http://img.oxytropis.com/s.gif">
</td>
<td class="c rbrd">
<span class="o">1,56</span>
<span class="p">67,5%</span>
</td>
<td class="c rbrd">
<span class="o">2,74</span>
<span class="p">32,5%</span>
</td>
<td class="c emph rbrd">
<span class="o">6.243 €</span>
</td>
<td class="c rbrd">
<a onclick="_gaq.push(['_trackEvent','betfair','click','tziroi-out']);" href="http://sports.betfair.com/Index.do?mi=&ex=1&origin=MRL&rfr=655" rel="nofollow" target="_blank"></a>
</td>
</tr>

This is only one row; there are hundreds more. Every row carries the same kind of information, so I can go through each row and check whether a cell contains the date, the match, the money matched, etc. I need a condition for each of these so I can store all of them in an array.

I'm following this tutorial: click here

Which condition can I use to tell the cells apart from one another?

For each row in the table I want something like this:

[0] => Array
    (
        [date] => 18:30 19/4
        [teams] => CHELSEA FC - SUNDERLAND
        [1] => 1,21
        [1 volumes] => 92,8%
        [X] => 8,00
        [X volumes] => 4,7%
        [2] => 18,00
        [2 volumes] => 2,5%
        [matched] => 353.660 €
        ...
    )

This is my PHP so far; I'm stuck at this point:

<?php

$curl = curl_init('http://www.oxybet.ro/pariu/external/betfair-volumes.htm');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.224 Safari/534.10');
$html = curl_exec($curl);
curl_close($curl);

if (!$html) {
    die("something's wrong!");
}



$dom = new DOMDocument();
@$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

$scores = array();

$tableRows = $xpath->query('//div//div//div[2]//div/div//table//tr');

foreach ($tableRows as $row) {
    // fetch all 'td' cells inside this 'tr'
    $td = $xpath->query('td', $row);
    $match = array();

    // stuck here: how do I tell the cells apart?

    $scores[] = $match;
}

Your query fetches all the table rows so far. In the next step, loop over these results in PHP and read each row's cells as needed, using either direct DOM access or XPath, whichever you prefer.
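With direct DOM access, a minimal sketch could look like the following. It parses a trimmed, hypothetical copy of the question's row instead of the live page, and it assumes the date and team name always sit in the second and third cells. Note that DOMNodeList::item() is 0-indexed, unlike XPath positions.

```php
<?php
// Sketch of direct DOM access: no XPath inside the loop, only
// getElementsByTagName() on each row. The HTML below is a trimmed,
// hypothetical copy of one row from the question, not the live page.
libxml_use_internal_errors(true); // collect parser warnings instead of printing them

$html = '<table><tbody><tr class="r1">'
      . '<td class="l rbrd"><img src="/s.gif"></td>'
      . '<td class="l rbrd">19/4 18:30</td>'
      . '<td class="l rbrd"><a href="/chelsea-fc-vs-sunderland/e/4509648/">CHELSEA FC - SUNDERLAND</a></td>'
      . '</tr></tbody></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$rows = array();
foreach ($dom->getElementsByTagName('tr') as $tr) {
    // All <td> descendants of this row; item() is 0-indexed.
    $cells = $tr->getElementsByTagName('td');
    if ($cells->length < 3) {
        continue; // skip rows that lack the cells we need
    }
    $rows[] = array(
        'date'  => trim($cells->item(1)->textContent), // 2nd cell
        'teams' => trim($cells->item(2)->textContent), // 3rd cell
    );
}

print_r($rows);
```

Because getElementsByTagName() searches all descendants, a table nested inside a cell would shift the indexes; the XPath child step "./td" does not have that problem.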

To use XPath, write an expression that starts at the current context (one beginning with "./") and pass the current row as the context node. Use numerical predicates to pick the cell you're looking for. For example, to query the team name (the third table cell; XPath positions are 1-indexed), use something like

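$tableRows = $xpath->query('//div//div//div[2]//div/div//table//tr');
foreach ($tableRows as $row) {
    $teamNode = $xpath->query('./td[3]/a', $row)->item(0);
    if ($teamNode === null) {
        continue; // rows without a team link, e.g. headers
    }
    $team = $teamNode->textContent;
}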
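Why the "./" matters can be shown with a small sketch on two hypothetical rows (the team names are placeholders): an expression starting with "//" restarts at the document root and ignores the row passed as context. As far as I know, libxml's HTML parser also does not insert the missing tbody that browsers (and therefore Firebug) show, so paths copied from Firebug that contain tbody may match nothing; "//table//tr" avoids that.

```php
<?php
// Sketch: relative vs. absolute expressions with a context node.
// Two hypothetical rows stand in for the real table.
libxml_use_internal_errors(true);

$html = '<table>'
      . '<tr><td>a</td><td>b</td><td><a>TEAM A - TEAM B</a></td></tr>'
      . '<tr><td>c</td><td>d</td><td><a>TEAM C - TEAM D</a></td></tr>'
      . '</table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$firstRow = $xpath->query('//tr')->item(0);

// Relative to the row: only this row's third cell.
$relative = $xpath->query('./td[3]/a', $firstRow);
// "//" starts at the document root, so the context node is ignored.
$absolute = $xpath->query('//td[3]/a', $firstRow);

echo $relative->length, "\n"; // 1
echo $absolute->length, "\n"; // 2
```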

Querying by class attribute might also be possible, but the classes seem to be used rather arbitrarily: "c rbrd", for instance, appears on several different columns.
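Some classes do line up with columns, though. In the single sample row from the question, only the three 1/X/2 odds cells carry "w40". A minimal sketch, assuming that holds for every row of the real page, selects them by class token (the concat/normalize-space idiom avoids matching "w40" inside a longer class name):

```php
<?php
// Sketch: select the 1/X/2 odds cells by class token instead of position.
// In the question's sample row only those three cells carry "w40".
// Trimmed, hypothetical HTML stands in for the live page.
libxml_use_internal_errors(true);

$html = '<table><tr>'
      . '<td class="c w40"><span class="o">1,21</span><span class="p">92,8%</span></td>'
      . '<td class="c w10 rbrd"><span class="o"></span><span class="p"></span></td>'
      . '<td class="c w40"><span class="o">8,00</span><span class="p">4,7%</span></td>'
      . '<td class="c w10 rbrd"><span class="o"></span><span class="p"></span></td>'
      . '<td class="c w40"><span class="o">18,00</span><span class="p">2,5%</span></td>'
      . '<td class="c rbrd"><span class="o">1,56</span><span class="p">67,5%</span></td>'
      . '</tr></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$row = $xpath->query('//tr')->item(0);

// Match "w40" as a whole class token, not as part of a longer class name.
$hasW40 = 'contains(concat(" ", normalize-space(@class), " "), " w40 ")';

$odds = array();
foreach ($xpath->query('./td[' . $hasW40 . ']', $row) as $cell) {
    $odds[] = array(
        'odds'   => trim($xpath->query('./span[@class="o"]', $cell)->item(0)->textContent),
        'volume' => trim($xpath->query('./span[@class="p"]', $cell)->item(0)->textContent),
    );
}

print_r($odds);
```

The matches still come back in column order, so assigning them to the 1, X and 2 keys relies on position anyway; the class test mainly filters out the empty "w10" cells.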

Now read the other table cells with similar queries, build the resulting array for each row and append it to the $scores array.
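Put together, a sketch of that loop might look like this. The column positions (1-indexed) are read off the single sample row in the question and are an assumption about every row on the real page; cellText() is a hypothetical helper, and the meta tag is there so libxml decodes the € sign as UTF-8.

```php
<?php
// Sketch: one associative array per row from positional XPath queries,
// appended to $scores. Column positions come from the question's sample row.
libxml_use_internal_errors(true);

/** Hypothetical helper: trimmed text of the first match of $expr under $row, or null. */
function cellText(DOMXPath $xpath, DOMNode $row, string $expr): ?string
{
    $node = $xpath->query($expr, $row)->item(0);
    return $node === null ? null : trim($node->textContent);
}

$html = <<<'HTML'
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<table><tbody><tr class="r1">
<td class="l rbrd"><img src="/s.gif"></td>
<td class="l rbrd">19/4 18:30</td>
<td class="l rbrd"><a href="/chelsea-fc-vs-sunderland/e/4509648/">CHELSEA FC - SUNDERLAND</a></td>
<td class="c w40"><span class="o">1,21</span><span class="p">92,8%</span></td>
<td class="c w10 rbrd"></td>
<td class="c w40"><span class="o">8,00</span><span class="p">4,7%</span></td>
<td class="c w10 rbrd"></td>
<td class="c w40"><span class="o">18,00</span><span class="p">2,5%</span></td>
<td class="c w10 rbrd"></td>
<td class="c emph"><span class="o">353.660 €</span></td>
</tr></tbody></table>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$scores = array();
foreach ($xpath->query('//table//tr') as $row) {
    $teams = cellText($xpath, $row, './td[3]/a');
    if ($teams === null) {
        continue; // not a match row (e.g. a header)
    }
    $scores[] = array(
        'date'      => cellText($xpath, $row, './td[2]'),
        'teams'     => $teams,
        '1'         => cellText($xpath, $row, './td[4]/span[@class="o"]'),
        '1 volumes' => cellText($xpath, $row, './td[4]/span[@class="p"]'),
        'X'         => cellText($xpath, $row, './td[6]/span[@class="o"]'),
        'X volumes' => cellText($xpath, $row, './td[6]/span[@class="p"]'),
        '2'         => cellText($xpath, $row, './td[8]/span[@class="o"]'),
        '2 volumes' => cellText($xpath, $row, './td[8]/span[@class="p"]'),
        'matched'   => cellText($xpath, $row, './td[10]/span[@class="o"]'),
    );
}

print_r($scores);
```

The date comes out as the page shows it ("19/4 18:30"); producing "18:30 19/4" as in the question is a small string swap on top of this. The remaining columns (positions 12 to 14) would follow the same pattern once you decide what to call them.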

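Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM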