
HTML parsing with PHP DOMDocument

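I am trying to extract content from a forum. If a topic has more than one page, I want to get all of its page links. This is the markup of a single-page topic: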

<td align="left">
<div class="topicos">
<a href="/_t_1593901" title="Welcome2">
<span class="titulo">
Hello World!
</span>
</a><br>
</div>
</td>

If the topic has more than one page, this is the markup:

<td align="left">
<div class="topicos">
<a href="_t_1594517" title="Welcome">
<span class="titulo">
Hello World!
</span>
</a><br>
</div>
<span class="quickPaging">
[<img src="http://forum.imguol.com//forum/themes/jogos/images/clear.gif" class="master-sprite sprite-icon-minipost" alt="Ir à página" title="Ir à página">
Ir à página:
<a href="/_t_1594517?&amp;page=1">1</a>,&nbsp;
<a href="/_t_1594517?&amp;page=2">2</a>,&nbsp;
<a href="/_t_1594517?&amp;page=3">3</a>,&nbsp;
<a href="/_t_1594517?&amp;page=4">4</a>,&nbsp;
<a href="/_t_1594517?&amp;page=5">5</a>&nbsp;
]</span>
</td>

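I want to get the id (_t_1594517) of every topic that has 5 or more pages. How can I do that? This is what I have tried, but I'm lost: I don't understand the DOMDocument documentation very well, and I'm new to programming and PHP. Please help: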

<?php
$html = new DOMDocument();
$url = "http://website.com/forum/?page=";
$page = "1";
while($page <= 10)
{
$html->loadHTML($url + $page);

foreach($html->getElementsByTagName('td') as $td)
{
    if($td->hasAttributes())
    {
        if($td->getAttribute('align') == "left")
        {
            $div = $td->getElementsByTagName('div');
            if($div->hasAttributes())
            {
                if($td->getAttribute('class') == "topicos")
                {
                    $a = $td->getElementsByTagName('a');
                    {
                        if($a->hasAttributes())
                        {
                            /*$return['link'][] =*/ echo $a->getElementById('href')->tagName;
                        }
                    }
                }
            }
        }
    }
}   
}
?>

I think XPath can help you:

If $with_links contains the HTML of the topic with 5 page links, then

$doc = new DOMDocument();
$doc->loadHTML($with_links);
$xpath = new DOMXPath($doc);

$quick_paging_links = $xpath->query('//span[@class="quickPaging"]/a[contains(@href,"_t_")]/@href');
if($quick_paging_links->length>4)
{
  $first_href = $quick_paging_links->item(0)->value;
  $id = substr($first_href, 1, strpos($first_href, '?')-1);
  echo 'Topic with id '.$id.' has '.$quick_paging_links->length." links.\n";
}

will produce the output:

Topic with id _t_1594517 has 5 links.
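
The same idea can be extended to a whole listing page that contains many topics. The following is only a minimal sketch under some assumptions: the function name topicIdsWithPages and its $minPages parameter are hypothetical (they are not part of the answer above), and it assumes every multi-page topic has exactly one span with class "quickPaging", as in the markup from the question. It queries the page links relative to each span, so the links of different topics are counted separately.

```php
<?php
// Hypothetical helper (not from the original answer): returns the ids of all
// topics whose quickPaging block contains at least $minPages page links.
function topicIdsWithPages(string $html, int $minPages = 5): array
{
    // loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Forum markup is often not valid HTML; keep libxml warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $ids = [];

    // Assumption: one quickPaging span per multi-page topic.
    foreach ($xpath->query('//span[@class="quickPaging"]') as $span) {
        // The leading "./" restricts the query to links inside this span.
        $hrefs = $xpath->query('./a[contains(@href, "_t_")]/@href', $span);
        if ($hrefs->length < $minPages) {
            continue;
        }
        // Pull the id out of an href such as "/_t_1594517?&page=1".
        if (preg_match('/_t_\d+/', $hrefs->item(0)->value, $match)) {
            $ids[] = $match[0];
        }
    }

    return $ids;
}
```

Calling topicIdsWithPages($html) with the HTML source of one listing page should give an array of ids for the topics with 5 or more page links. To feed it real pages, the HTML has to be fetched first, for example with file_get_contents($url . $page). Note that PHP joins strings with "." rather than "+", and that loadHTML() expects HTML source, not a URL.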

