Extracting all content (including HTML) from a div class using PHP

Question

Example HTML...

<html>
<head></head>
<body>
<table>
<tr>
    <td class="rsheader"><b>Header Content</b></td>
</tr>
<tr>
    <td class="rstext">Some text (Most likely will contain lots of HTML)</td>
</tr>
</table>
</body>
</html>

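I need to convert an HTML page into a template version of that page. The page consists of several boxes, each with a header (called "rsheader" in the code above) and some text (called "rstext" in the code above).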

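I'm trying to write a PHP script that retrieves the HTML page (perhaps using file_get_contents) and then extracts whatever is inside the rsheader and rstext divs. Basically, I have no idea how to do this! I have experimented with DOM, but I don't really understand it, and although I did manage to extract the text, it ignored any HTML.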
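A minimal sketch of that extraction step, assuming the boxes come as rsheader/rstext pairs in document order. The helper names innerHtml() and extractBoxes() are hypothetical. Instead of reading textContent, it serialises each matched cell's child nodes with DOMDocument::saveHTML(), so the markup survives. The string literal stands in for the fetched page.

```php
<?php
// Hypothetical helper: concatenate the markup of a node's children (its inner HTML).
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument serialises only that node, tags included.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Hypothetical helper: returns [['header' => ..., 'text' => ...], ...], one entry per box.
function extractBoxes(string $html): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real pages often produce parser warnings
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Match the class as a whole token, so class="rstext wide" is still found.
    $byClass = function (string $class) use ($xpath): DOMNodeList {
        return $xpath->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]");
    };
    $headers = $byClass('rsheader');
    $texts = $byClass('rstext');

    $boxes = [];
    // Assumption: the Nth rsheader cell belongs with the Nth rstext cell.
    for ($i = 0; $i < $headers->length; $i++) {
        $text = $texts->item($i);
        $boxes[] = [
            'header' => innerHtml($headers->item($i)),
            'text' => $text === null ? '' : innerHtml($text),
        ];
    }
    return $boxes;
}

// Stand-in for the fetched page; in practice this might be file_get_contents($url).
$page = '<html><head></head><body><table>'
    . '<tr><td class="rsheader"><b>Header Content</b></td></tr>'
    . '<tr><td class="rstext">Some <i>formatted</i> text</td></tr>'
    . '</table></body></html>';

$boxes = extractBoxes($page);
print_r($boxes);
```

Pairing by index keeps the sketch short. If the real page can have a header without a text cell, walking up to each box's container would be more robust.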

My PHP...

<?php

$html = '<html>
<head></head>
<body>
<table>
<tr>
    <td class="rsheader"><b>Header Content</b></td>
</tr>
<tr>
    <td class="rstext">Some text (Most likely will contain lots of HTML)</td>
</tr>
</table>
</body>
</html>';

$dom = new DomDocument();
$dom->loadHtml($html);
$xpath = new DomXpath($dom);
$div = $xpath->query('//*[@class="rsheader"]')->item(0);
echo $div->textContent;

?>

If I do print_r($div), I see this...

DOMElement Object
    (
    [tagName] => td
    [schemaTypeInfo] => 
    [nodeName] => td
    [nodeValue] => Header Content
    [nodeType] => 1
    [parentNode] => (object value omitted)
    [childNodes] => (object value omitted)
    [firstChild] => (object value omitted)
    [lastChild] => (object value omitted)
    [previousSibling] => 
    [nextSibling] => (object value omitted)
    [attributes] => (object value omitted)
    [ownerDocument] => (object value omitted)
    [namespaceURI] => 
    [prefix] => 
    [localName] => td
    [baseURI] => 
    [textContent] => Header Content
    )

As you can see, textContent contains no HTML tags, which makes me think I'm going about this the wrong way :(
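textContent is, by definition, the text of a node and all its descendants with the tags stripped, so the XPath query is fine; it is the property that drops the markup. The tags are still in the DOM as child nodes. The small sketch below contrasts three results for the question's header cell: textContent, saveHTML() on the node itself (outer HTML), and saveHTML() on each of its children (inner HTML).

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<html><body><table><tr><td class="rsheader"><b>Header Content</b></td></tr></table></body></html>');
$xpath = new DOMXPath($dom);
$td = $xpath->query('//*[@class="rsheader"]')->item(0);

// textContent concatenates the text of all descendants, so the <b> tag is lost.
$text = $td->textContent;

// saveHTML($node) serialises the node itself: the whole <td> element, tags included.
$outer = $dom->saveHTML($td);

// Serialising each child instead yields the cell's inner HTML.
$inner = '';
foreach ($td->childNodes as $child) {
    $inner .= $dom->saveHTML($child);
}

echo $text, "\n", $outer, "\n", $inner, "\n";
```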

I really hope someone can help with this...

Thanks in advance

Paul

Answer 1

XPath may be more of a sledgehammer than this task needs. I would try DOMDocument's getElementById() method instead. Below is an example adapted from this article.

Note: updated to use tag and class names instead of element IDs.

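function getChildHtml( $node ) 
{
    $innerHtml = '';
    $children = $node->childNodes;

    foreach( $children as $child )
    {
        // Append each child's markup once. saveHTML() with a node argument
        // (PHP 5.3.6+) serialises just that node as HTML; saveXML() would
        // turn empty elements such as <div></div> into <div/>.
        $innerHtml .= $child->ownerDocument->saveHTML( $child );
    }

    return $innerHtml;
}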

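// $html holds the page source, e.g. the string from the question or the result of file_get_contents().
$dom = new DomDocument();
$dom->loadHtml( $html );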

// Gather all table cells in the document.
$cells = $dom->getElementsByTagName( 'td' );

// Loop through the collected table cells looking for those of class 'rsheader' or 'rstext'.
foreach( $cells as $cell )
{
    if( $cell->getAttribute( 'class' ) == 'rsheader' )
    {
        $headerHtml = getChildHtml( $cell );
        // Do something with header html.
    }

    if( $cell->getAttribute( 'class' ) == 'rstext' )
    {
        $textHtml = getChildHtml( $cell );
        // Do something with text html.
    }
}
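Below, the answer's approach is assembled into one runnable sketch on a slightly richer sample. It uses getChildHtml() with the concatenation fix, plus a hypothetical hasClass() helper. The helper matters because comparing getAttribute('class') with == only matches cells whose class attribute is exactly "rsheader" or "rstext", and would miss something like class="rstext wide".

```php
<?php
// Inner HTML of a node: serialise each child once and concatenate.
function getChildHtml(DOMNode $node): string
{
    $innerHtml = '';
    foreach ($node->childNodes as $child) {
        $innerHtml .= $node->ownerDocument->saveHTML($child);
    }
    return $innerHtml;
}

// Hypothetical helper: true if $class is one of the element's space-separated classes.
function hasClass(DOMElement $el, string $class): bool
{
    $classes = preg_split('/\s+/', trim($el->getAttribute('class')));
    return in_array($class, $classes, true);
}

$html = '<html><head></head><body><table>'
    . '<tr><td class="rsheader"><b>Header Content</b></td></tr>'
    . '<tr><td class="rstext wide">Some <em>emphasised</em> text</td></tr>'
    . '</table></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$headers = [];
$texts = [];
// Same idea as the answer: walk every <td> and sort cells by class.
foreach ($dom->getElementsByTagName('td') as $cell) {
    if (hasClass($cell, 'rsheader')) {
        $headers[] = getChildHtml($cell);
    } elseif (hasClass($cell, 'rstext')) {
        $texts[] = getChildHtml($cell);
    }
}

print_r($headers);
print_r($texts);
```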

Answer 2

Take a look at this answer and use it as a guide: Retrieving specific data from a website

If you need more detailed help, I'm here to help.

2 answers

Answer 1
2 votes, accepted, 2013-02-21 15:18:34

Answer 2
0 votes, 2013-02-21 15:14:50