Removing the <html> and <head> tags from DOMDocument::saveXML output

Question

I have a fragment of HTML that is not well-formed. Example:

<div id='notrequired'>
    <div>
        <h3>Some examples :-)</h3>
        STL is a library, not a framework.
    </div> 
    </p>
    </a>
    <a target='_blank' href='http://en.wikipedia.org/wiki/Library_%28computing%29'>Read more</a>;
</div>
<a target='_blank' href='http://en.wikipedia.org/wiki/Library_%28computing%29'>Read more</a>";

As you can see, there are stray </p> and </a> closing tags.

I tried a snippet of code to remove the <div id='notrequired'>. The removal works, but I cannot get just the remaining markup back out.

Here's the snippet:

function DOMRemove(DOMNode $from) {
    $from->parentNode->removeChild($from);
}

$dom = new DOMDocument();
@$dom->loadHTML($text); // $text contains the HTML shown above

$selection = $dom->getElementById('notrequired');
if ($selection == NULL) {
    $text = $dom->saveXML();
} else {
    $refine = DOMRemove($selection);
    $text = $dom->saveXML($refine);
}

The problem is that $dom->saveXML() returns the whole document, including the wrapper markup:

       <?xml version="1.0" standalone="yes"?>
        <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
        <html>

<body>
            <a target="_blank" href="http://en.wikipedia.org/wiki/Library_%28computing%29">Read more</a>

    </body>    
    </html>

All I need is:

<a target='_blank' href='http://en.wikipedia.org/wiki/Library_%28computing%29'>Read more</a>

And not the <html> and <body> tags.

What am I missing? Is there a better way of doing this?
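A note on the snippet above: DOMRemove() has no return statement, so $refine is always null, and saveXML(null) serializes the whole document just like saveXML(). The sketch below illustrates this with a shortened, hypothetical input; it only demonstrates the behaviour and is not meant as the fix.

```php
<?php
// Hypothetical input, shortened from the example in the question.
$text = "<div id='notrequired'><h3>Some examples</h3></div>"
      . "<a target='_blank' href='#'>Read more</a>";

function DOMRemove(DOMNode $from) {
    $from->parentNode->removeChild($from); // no return statement
}

$dom = new DOMDocument();
@$dom->loadHTML($text);

// A function without a return statement evaluates to null.
$refine = DOMRemove($dom->getElementById('notrequired'));
var_dump($refine); // NULL

// saveXML(null) behaves like saveXML(): the whole document is serialized,
// including the doctype, <html> and <body>.
$whole = $dom->saveXML($refine);

// Passing an actual node serializes only that node.
$link = $dom->getElementsByTagName('a')->item(0);
$onlyLink = $dom->saveXML($link);
echo $onlyLink, "\n";
```

So saveXML() can return a single node's markup, but it needs to be given a node that is still in the document, not the return value of the removal.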

Answer 1

OK, I think I figured out a solution. The approach may not be right, but it does the job!

Hakre pointed out that this is a duplicate of "innerHTML in PHP's DomDocument?". It is not an exact duplicate, but it gave me the idea I needed. Thanks for the suggestion.

It helped me put together the solution below:

function DOMRemove(DOMNode $from) {
    $from->parentNode->removeChild($from);
}

// Serialize each child of $element in its own temporary document
// and concatenate the results.
function DOMinnerHTML($element)
{
    $innerHTML = "";
    $children = $element->childNodes;
    foreach ($children as $child)
    {
        $tmp_dom = new DOMDocument();
        $tmp_dom->appendChild($tmp_dom->importNode($child, true));
        $innerHTML .= trim($tmp_dom->saveHTML());
    }
    return $innerHTML;
}

$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
@$dom->loadHTML($test); // $test contains the input HTML shown below

// Remove the unwanted element only if it was actually found.
$a = $dom->getElementById('step');
if ($a !== null) {
    DOMRemove($a);
}

$domTable = $dom->getElementsByTagName("body");

foreach ($domTable as $tables)
{
    $x = DOMinnerHTML($tables);
    echo $x;
}

If the input is:

<div id='step'>
    <div >
        <h3>Some examples :-(</h3>
        Blah blah blah...
    </div> </p>
    </a>
    <a target='_blank' href='#'>Read more</a>;
</div>
<div id='step2'>
    <div>
        <h3>Some examples :-) :-D</h3>
        Blah2 blah2 blah2...
    </div> </p> </a>
</div>
<a target='_blank' href='#'>Read more</a>
<a target='_blank' href='#'>Read more</a>
<a target='_blank' href='#'>Read more</a>

The output, as expected, is:

<div id="step2">
    <div>
        <h3>Some examples :-) :-D</h3>
        Blah2 blah2 blah2...
    </div> 
</div>
<a target="_blank" href="#">Read more</a>
<a target="_blank" href="#">Read more</a>
<a target="_blank" href="#">Read more</a>

The solution works but may not be optimal. Any thoughts?
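One possible simplification, as a sketch rather than a definitive answer: since PHP 5.3.6, DOMDocument::saveHTML() accepts a node argument, so the children of <body> can be serialized directly without a temporary document per child. The helper name bodyInnerHTML and the shortened input are hypothetical, chosen only for this illustration.

```php
<?php
// Hypothetical helper; not part of the original answer.
function bodyInnerHTML(DOMDocument $dom) {
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $html = '';
    foreach ($body->childNodes as $child) {
        // saveHTML($node) serializes just that node (PHP 5.3.6+).
        $html .= $dom->saveHTML($child);
    }
    return trim($html);
}

// Hypothetical input, shortened from the example above.
$test = "<div id='step'><h3>Some examples :-(</h3></div>"
      . "<a target='_blank' href='#'>Read more</a>";

$dom = new DOMDocument();
@$dom->loadHTML($test);

// Remove the unwanted element only if it exists.
$step = $dom->getElementById('step');
if ($step !== null) {
    $step->parentNode->removeChild($step);
}

echo bodyInnerHTML($dom), "\n";
```

Another option on PHP 5.4+ is to pass LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD to loadHTML() so the wrapper tags are never added, although that can mis-nest fragments that have several top-level elements, as the input here does.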

Answer 1
2 | accepted | 2012-10-10 09:55:30
