Removing nested tags with simpleHTML

I'm trying to use simple_html_dom to remove all the span tags from a snippet of HTML, and I'm using the following:

$body = "<span class='outer' style='background:red'>x<span class='mid' style='background:purple'>y<span class='inner' style='background:orange'>z</span></span></span>";
$HTML = new simple_html_dom;
$HTML->load($body);   
$spans = $HTML->find('span');
foreach($spans as $span_tag) {
    echo "working on ". $span_tag->class . " ... ";
    echo "setting " . $span_tag->outertext . " equal to " . $span_tag->innertext . "<br/>\n";
    $span_tag->outertext = (string)$span_tag->innertext;
}
$text =  $HTML->save();
$HTML->clear();
unset($HTML);
echo "<br/>The Cleaned TEXT is: $text<br/>";

And here's the result in my browser:

Screenshot: http://www.pixeloution.com/RAC/clean.gif

So why am I ending up with only the outermost span removed?

Edit

Actually, if there's an easier way to do this, I'm game. The goal is to remove the span tags but keep everything inside them, including other tags; otherwise I'd just use $obj->plaintext.
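Since an easier way is welcome, here is one hedged alternative: a minimal sketch that swaps in PHP's built-in DOM extension (DOMDocument) instead of simple_html_dom. It assumes the input is a UTF-8 HTML fragment; the helper name unwrapSpans() and the wrapper div are hypothetical choices made for illustration. Each span is unwrapped by moving its children in front of it and then deleting it, so text and any other tags inside are kept.

```php
<?php
// Minimal sketch (assumption): unwrap every <span> with PHP's DOM extension
// instead of simple_html_dom. unwrapSpans() is a hypothetical helper name.
function unwrapSpans(string $html): string
{
    $doc = new DOMDocument();
    // Silence libxml warnings about imperfect HTML, then restore the setting.
    $previous = libxml_use_internal_errors(true);
    // The meta tag declares UTF-8 input; the wrapper div holds the fragment
    // so its contents can be serialized back out on their own.
    $doc->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '<div>' . $html . '</div>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Re-query on every iteration so we always get the first remaining span;
    // nested spans are unwrapped one at a time until none are left.
    while (($span = $doc->getElementsByTagName('span')->item(0)) !== null) {
        $parent = $span->parentNode;
        // Move each child (text or element) in front of the span...
        while ($span->firstChild !== null) {
            $parent->insertBefore($span->firstChild, $span);
        }
        // ...then remove the now-empty span itself.
        $parent->removeChild($span);
    }

    // The wrapper is the first div in document order; serialize its children.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$body = "<span class='outer' style='background:red'>x<span class='mid' style='background:purple'>y<span class='inner' style='background:orange'>z</span></span></span>";
echo unwrapSpans($body), "\n"; // prints xyz
```

Because this moves the actual child nodes instead of assigning outertext strings, the result does not depend on the order in which the nested spans are visited. Note that libxml's HTML parser may restructure fragments with unbalanced tags.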

Edit #2

Okay... I got it working, although I'd still like to understand the problem if anyone has run into this before. Knowing it was only removing the outermost span, I did this:

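function cleanSpansRecursive(&$body) {
    // Parse the current HTML and replace each span with its inner HTML.
    $HTML = new simple_html_dom;
    $HTML->load($body);
    $spans = $HTML->find('span');
    foreach($spans as $span_tag) {
        $span_tag->outertext = (string)$span_tag->innertext;
    }

    // Write the result back to the caller's string (passed by reference).
    $body = (string)$HTML;
    if($HTML->find('span')) {
        // Spans were found, so run another pass on the updated string;
        // each pass strips one more level of nesting.
        $HTML->clear();
        unset($HTML);
        cleanSpansRecursive($body);
    } else {
        $HTML->clear();
        unset($HTML);
    }
}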

And it seems to work.

Answer

I don't have simple_html_dom installed on my machine or dev server, so I can't test this, but from the looks of it, setting $span_tag->outertext will create new span objects inside the outer span, so the old references will no longer exist in $HTML. Processing the spans from the innermost to the outermost should fix it, since the references would then be kept intact.

EDIT: In your second edit, you re-parse the HTML and find the newly created spans after every replacement, which is why it works.

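Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.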

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM