
Simple HTML DOM: How to remove elements?

I would like to use Simple HTML DOM to remove all images in an article, so I can easily create a small snippet of text for a news ticker, but I haven't figured out how to remove elements with it.

Basically, I would:

  1. Get the content as an HTML string.
  2. Remove all image tags from the content.
  3. Limit the content to x words.
  4. Output it.
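The four steps above can be sketched in plain PHP. This sketch makes several assumptions that are not part of the original question:

- It uses PHP's bundled DOM extension (`DOMDocument`) instead of Simple HTML DOM, so it runs without that library.
- `remove_images()` and `first_words()` are hypothetical helper names.
- "Words" are taken to be whitespace-separated tokens.

```php
<?php
// Sketch of the four steps using PHP's bundled DOM extension (DOMDocument),
// NOT Simple HTML DOM. remove_images() and first_words() are hypothetical names.

// Step 2: return the HTML fragment with every <img> element removed.
function remove_images(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy real-world markup
    // The meta tag declares UTF-8 so non-ASCII text is parsed correctly;
    // the wrapper <div> gives the fragment a root we can find again.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8"><div>'
        . $html . '</div>');
    libxml_clear_errors();

    // getElementsByTagName() returns a live list, so copy it to an array
    // before removing nodes; otherwise some images would be skipped.
    foreach (iterator_to_array($doc->getElementsByTagName('img')) as $img) {
        $img->parentNode->removeChild($img);
    }

    // Serialize only the wrapper's children back into an HTML string.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Step 3: plain text limited to the first $maxWords words.
function first_words(string $html, int $maxWords): string
{
    // Strip remaining tags and collapse runs of whitespace to single spaces.
    $text = trim(preg_replace('/\s+/u', ' ', strip_tags($html)));
    if ($text === '') {
        return '';
    }
    return implode(' ', array_slice(explode(' ', $text), 0, $maxWords));
}

// Steps 1 and 4: in real code the HTML would come from the article source.
$article = '<p>Breaking: <img src="a.png" alt="">markets rise <b>sharply</b> today</p>';
$clean = remove_images($article);
echo $clean, "\n";
echo first_words($clean, 3), "\n";
```

With `DOMDocument`, `removeChild()` detaches the node from the tree immediately. No save-and-reload step is needed afterwards, unlike the `outertext` approach.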

Any help?

There is no dedicated method for removing elements. You just find all the img elements and then do:

$e->outertext = '';

When you only clear the outer text, you delete the element's HTML content. However, if you run another find() for the same elements, they will still appear in the result. The reason is that the Simple HTML DOM object still holds the element in its internal structure, only without its actual content. To really delete the element, reload the HTML as a string into the same variable. This way the object is rebuilt from markup that no longer contains the deleted content.

Here is an example function:

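public function removeNode($selector)
{
    // Blank out the markup of every node that matches the selector...
    foreach ($this->find($selector) as $node)
    {
        $node->outertext = '';
    }

    // ...then serialize and re-parse, so the nodes disappear from the internal tree too.
    $this->load($this->save());
}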

Put this function inside the simple_html_dom class and you're good.

I think you are having difficulties because you forgot to save (dump the internal DOM tree back into a string).

Try this:

include 'simple_html_dom.php'; // the Simple HTML DOM library file

$html = file_get_html("http://example.com");

foreach ($html->find('img') as $item) {
    $item->outertext = '';
}

$html->save();

echo $html;

I could not figure out where to put the function, so I just put the following directly in my code:

$html->load($html->save());

It basically commits the changes made in the foreach loop above back into the HTML.

The proposed solutions are quite expensive, because every reload re-parses the whole document. This makes them practically unusable in a big loop or any other kind of repetition.

I prefer to use "soft deletes":

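foreach ($html->find('somecondition') as $item) {
    if (someCheck($item)) { // someCheck() stands in for your own test
        $item->setAttribute('softDelete', true); // marker checked by later code
        $item->outertext = '';
    }
}

// Later passes skip marked nodes instead of reloading the whole document.
foreach ($html->find('somecondition') as $item) {
    if (!$item->getAttribute('softDelete')) {
        // do something
    }
}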

This is working for me:

foreach ($html->find('element') as $element) {
    $element = NULL; // note: this only clears the loop variable; the node itself stays in $html
}

Adding a new answer, since removeNode is definitely a better way of removing elements:

$html->removeNode('img');

This method probably was not available when the accepted answer was marked. You do not need to loop over the HTML to find each element; this call removes them all.

Use outerhtml instead of outertext:

<div id='your_div'>the contents of your div</div>

$your_div->outertext = '';
echo $your_div; // echoes <div id='your_div'></div>

$your_div->outerhtml = '';
echo $your_div; // echoes nothing

Try this (note that this uses the PHPHtmlParser library rather than Simple HTML DOM):

use PHPHtmlParser\Dom;

$dom = new Dom();
$dom->loadStr($text);
foreach ($dom->find('element') as $element) {
    $element->delete();
}

This works now:

$element->remove();

You can see the documentation for the method here.
