Remove </div> HTML tag from a string in PHP

According to the post here, the code below can remove an HTML tag such as <div>. But I found that the end tag </div> still remains in the string.

$content = "<div id=\"header\">this is something with an <img src=\"test.png\"/> in it.</div>";
$content = preg_replace("/<div[^>]+\>/i", "", $content); 
echo $content;

I have tried the following, but it still does not work. How can I fix this issue?

$content = preg_replace("/<\/div[^>]+\>/i", "", $content); 
$content = preg_replace("/<(/)div[^>]+\>/i", "", $content); 

Thanks

The end tag has nothing between the div and the >, but [^>]+ requires at least one character there, so the pattern never matches </div>. Instead, try something like:

$content = preg_replace("/<\/?div[^>]*\>/i", "", $content); 

This will remove patterns of the form:

<div>
</div>
<div class=...>
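
To make the difference concrete, here is a minimal sketch that runs the question's closing-tag attempt and the suggested pattern against the question's own string. The variable names $closingOnly and $stripped are illustrative only, and the unnecessary backslash before > has been dropped.

```php
<?php
$content = "<div id=\"header\">this is something with an <img src=\"test.png\"/> in it.</div>";

// [^>]+ needs at least one character after "div", so "</div>" is never matched
// and the string comes back unchanged.
$closingOnly = preg_replace("/<\/div[^>]+>/i", "", $content);

// \/? makes the slash optional and [^>]* also accepts zero attribute characters,
// so both the opening and the closing tag are removed.
$stripped = preg_replace("/<\/?div[^>]*>/i", "", $content);

echo $stripped, "\n"; // prints: this is something with an <img src="test.png"/> in it.
```

Note that this pattern would also match any other tag whose name merely starts with "div"; adding \b after div avoids that.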

Change it to "/<[\\/]*div[^>]*>/i"

If you can guarantee that the HTML being passed in is valid and structured in a known way, you should be OK with regex.

In general, though, it's best to avoid using regex to work with HTML, because the markup can be so varied and messy. Instead, try a library like DOMDocument, which handles the messiness for you.

With DOMDocument you would do something like:

$doc = new DOMDocument;
$doc->loadHTML($html);
$headerElement = $doc->getElementById('header');
$headerElement->parentNode->removeChild($headerElement);
$amendedHtml = $doc->saveHTML(); 
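
The snippet above removes the whole element, contents included. If the goal is to drop only the <div> tags and keep what is inside them, as in the question, one possible approach is to move each child out of the <div> before removing it. This is a rough sketch under that assumption, not the answerer's code; $result and the loop structure are illustrative.

```php
<?php
$content = "<div id=\"header\">this is something with an <img src=\"test.png\"/> in it.</div>";

$doc = new DOMDocument;
$doc->loadHTML($content); // the parser wraps the fragment in <html><body>

// Copy the live node list into an array first, because the loop modifies the tree.
foreach (iterator_to_array($doc->getElementsByTagName('div')) as $div) {
    // Move every child in front of the <div>, then drop the now-empty <div>.
    while ($div->firstChild !== null) {
        $div->parentNode->insertBefore($div->firstChild, $div);
    }
    $div->parentNode->removeChild($div);
}

// Serialize only the children of <body> to get the fragment back.
$body = $doc->getElementsByTagName('body')->item(0);
$result = '';
foreach ($body->childNodes as $child) {
    $result .= $doc->saveHTML($child);
}
echo $result, "\n";
```

The output is re-serialized HTML, so details such as the self-closing slash on <img> may differ from the input.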

// \b keeps the "b" alternative from also matching <br>, <body>, or <button>.
$content = preg_replace("/<\/?(div|b|span)\b[^>]*>/i", "", $content);

This removes all of the following:

<div...>
</div>
<b....>
</b>
<span...>
</span>
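
One caveat with an alternation of tag names: unless the name is followed by a word boundary, a short name such as b also matches the start of <br> or <body>. The sketch below wraps the idea in a helper that builds the pattern from a list of names. remove_tags is a hypothetical function name, not something from the answers or from PHP itself.

```php
<?php
// Hypothetical helper: strips the listed tags but keeps their contents.
function remove_tags(string $html, array $tags): string
{
    // Quote each name so it is safe to embed in the pattern.
    $names = implode('|', array_map(fn ($t) => preg_quote($t, '/'), $tags));
    // \b requires the tag name to end here, so "b" does not match <br> or <body>.
    // Fall back to the input if preg_replace reports an error (returns null).
    return preg_replace('/<\/?(?:' . $names . ')\b[^>]*>/i', '', $html) ?? $html;
}

$html = '<div class="x"><b>bold</b><br><span>text</span></div>';
echo remove_tags($html, ['div', 'b', 'span']), "\n"; // prints: bold<br>text
```

The same limits apply as for any regex over HTML: a > inside an attribute value, comments, or script content can still confuse the pattern.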
