How to strip all img tags from a text, except for those containing a certain word

I would like to strip all img tags from a certain text, except for those which contain a certain keyword (e.g. the domain they're hosted at).

Here's what I've come up with, but I'm afraid it doesn't work:

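$text = preg_replace("/<img[^>]+(?!keyword)[^>]+\>/i", "", $text);
// Fails: (?!keyword) checks one position only, and backtracking always finds one not followed by "keyword".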

Any help would be GREATLY appreciated! :)
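For comparison, the question's lookahead idea can be made to work by placing the negative lookahead directly after "<img" and letting it scan the whole tag: (?![^>]*keyword) rejects any tag that mentions the keyword before its closing ">". This is a minimal sketch, not a robust parser. It assumes no attribute value contains a literal ">", it matches case-insensitively, and strip_imgs_except is a hypothetical helper name.

```php
<?php
// Remove every <img ...> tag unless the keyword appears somewhere inside the tag.
// The lookahead sits right after "<img", so it inspects the whole tag body
// instead of a single position. Assumes no attribute value contains ">".
function strip_imgs_except(string $text, string $keyword): string
{
    $pattern = '/<img\b(?![^>]*' . preg_quote($keyword, '/') . ')[^>]*>/i';
    return preg_replace($pattern, '', $text);
}

$text = '<p>A <img src="http://keyword.example/a.png"> B <img src="http://other.example/b.png"></p>';
echo strip_imgs_except($text, 'keyword'), "\n";
```

The answers below explain why a real HTML parser is the more robust choice; this only shows where the original pattern went wrong.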

Use DOMDocument::loadHTML. It uses libxml under the hood, which is fast and robust.

Don't try to parse HTML with regexes.

I made that bold because I see this a lot on here, and the regex solutions are always fragile at best and buggy at worst. Once you use a real HTML parser to get the attributes you want, applying a regex to those attribute values is much more reasonable.
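To make the "parser first, then regex" idea concrete, here is a minimal sketch in which DOMDocument finds the img elements and a regex is applied only to each extracted src value, as a host check via parse_url(). It assumes the dom extension is enabled; src_is_allowed and the example.com host are hypothetical placeholders for the keyword or domain in the question.

```php
<?php
// Hypothetical helper: decide whether an <img> should be kept by testing
// only its src attribute, which a real parser has already extracted.
function src_is_allowed(string $src, string $allowedHost): bool
{
    $host = parse_url($src, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // relative URL or unparsable src
    }
    // Accept the host itself or any subdomain of it.
    $pattern = '/(^|\.)' . preg_quote($allowedHost, '/') . '$/i';
    return preg_match($pattern, $host) === 1;
}

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML('<p><img src="https://cdn.example.com/a.png"><img src="https://evil.test/b.png"></p>');
foreach ($doc->getElementsByTagName('img') as $img) {
    $src = $img->getAttribute('src');
    echo $src, ' => ', src_is_allowed($src, 'example.com') ? 'keep' : 'strip', "\n";
}
```

Matching on the parsed host rather than a raw substring avoids keeping, say, https://notexample.com/ just because it contains the allowed name.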

[update] Even if this HTML is coming from WordPress, you should be fine, since loadHTML takes a string as an argument.

The function parses the HTML contained in the string source. Unlike loading XML, HTML does not have to be well-formed to load.
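A small sketch illustrating that tolerance, assuming the dom extension is available: the input below has an unclosed paragraph, a second paragraph that implicitly closes the first, and an unquoted attribute, yet it still loads. libxml_use_internal_errors() keeps any parser warnings from being printed.

```php
<?php
// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// Not well-formed: no closing </p> tags and an unquoted src attribute.
$ok = $doc->loadHTML('<p>one<p>two <img src=a.png> three');
libxml_clear_errors(); // discard whatever warnings were collected

echo $ok ? 'loaded' : 'failed', "\n";
echo $doc->getElementsByTagName('p')->length, " paragraphs\n";
echo $doc->getElementsByTagName('img')->length, " image(s)\n";
```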

Something like the following should get you going...

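$doc = new DOMDocument();
libxml_use_internal_errors(true);             // don't print warnings for imperfect HTML
$doc->loadHTML($var);                         // $var holds the HTML string
$images = $doc->getElementsByTagName('img');  // DOMNodeList of all <img> elements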
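Building on that snippet, one possible way to finish the job is sketched below; it is an illustration under assumptions, not the answerer's exact code. strip_images_except is a hypothetical name, the keyword is matched case-sensitively against the src attribute only, the fragment is wrapped in a div so it can be serialized back without the html/body elements loadHTML adds, and character-encoding handling is left out (non-ASCII input may be misread unless an encoding is declared).

```php
<?php
// Remove every <img> whose src does not contain $keyword, using DOMDocument.
function strip_images_except(string $html, string $keyword): string
{
    libxml_use_internal_errors(true);            // tolerate sloppy markup quietly
    $doc = new DOMDocument();
    // Wrap the fragment so it can be extracted again after parsing.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();

    // getElementsByTagName() returns a live list, so collect first, then remove.
    $toRemove = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        if (strpos($img->getAttribute('src'), $keyword) === false) {
            $toRemove[] = $img;
        }
    }
    foreach ($toRemove as $img) {
        $img->parentNode->removeChild($img);
    }

    // The wrapper is the first <div> in document order; serialize only its children.
    $wrap = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo strip_images_except('<p>A <img src="http://keyword.example/a.png"> B <img src="http://other.example/b.png"></p>', 'keyword'), "\n";
```

Collecting the nodes before removing them matters: deleting from a live DOMNodeList while iterating it can skip elements.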

Use a callback to simplify the task:

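$html = preg_replace_callback('/<img\s[^>]+>/i', 'cb_keyword', $html);

// Return an empty string (strip the tag) unless it contains "keyword".
// strpos() can return 0, so compare with === false rather than using "!".
function cb_keyword($matches) {
    return strpos($matches[0], 'keyword') === false ? '' : $matches[0];
}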
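A variant of the same callback approach, sketched with an anonymous function so the keyword can be passed in instead of being hard-coded in cb_keyword. strip_imgs_keep_keyword is a hypothetical name, and it uses stripos(), so unlike the callback above it matches the keyword case-insensitively.

```php
<?php
// The regex only locates candidate <img> tags; the keep/strip decision
// is a plain case-insensitive substring test on each matched tag.
function strip_imgs_keep_keyword(string $html, string $keyword): string
{
    return preg_replace_callback(
        '/<img\s[^>]*>/i',
        function (array $m) use ($keyword): string {
            // Keep the tag if it mentions the keyword anywhere, else drop it.
            return stripos($m[0], $keyword) === false ? '' : $m[0];
        },
        $html
    );
}

echo strip_imgs_keep_keyword('x <img src="keyword/a.png"> y <IMG SRC="b.png"> z', 'keyword'), "\n";
```

Because the test looks at the whole tag, a keyword that appears only in, say, an alt attribute also keeps the image.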

If you are working on HTML snippets, using phpQuery/QueryPath would still be possible, but it adds more post-processing.

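Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.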

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM