How to strip all img tags from a text, except for those containing a certain word

Question

I would like to strip all img tags from a certain text, except for those which contain a certain keyword (e.g. the domain they're hosted at).

Here's what I've come up with, but I'm afraid it doesn't work:

 $text = preg_replace("/<img[^>]+(?!keyword)[^>]+\>/i", "", $text);

Any help would be GREATLY appreciated! :)

Answer 1

How about DOMDocument::loadHTML? It uses libxml under the hood, which is fast and robust.

Don't try to parse HTML with regexes.

I stress this because I see it a lot on here, and the solutions are always fragile at best and buggy at worst. Once you have used a true HTML parser to get the attributes you want, applying a regex to those values is more reasonable.

[update] - Even if the text is coming from WordPress you should be fine, since loadHTML takes a string as its argument.

The function parses the HTML contained in the string source. Unlike loading XML, HTML does not have to be well-formed to load.

Something like the following should get you going...

$doc = new DOMDocument();
$doc->loadHTML($var);
$images = $doc->getElementsByTagName('img');
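To go from the node list above to the actual filtering, something along these lines might work. This is only a sketch under a few assumptions: the function name strip_images_except is made up for illustration, the keyword is looked for in the src attribute only (case-insensitively), the input is a UTF-8 HTML fragment, and PHP 7 or later is available. The wrapper div and the encoding hint are common workarounds, not something the question or this answer prescribes.

```php
<?php
// Hypothetical helper: removes every <img> whose src attribute lacks $keyword.
function strip_images_except(string $html, string $keyword): string
{
    $doc = new DOMDocument();

    // Collect libxml warnings about malformed markup instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // The encoding hint asks libxml to read the input as UTF-8; the <div>
    // wrapper gives us a single known parent to serialize afterwards.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // getElementsByTagName() returns a live list, so copy it to an array
    // before removing nodes; otherwise elements would be skipped.
    foreach (iterator_to_array($doc->getElementsByTagName('img')) as $img) {
        if (stripos($img->getAttribute('src'), $keyword) === false) {
            $img->parentNode->removeChild($img);
        }
    }

    // Serialize only the children of the wrapper, not the html/body shell
    // that loadHTML adds around fragments.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Calling strip_images_except($text, 'example.com') should then return the fragment with only the images hosted at that domain left in place. Note that a stray closing div in the input would end the wrapper early, so real-world input may need a more careful extraction step.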

Answer 2

Use a callback to simplify the task:

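$html = preg_replace_callback('/<img\s[^>]+>/i', "cb_keyword", $html);

// Return the original tag if it contains the keyword, otherwise an empty string.
function cb_keyword($matches) {
    // strpos() returns false when nothing is found, so compare strictly.
    return strpos($matches[0], "keyword") === false ? "" : $matches[0];
}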
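The callback above has the keyword hard-coded. If it needs to vary, a closure can carry it in, roughly as in the sketch below. The function name strip_img_tags_except and the example URLs are invented for illustration, and stripos is used here so the match is case-insensitive, which is a deliberate change from the strpos call above. It assumes PHP 7 or later.

```php
<?php
// Hypothetical wrapper around the callback approach; the name is illustrative.
function strip_img_tags_except(string $html, string $keyword): string
{
    return preg_replace_callback(
        '/<img\s[^>]+>/i',
        function (array $matches) use ($keyword): string {
            // Keep the tag only if the keyword occurs somewhere inside it.
            return stripos($matches[0], $keyword) === false ? '' : $matches[0];
        },
        $html
    );
}

// Example call with made-up URLs: only the first image should survive.
$clean = strip_img_tags_except(
    '<img src="http://example.com/a.png"> text <img src="http://other.test/b.png">',
    'example.com'
);
```

Because the keyword is matched against the whole tag, an image whose alt or title text happens to contain it is kept as well; checking only the src attribute needs a parser or a stricter pattern.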

If you are working on HTML snippets, using phpQuery or QueryPath would still be possible, but it adds more post-processing.

2 answers

solution1
5 2011-03-04 23:09:26

solution2
0 ACCEPTED 2011-03-04 23:33:31