
Remove everything except image tag from string using regular expression

I have a string that contains HTML elements, and I have to remove everything except the images.

Currently I am using this code:

$e->outertext = "<p class='images'>".str_replace(' ', ' ', str_replace('Â','',preg_replace('/#.*?(<img.+?>).*?#is', '',$e)))."</p>";

It serves my purpose but is very slow to execute. Any other way to do the same would be appreciated.

The code you provided does not seem to work as it should, and the regex itself is malformed: it opens with / as the delimiter but never closes it. Remove the initial slash so that # acts as the delimiter, like this: #.*?(<img.+?>).*?#is

Your approach is to remove everything and leave only the image tags, which is not a good way to do it. A better way is to capture all the image tags and then use the matches to construct the output. First, let's capture the image tags. That can be done using this regex:

/<img.*>/Ug

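The U modifier makes the quantifiers lazy instead of greedy, so .* stops at the first > it encounters. The g flag is only needed in the online demo to find every match; it is not a valid modifier in PHP, where preg_match_all already returns all matches.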
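To make the difference concrete, here is a minimal sketch comparing the default greedy behaviour with the U modifier. The sample string in $html is made up for illustration and is not from the question.

```php
<?php
// Hypothetical sample input: two image tags separated by some text.
$html = '<img a> text <img b>';

// Default (greedy): .* runs on to the last > in the string,
// so both tags end up inside a single match.
preg_match_all('/<img.*>/', $html, $greedy);

// With U: .* stops at the first >, giving one match per tag.
preg_match_all('/<img.*>/U', $html, $lazy);

print_r($greedy[0]); // a single element spanning both tags
print_r($lazy[0]);   // two elements, one per tag
```

Note that without the s modifier the dot does not match newlines, so a tag whose attributes are spread over several lines would be missed by this pattern.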

DEMO1

Now, to construct the output, let's use the function preg_match_all and put the results in a string. That can be done using the following code:

<?php
// defining the input
$e = 
'<div class="topbar-links"><div class="gravatar-wrapper-24">
<img src="https://www.gravatar.com/avatar" alt="" width="24" height="24"     class="avatar-me js-avatar-me">
</div>
</div> <img test2> <img test3> <img test4>';
// defining the regex
$re = "/<img.*>/U";
// put all matches into $matches
preg_match_all($re, $e, $matches);
// start creating the result
$result = "<p class='images'>";
// loop to get all the images
for($i=0; $i<count($matches[0]); $i++) {
    $result .= $matches[0][$i];
}
// print the final result
echo $result."</p>";

DEMO2

A further way to improve that code is to use functional programming (array_reduce, for example). But I'll leave that as homework.
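For readers who want to see what that could look like, here is one possible sketch using array_reduce. The function name wrapImages is made up for this example; the regex and the wrapper paragraph are the ones used above.

```php
<?php
// Hypothetical helper: collects every <img> tag in $html into one paragraph.
function wrapImages($html)
{
    preg_match_all('/<img.*>/U', $html, $matches);

    // Fold the matched tags into one string, starting from the opening <p>.
    $body = array_reduce(
        $matches[0],
        function ($carry, $tag) {
            return $carry . $tag;
        },
        "<p class='images'>"
    );

    return $body . '</p>';
}

echo wrapImages('</div> <img test2> <img test3> <img test4>');
```

Since the reduction only concatenates strings, implode('', $matches[0]) would give the same result with less code; array_reduce becomes more useful if each tag needs extra processing on the way.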

Note: There is another way to accomplish this, which is parsing the HTML document and using XPath to find the elements. Check out this answer for more information.
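As a rough sketch of the parsing approach, assuming PHP's bundled DOM extension is available: the function name extractImagesWithXPath and the sample markup are made up for illustration, and this is not necessarily what the linked answer does.

```php
<?php
// Hypothetical helper: returns all <img> elements of $html wrapped in one paragraph.
function extractImagesWithXPath($html)
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid, so collect parser warnings
    // internally instead of letting them be printed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    $result = "<p class='images'>";
    foreach ($xpath->query('//img') as $img) {
        // Passing a node to saveHTML serializes only that node.
        $result .= $doc->saveHTML($img);
    }

    return $result . '</p>';
}

echo extractImagesWithXPath('<div><img src="a.png" alt="A"> text <img src="b.png"></div>');
```

The tags are re-serialized by the parser, so attribute quoting and spacing may differ slightly from the input. In exchange, this copes with tags split across lines and with a > inside an attribute value, both of which trip up the regex.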
