
PHP: find a certain string in an HTML tag and replace the whole tag with another string

I have extracted a string value from my SQL table, and it looks like this:

<p>Commodity Exchange on 5 April 2016 settled as following graph:</p> 
<p><img alt=\"\" src=\"ckeditor/plugins/imageuploader/uploads/986dfdea.png\" 
style=\"height:163px; width:650px\" /></p></p> 
<p>end of string</p>

I want to get the image name 986dfdea.png from inside the HTML tag (there are many <p></p> tags in the string, and I need to know which tag contains an image), and then replace the whole tag with a placeholder such as '#image1'.

The end result should look like this:

<p>Commodity Exchange on 5 April 2016 settled as following graph:</p> 
#image1 
<p>end of string</p>

I'm developing an API for mobile apps, but my PHP skills are very basic, and I still can't achieve my goal after reading these questions:

PHP/regex: How to get the string value of HTML tag?

How to extract img src, title and alt from html using php?

Please help.

Yes, you could use a regex and you'd need far less code, but you shouldn't parse HTML with a regex, so here's what you need:

  1. Your string contains invalid HTML ( </p></p> ), so we use tidy_repair_string to clean it up.
  2. Use DOMXPath to query for img tags inside p tags.
  3. Strip the stray quote characters from the src value and get the image file name with getAttribute("src") and basename().
  4. Create a text node with createTextNode whose value is #imagename.
  5. Use replaceChild to replace the img inside the p with the new text node created above.
  6. Clean up the !DOCTYPE, html and body tags that loadHTML() adds automatically to the new DOMDocument.
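Step 3 is where the escaped quotes in the stored HTML matter: the src value can come back as \"ckeditor/...png\", with a backslash and a quote at both ends, and basename() alone would keep the trailing \". A minimal sketch of a helper for that step, assuming backslashes and double quotes are the only stray characters (imageNameFromSrc is a hypothetical name, not part of PHP):

```php
<?php
// Hypothetical helper: derive the image file name from a src attribute value
// that may still carry the backslash-escaped quotes seen in the stored HTML.
function imageNameFromSrc(string $src): string
{
    // Drop stray backslashes and double quotes, e.g. \"...\" -> ...
    $clean = str_replace(['\\', '"'], '', $src);
    // basename() keeps only the last path segment of the cleaned path.
    return basename($clean);
}

// In single quotes, \" stays a literal backslash followed by a quote.
echo imageNameFromSrc('\"ckeditor/plugins/imageuploader/uploads/986dfdea.png\"'), PHP_EOL;
// prints 986dfdea.png
```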

<?php
$html = <<< EOF
<p>Commodity Exchange on 5 April 2016 settled as following graph:</p>
<p><img alt=\"\" src=\"ckeditor/plugins/imageuploader/uploads/986dfdea.png\"
style=\"height:163px; width:650px\" /></p></p>
<p>end of string</p>
EOF;

// Repair the invalid markup (the stray </p>) before parsing (requires the tidy extension).
$html = tidy_repair_string($html, array(
    'output-html'     => true,
    'wrap'            => 80,
    'show-body-only'  => true,
    'clean'           => true,
    'input-encoding'  => 'utf8',
    'output-encoding' => 'utf8',
));

$dom = new DOMDocument();
$dom->loadHTML($html);

$x = new DOMXPath($dom);
foreach ($x->query('//p/img') as $pImg) {
    // Get the image name: strip escaped quotes (\") from src, then keep the last path segment
    $imgFileName = basename(str_replace(array('\\', '"'), '', $pImg->getAttribute('src')));
    $replace = $dom->createTextNode("#$imgFileName");
    // Replace the <img> (inside its <p>) with the text node
    $pImg->parentNode->replaceChild($replace, $pImg);
}

// Do the cleanup once, after the loop: inside the loop it would run again
// for every image, strip the wrong nodes and echo the document repeatedly.
# loadHTML causes a !DOCTYPE tag to be added, so remove it:
$dom->removeChild($dom->firstChild);
# it also wraps the code in <html><body></body></html>, so remove that:
$dom->replaceChild($dom->firstChild->firstChild, $dom->firstChild);
echo str_replace(array("<body>", "</body>"), "", $dom->saveHTML());

Output:

<p>Commodity Exchange on 5 April 2016 settled as following graph:</p>
<p>#986dfdea.png</p>
<p>end of string</p>
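The code above replaces only the img, so the placeholder stays wrapped in <p>...</p> and is named after the file rather than numbered. For exactly what the question asks (the whole <p> replaced by #image1, #image2, ...), here is a sketch that assumes the DOM extension is available and skips tidy, relying on libxml to tolerate the stray </p>. replaceImageParagraphs is a hypothetical name, and the returned map is an assumption about what the API needs:

```php
<?php
// Hypothetical helper: replaces every <p> containing an <img> with a numbered
// placeholder (#image1, #image2, ...) and returns placeholder => file name.
// Note: without a charset hint loadHTML() assumes ISO-8859-1, so non-ASCII
// text may need extra encoding handling.
function replaceImageParagraphs(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings (e.g. about the stray </p>) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $names = [];
    $count = 0;
    // query() returns a static list, so replacing nodes while looping is safe.
    foreach ($xpath->query('//p[.//img]') as $p) {
        $img = $xpath->query('.//img', $p)->item(0);
        // Strip escaped quotes (\") from src, then keep the last path segment.
        $src = str_replace(['\\', '"'], '', $img->getAttribute('src'));
        $placeholder = '#image' . (++$count);
        $names[$placeholder] = basename($src);
        // Swap the whole <p>, not just the <img>, for a plain text node.
        $p->parentNode->replaceChild($dom->createTextNode($placeholder), $p);
    }

    // loadHTML() wraps fragments in <html><body>; serialize only the body's children.
    $out = '';
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $dom->saveHTML($child);
        }
    }
    return ['html' => $out, 'names' => $names];
}

$result = replaceImageParagraphs('<p>Intro</p><p><img src="uploads/986dfdea.png"></p><p>end of string</p>');
echo $result['html'], PHP_EOL;
print_r($result['names']);
```

For this sample input, names maps #image1 to 986dfdea.png. Returning the map alongside the HTML lets the mobile app look up which file each placeholder stands for.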

