PHP: find a certain character in an HTML tag and replace the whole tag with a string

I have extracted a string value from my SQL table, and it looks like this:

<p>Commodity Exchange on 5 April 2016 settled as following graph:</p> 
<p><img alt=\"\" src=\"ckeditor/plugins/imageuploader/uploads/986dfdea.png\" 
style=\"height:163px; width:650px\" /></p></p> 
<p>end of string</p>

I want to get the image name 986dfdea.png from inside the HTML tag (the string contains many <p></p> tags, and I want to be able to tell that this one contains an image), and replace the whole tag with a placeholder such as '#image1'.

Eventually it would become this:

<p>Commodity Exchange on 5 April 2016 settled as following graph:</p> 
#image1 
<p>end of string</p>

I'm developing an API for mobile apps, but I'm a beginner in PHP and still can't achieve my goal with the help of these references:

PHP/regex: How to get the string value of HTML tag?

How to extract img src, title and alt from html using php?

Please help.

Yes, you could use a regex, and you'd need far less code, but we shouldn't parse HTML with a regex, so here's what you need:

  1. Your string contains invalid HTML ( </p></p> ), so we use tidy_repair_string to clean it.
  2. Use DOMXpath() to query for img tags that sit directly inside p tags.
  3. Remove any extra " and get the image file name with getAttribute("src") and basename.
  4. Use createTextNode to create a text node holding #imagename, where imagename is the image file name.
  5. Use replaceChild to replace the img inside the p with the text node created above.
  6. Clean up the !DOCTYPE, html and body tags that are added automatically when the DOMDocument loads the HTML.

<?php
// Sample value as stored in the database, including the escaped quotes
// and the stray closing </p>.
$html = <<< EOF
<p>Commodity Exchange on 5 April 2016 settled as following graph:</p>
<p><img alt=\"\" src=\"ckeditor/plugins/imageuploader/uploads/986dfdea.png\"
style=\"height:163px; width:650px\" /></p></p>
<p>end of string</p>
EOF;

// Repair the invalid markup and keep only the body content.
$html = tidy_repair_string($html, array(
    'output-html'     => true,
    'wrap'            => 80,
    'show-body-only'  => true,
    'clean'           => true,
    'input-encoding'  => 'utf8',
    'output-encoding' => 'utf8',
));

$dom = new DOMDocument();
$dom->loadHTML($html);

$x = new DOMXPath($dom);
foreach ($x->query('//p/img') as $pImg) {
    // Strip leftover quotes and take the file name from the src path.
    $imgFileName = basename(str_replace('"', "", $pImg->getAttribute("src")));
    $replace = $dom->createTextNode("#$imgFileName");
    $pImg->parentNode->replaceChild($replace, $pImg);
}

// Clean up and print once, after every image has been replaced.
# loadHTML causes a !DOCTYPE tag to be added, so remove it:
$dom->removeChild($dom->firstChild);
# it also wraps the code in <html><body></body></html>, so remove that:
$dom->replaceChild($dom->firstChild->firstChild, $dom->firstChild);
echo str_replace(array("<body>", "</body>"), "", $dom->saveHTML());

Output:

<p>Commodity Exchange on 5 April 2016 settled as following graph:</p>
<p>#986dfdea.png</p>
<p>end of string</p>

Ideone Demo
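The output above keeps the surrounding <p> and uses the file name as the marker, whereas the question asks for the whole tag to be replaced by a numbered placeholder such as '#image1'. The following is a minimal sketch of that variant, not part of the original answer. The function name replaceImageParagraphs and the wrapper id "wrap" are made up for illustration. It assumes the stored value really contains backslash-escaped quotes, and it relies on DOMDocument's own error recovery instead of tidy, so how the stray </p> is handled may differ from the tidy-based version.

```php
<?php
// Hypothetical helper (name chosen for illustration): replaces every <p> that
// contains an <img> with a numbered placeholder such as "#image1" and returns
// the rewritten HTML plus a map of placeholder => image file name.
function replaceImageParagraphs(string $html): array
{
    // Assumption: quotes were stored backslash-escaped (\"), so unescape them.
    $html = str_replace('\\"', '"', $html);

    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // invalid markup would otherwise emit warnings
    // Wrap the fragment in a known container so it can be read back afterwards.
    $dom->loadHTML('<div id="wrap">' . $html . '</div>');
    libxml_clear_errors();

    $xpath  = new DOMXPath($dom);
    $images = [];
    $count  = 0;

    // '//p[img]' selects the <p> elements that have an <img> child.
    foreach (iterator_to_array($xpath->query('//p[img]')) as $p) {
        $img         = $xpath->query('img', $p)->item(0);
        $placeholder = '#image' . (++$count);
        $images[$placeholder] = basename($img->getAttribute('src'));
        // Swap the whole paragraph for a plain text node.
        $p->parentNode->replaceChild($dom->createTextNode($placeholder), $p);
    }

    // Serialize only the children of the wrapper, without html/body tags.
    $out  = '';
    $wrap = $xpath->query('//div[@id="wrap"]')->item(0);
    foreach ($wrap->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }

    return ['html' => $out, 'images' => $images];
}

// Usage with the string from the question (nowdoc keeps the backslashes literal).
$stored = <<<'EOF'
<p>Commodity Exchange on 5 April 2016 settled as following graph:</p>
<p><img alt=\"\" src=\"ckeditor/plugins/imageuploader/uploads/986dfdea.png\"
style=\"height:163px; width:650px\" /></p></p>
<p>end of string</p>
EOF;

$result = replaceImageParagraphs($stored);
echo $result['html'], "\n";
print_r($result['images']);
```

Returning the placeholder map alongside the HTML lets the API send the text and the image list separately, so the mobile app can decide what to render in place of each '#imageN' marker.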
