
PHP regex to check if an image is wrapped with an <a> tag

I am creating a WordPress function and need to determine whether an image in the content is wrapped in an <a> tag that links to a PDF or DOC file, e.g.:

<a href="www.site.com/document.pdf"><img src="../images/image.jpg" /></a>

How would I go about doing this with PHP?

Thanks

I would very strongly advise against using a regular expression for this. Besides being more error-prone and less readable, it also does not let you manipulate the content easily.

You would be better off loading the content into a DOMDocument, retrieving all <img> elements and checking whether their parents are <a> elements. All you have to do then is check whether the value of the href attribute ends with the desired extension.

A very crude implementation would look a bit like this:

<?php

$sHtml = <<<HTML
<html>
<body>
    <img src="../images/image.jpg" />
    <a href="www.site.com/document.pdf"><img src="../images/image.jpg" /></a>
    <a href="www.site.com/document.txt"><img src="../images/image.jpg" /></a>
    <p>this is some text <a href="site.com/doc.pdf"> more text</p> 
</body>
</html>
HTML;

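$oDoc = new DOMDocument();
libxml_use_internal_errors(true); // the sample markup is imperfect; collect parser warnings instead of printing them
$oDoc->loadHTML($sHtml);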
$oNodeList = $oDoc->getElementsByTagName('img');

foreach($oNodeList as $t_oNode)
{
    if($t_oNode->parentNode->nodeName === 'a')
    {
        $sLinkValue = $t_oNode->parentNode->getAttribute('href');
        $sExtension = substr($sLinkValue, strrpos($sLinkValue, '.'));

        echo '<li>I am wrapped in an anchor tag '
           . 'and I link to a ' . $sExtension . ' file</li>' . PHP_EOL;
    }
}
?>

I'll leave an exact implementation as an exercise for the reader ;-)
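For anyone who wants a starting point for that exercise, here is a minimal sketch of the same idea wrapped in a reusable function. The function name findImagesLinkedToDocs, its parameters and the shape of the returned array are my own assumptions, not part of the answer above. It also reads the extension from the path part of the URL (via parse_url and pathinfo) instead of using substr/strrpos, so a link such as document.pdf?v=2 is still recognised.

```php
<?php
// Hypothetical helper (not from the original answer): returns one entry per
// <img> whose direct parent is an <a> linking to one of the given extensions.
function findImagesLinkedToDocs(string $html, array $extensions = ['pdf', 'doc']): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Collect parser warnings for imperfect markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $matches = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $parent = $img->parentNode;
        if (!($parent instanceof DOMElement) || $parent->nodeName !== 'a') {
            continue; // image is not directly wrapped in a link
        }

        $href = $parent->getAttribute('href');
        // Look only at the path part so "file.pdf?x=1#top" still counts as a PDF.
        $path = (string) parse_url($href, PHP_URL_PATH);
        $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

        if (in_array($extension, $extensions, true)) {
            $matches[] = ['src' => $img->getAttribute('src'), 'href' => $href];
        }
    }

    return $matches;
}
```

Calling findImagesLinkedToDocs($content) on the post content should then give you an array of src/href pairs to loop over. Like the crude version, it only looks at the direct parent of each image.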

Here is DOM-parser-based code that you can use:

$html = <<< EOF
<a href="www.site.com/document.pdf"><img src="../images/image.jpg" /></a>
<img src="../images/image1.jpg" />
<a href="www.site.com/document.txt"><IMG src="../images/image2.jpg" /></a>
<a href="www.site.com/document.doc"><img src="../images/image3.jpg" /></a>
<a href="www.site.com/document1.pdf">My PDF</a>
EOF;
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html); // loads your html
$nodeList = $doc->getElementsByTagName('a');
for($i=0; $i < $nodeList->length; $i++) {
    $node = $nodeList->item($i);
    $children = $node->childNodes; 
    $hasImage = false;
    foreach ($children as $child) { 
       if ($child->nodeName == 'img') {
          $hasImage = true;
          break;
       }
    }
    if (!$hasImage)
       continue;
    // Report the link only when its href ends in .doc or .pdf (case-insensitive).
    // getAttribute() returns an empty string when the attribute is missing.
    $href = $node->getAttribute('href');
    if (preg_match('/\.(doc|pdf)$/i', $href)) {
        echo $href . " - Image is wrapped in a link to a PDF or DOC file\n";
    }
}
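Note that the loop above only inspects the direct children of each <a>, so an image nested one level deeper (for example inside a <span>) would be missed. If that matters for your content, one possible variation is to let XPath do the selection. This is a sketch under my own assumptions: the function name findDocLinksContainingImages is hypothetical, and it keeps the same /\.(doc|pdf)$/i check as above.

```php
<?php
// Hypothetical helper: returns the href of every <a> that contains an <img>
// at any depth and whose href ends in .doc or .pdf.
function findDocLinksContainingImages(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $hrefs = [];
    // "//a[@href][.//img]" selects links that have an href and an <img> descendant.
    foreach ($xpath->query('//a[@href][.//img]') as $link) {
        $href = $link->getAttribute('href');
        if (preg_match('/\.(doc|pdf)$/i', $href)) {
            $hrefs[] = $href;
        }
    }

    return $hrefs;
}
```

The HTML parser lower-cases element names, so an upper-case <IMG> tag in the source is still matched by the img step in the query.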

Live Demo: http://ideone.com/dwJNAj
