
PHP regex to check if image is wrapped with a tag

I am creating a WordPress function and need to determine whether an image in the content is wrapped in an <a> tag that links to a PDF or DOC file, e.g.

<a href="www.site.com/document.pdf"><img src="../images/image.jpg" /></a>

How would I go about doing this with PHP?

Thanks

I would very strongly advise against using a regular expression for this. Besides being more error prone and less readable, it also does not give you the ability to manipulate the content easily.

You would be better off loading the content into a DOMDocument, retrieving all <img> elements and checking whether their parents are <a> elements. All you have to do then is check whether the value of the href attribute ends with the desired extension.

A very crude implementation would look something like this:

<?php

$sHtml = <<<HTML
<html>
<body>
    <img src="../images/image.jpg" />
    <a href="www.site.com/document.pdf"><img src="../images/image.jpg" /></a>
    <a href="www.site.com/document.txt"><img src="../images/image.jpg" /></a>
    <p>this is some text <a href="site.com/doc.pdf"> more text</p> 
</body>
</html>
HTML;

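$oDoc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings about malformed markup instead of printing them
$oDoc->loadHTML($sHtml);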
$oNodeList = $oDoc->getElementsByTagName('img');

foreach ($oNodeList as $t_oNode)
{
    // Only images whose direct parent is an <a> element qualify
    if ($t_oNode->parentNode->nodeName === 'a')
    {
        $sLinkValue = $t_oNode->parentNode->getAttribute('href');
        // Everything from the last dot onwards, e.g. ".pdf"
        $sExtension = substr($sLinkValue, strrpos($sLinkValue, '.'));

        echo '<li>I am wrapped in an anchor tag '
           . 'and I link to a ' . $sExtension . ' file</li>'
        ;
    }
}
?>

I'll leave an exact implementation as an exercise for the reader ;-)
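For readers who want a starting point for that exercise, here is a minimal sketch, not a definitive implementation. The function name `findDocumentLinksWrappingImages` and its `$extensions` parameter are hypothetical and not part of the answer above. It differs from the crude loop in two ways: it climbs the ancestors of each image, so an <img> nested inside a <span> within the link is still detected, and it reads the extension from the URL path only, so a query string such as `?v=2` does not hide it.

```php
<?php
// Hypothetical helper (name and parameters are assumptions, not from the answer):
// returns the href of every anchor that wraps an <img>, directly or indirectly,
// and whose URL path ends in one of the given extensions.
function findDocumentLinksWrappingImages(string $html, array $extensions = ['pdf', 'doc']): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy real-world markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $matches = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        // Walk up the tree until the nearest enclosing <a> (if any) is found.
        for ($node = $img->parentNode; $node instanceof DOMElement; $node = $node->parentNode) {
            if ($node->nodeName !== 'a') {
                continue;
            }
            $href = $node->getAttribute('href');
            // Use only the path component, ignoring ?query and #fragment.
            $path = (string) parse_url($href, PHP_URL_PATH);
            $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));
            if (in_array($extension, $extensions, true)) {
                $matches[] = $href;
            }
            break; // nearest anchor handled; stop climbing
        }
    }

    return $matches;
}
```

In a WordPress context you would pass the post content as `$html`. An empty result means that no image is wrapped in a link to one of the listed file types.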

Here is DOM-parser-based code that you can use:

$html = <<< EOF
<a href="www.site.com/document.pdf"><img src="../images/image.jpg" /></a>
<img src="../images/image1.jpg" />
<a href="www.site.com/document.txt"><IMG src="../images/image2.jpg" /></a>
<a href="www.site.com/document.doc"><img src="../images/image3.jpg" /></a>
<a href="www.site.com/document1.pdf">My PDF</a>
EOF;
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html); // loads your html
$nodeList = $doc->getElementsByTagName('a');
for($i=0; $i < $nodeList->length; $i++) {
    $node = $nodeList->item($i);
    $children = $node->childNodes; 
    $hasImage = false;
    foreach ($children as $child) { 
       if ($child->nodeName == 'img') {
          $hasImage = true;
          break;
       }
    }
    if (!$hasImage)
       continue;
    // getAttribute() returns an empty string when href is missing,
    // so no separate attribute loop is needed
    $href = $node->getAttribute('href');
    if (preg_match('/\.(doc|pdf)$/i', $href)) {
        echo $href . " - Image is wrapped in a link to a PDF or DOC file\n";
    }
}

Live Demo: http://ideone.com/dwJNAj
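The same check can also be written with DOMXPath, which selects anchors that have an href and contain an image in a single query. This is only a sketch: the function name `documentLinksViaXPath` is hypothetical, and the extension test reuses the regular expression from the code above.

```php
<?php
// Hypothetical helper (the name is an assumption): collects the href of each <a>
// that contains an <img> at any depth and ends in .doc or .pdf.
function documentLinksViaXPath(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $links = [];
    // //a[@href][.//img] : anchors with an href attribute and an <img> descendant.
    // The HTML parser lowercases tag names, so <IMG> is matched as well.
    foreach ($xpath->query('//a[@href][.//img]') as $anchor) {
        $href = $anchor->getAttribute('href');
        if (preg_match('/\.(doc|pdf)$/i', $href)) {
            $links[] = $href;
        }
    }

    return $links;
}
```

Unlike the childNodes loop, the `.//img` step also matches images nested deeper inside the link, for example inside a <span>.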
