
PHP DOMDocument: get anchor tag href and inner HTML?

The code below gets all the anchor tags on a page, along with their inner text. What if the inner HTML is an image?
For example, given <a href="test.html"><img src="test.png"/></a>, how do I get the src of the image?

$url = $_POST['url'];
$html = file_get_contents($url);
$dom = new DOMDocument;
@$dom->loadHTML($html);

//Get all links.
$links = $dom->getElementsByTagName('a');

//Iterate over the extracted links and read their URLs and text
foreach ($links as $link){
    //Extract the "href" attribute.
    $href = $link->getAttribute('href');
    //Extract the inner text (empty when the link only wraps an image).
    $text = $link->nodeValue;
}

How do I solve this?

The DOMXPath class is well suited to this kind of problem.

$html = '<a href="test.html"><img src="test.png"/></a><a href="example.com">Click me!</a>';
$dom = new DOMDocument;
$dom->loadHTML($html);

$xPath = new DOMXPath($dom);

$nodes = $xPath->query('//a/img/@src');

//test output
foreach($nodes as $node){
  echo $node->nodeName." : ".$node->textContent.'<br>';
}

Output:

src : test.png

Or, if the href attributes are also required, use

$nodes = $xPath->query('//a/img/@src|//a/@href');

and you get

href : test.html
src : test.png
href : example.com
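The union query returns one flat list, so it does not say which src belongs to which href. A minimal sketch of one way to keep them paired, assuming the same sample markup: loop over the links and run a relative XPath expression with each link as the context node. The $rows array and its keys are illustrative names, not part of the original answer.

```php
<?php
// Sketch: pair each link's href with the src of any image it contains.
$html = '<a href="test.html"><img src="test.png"/></a><a href="example.com">Click me!</a>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xPath = new DOMXPath($dom);

$rows = []; // illustrative container, one entry per link
foreach ($xPath->query('//a') as $link) {
    // Passing $link as the second argument makes the expression relative to this <a>.
    // string(...) yields the first matching src, or '' when the link has no image.
    $src = $xPath->evaluate('string(.//img/@src)', $link);

    $rows[] = [
        'href' => $link->getAttribute('href'),
        'text' => trim($link->textContent),
        'src'  => $src !== '' ? $src : null,
    ];
}

print_r($rows);
```

For the sample markup, the first entry should carry test.html together with test.png, and the second should carry example.com with the text "Click me!" and no src.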

Another variant:

$nodes = $xPath->query('//a/img/@src|//a[not(img)]/text()');

Result:

src : test.png
#text : Click me!
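If the complete inner HTML of each link is wanted rather than only the image src, one possible approach is to serialize each child node with DOMDocument::saveHTML(). This is a sketch under that assumption; innerHtml() is a hypothetical helper name, not a built-in DOM method.

```php
<?php
// Hypothetical helper: serialize every child node of $node back to markup.
function innerHtml(DOMNode $node): string
{
    $out = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() accepts a node and returns the markup for that node only.
        $out .= $node->ownerDocument->saveHTML($child);
    }
    return $out;
}

$html = '<a href="test.html"><img src="test.png"/></a><a href="example.com">Click me!</a>';
$dom = new DOMDocument;
$dom->loadHTML($html);

$inner = []; // illustrative map: href => inner HTML
foreach ($dom->getElementsByTagName('a') as $link) {
    $inner[$link->getAttribute('href')] = innerHtml($link);
}

print_r($inner);
```

The markup is re-serialized by libxml, so it may not match the source byte for byte (for example, a self-closing slash can be dropped).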


 