How do I extract this value using PHP Dom

Question

I have an HTML file; this is just a part of it:

<div id="result" >
    <div class="res_item" id="1" h="63c2c439b62a096eb3387f88465d36d0">
        <div class="res_main">
            <h2 class="res_main_top">
                <img 
                    src="/ff/gigablast.com.png" 
                    alt="favicon for gigablast.com" 
                    width=16 
                    height=16
                    />&nbsp;
                <a 
                    href="http://www.gigablast.com/" 
                    rel="nofollow"
                    >
                    Gigablast
                </a>
                <div class="res_main">
                    <h2 class="res_main_top">
                        <img 
                            src="/ff/ask.com.png" 
                            alt="favicon for ask.com" 
                            width=16 
                            height=16
                            />&nbsp;
                        <a 
                            href="http://ask.com/" rel="nofollow"
                            >
                            Ask.com - What&#039;s Your Question?
                        </a>....

I want to extract only the URLs (for example, http://www.gigablast.com and http://ask.com/; there are at least 10 URLs in that HTML) from the markup above using PHP's DOMDocument. I know how to get this far, but not how to proceed:

$doc = new DomDocument;
$doc->loadHTMLFile('urllist.html');

$data = $doc->getElementById('result');

Then what? The links are nested inside other tags, so I assumed I can't use $data->getElementsByTagName() here.

Answer 1

You can call getElementsByTagName on a DOMElement object:

$doc = new DomDocument;
$doc->loadHTMLFile('urllist.html');

$result = $doc->getElementById('result');
$anchors = $result->getElementsByTagName('a');

$urls = array();
foreach ($anchors as $a) {
    $urls[] = $a->getAttribute('href');
}

If you want to get image sources as well, that would be easy to add.
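As a rough sketch of that extension, the same pattern can be repeated for the img elements. The inline $html string below is a hypothetical, tidied-up stand-in for urllist.html (not the asker's real file), and $images is a name introduced only for this illustration:

```php
<?php
// Hypothetical inline sample standing in for urllist.html.
$html = <<<'HTML'
<div id="result">
    <div class="res_main">
        <h2 class="res_main_top">
            <img src="/ff/gigablast.com.png" alt="favicon for gigablast.com" width="16" height="16" />
            <a href="http://www.gigablast.com/" rel="nofollow">Gigablast</a>
        </h2>
    </div>
    <div class="res_main">
        <h2 class="res_main_top">
            <img src="/ff/ask.com.png" alt="favicon for ask.com" width="16" height="16" />
            <a href="http://ask.com/" rel="nofollow">Ask.com</a>
        </h2>
    </div>
</div>
HTML;

// Real-world HTML is often malformed; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$result = $doc->getElementById('result');

// Link targets, exactly as in the answer above.
$urls = array();
foreach ($result->getElementsByTagName('a') as $a) {
    $urls[] = $a->getAttribute('href');
}

// Image sources, collected the same way from the img elements.
$images = array();
foreach ($result->getElementsByTagName('img') as $img) {
    $images[] = $img->getAttribute('src');
}

print_r($urls);
print_r($images);
```

With a real file you would go back to $doc->loadHTMLFile('urllist.html'); the two loops stay the same.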

Answer 2

If you just want the href attribute of every <a> tag in the document (and the <div id="result"> doesn't matter), you could use this:

$doc = new DomDocument;
$doc->loadHTMLFile('urllist.html');
$anchors = $doc->getElementsByTagName('a');
$urls = array();
foreach($anchors as $anchor) {
    $urls[] = $anchor->getAttribute('href');
}

// $urls is your collection of urls in the original document.

Answer 3

Use XPath to narrow the selection down to the <a> elements inside the <div class="res_main"> elements:

$doc = new DomDocument();
$doc->loadHTMLFile('urllist.html');
$xpath = new DomXpath($doc);

$query = '//div[@class="res_main"]//a';
$nodes = $xpath->query($query);

$urls = array();

foreach ($nodes as $node) {
    $href = $node->getAttribute('href');
    if (!empty($href)) {
        $urls[] = $href;
    }
}

This avoids picking up every <a> element in the document: the query keeps only the links inside the result blocks, so navigation links and the like are left out.
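A possible variation, sketched here under assumptions, is to have the XPath query return the href attribute nodes directly, so that anchors without an href never show up in the result. The inline $html sample (including the navigation link) is hypothetical and only there to make the snippet self-contained:

```php
<?php
// Hypothetical sample: one navigation link outside the results, three anchors inside.
$html = <<<'HTML'
<div id="nav"><a href="/about">About</a></div>
<div id="result">
    <div class="res_main">
        <a href="http://www.gigablast.com/" rel="nofollow">Gigablast</a>
    </div>
    <div class="res_main">
        <a href="http://ask.com/" rel="nofollow">Ask.com</a>
        <a rel="nofollow">anchor without href</a>
    </div>
</div>
HTML;

libxml_use_internal_errors(true); // keep parser warnings quiet
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// "/@href" selects the attribute nodes themselves (DOMAttr objects),
// so an <a> that has no href attribute contributes nothing.
$urls = array();
foreach ($xpath->query('//div[@class="res_main"]//a/@href') as $attr) {
    $urls[] = $attr->value;
}

print_r($urls);
```

Two caveats: an anchor with an empty href="" would still be returned by this query, unlike with the !empty() check above, and @class="res_main" only matches when the class attribute is exactly that string, so a div carrying additional classes would need a looser test such as contains().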

3 answers

solution1
0 2011-01-03 18:44:24

solution2
0 2011-01-03 18:49:12

solution3
0 ACCEPTED 2011-01-03 18:52:45
