简体   繁体   中英

How to extract innerHTML using the PHP Dom

I'm currently using nodeValue to give me HTML output, however it is stripping the HTML code and just giving me plain text. Does anyone know how I can modify my code to give me the inner HTML of an element by using it's ID?

function getContent($url, $id){

// This first section gets the HTML stuff using a URL
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
$html = curl_exec($ch);
curl_close($ch);

// This second section analyses the HTML and outputs it
$newDom = new domDocument;
$newDom->loadHTML($html);
$newDom->preserveWhiteSpace = false;
$newDom->validateOnParse = true;

$sections = $newDom->getElementById($id)->nodeValue;
echo $sections;


}

This works for me:

$sections = $newDom->saveXML($newDom->getElementById($id));

http://www.php.net/manual/en/domdocument.savexml.php

If you have PHP 5.3.6, this might also be an option:

$sections = $newDom->saveHTML($newDom->getElementById($id));

http://www.php.net/manual/en/domdocument.savehtml.php

I have modify the code, and it's working fine for me. Please find below the code

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
    $html = curl_exec($ch);
    curl_close($ch);
    $newDom = new domDocument;
    libxml_use_internal_errors(true);
    $newDom->loadHTML($html);
    libxml_use_internal_errors(false);
    $newDom->preserveWhiteSpace = false;
    $newDom->validateOnParse = true;

    $sections = $newDom->saveHTML($newDom->getElementById('colophon'));   
    echo $sections;

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM