
PHP file_get_contents(): how to retrieve particular div tags from a page's body via its URL

How do I retrieve a specific div (by its class or id) from a page, given the page's URL?

I use the first function to get the title. The second one, which should get the div by class, is giving me a problem. Here is the code I used:

    function website_title() {
        $ch = curl_init();
        $url = $_POST['urle'];
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        // some websites like Facebook need a user agent to be set.
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.94 Safari/537.36');
        $html = curl_exec($ch);
        curl_close($ch);

        $dom = new DOMDocument;
        @$dom->loadHTML($html);

        $title = $dom->getElementsByTagName('title')->item(0)->nodeValue;
        echo $title;
    }

    function website_content() {
        $url = $_POST['urle'];
        $html = file_get_contents($url);
        libxml_use_internal_errors(true); // Prevents warnings, remove if desired
        $dom = new DOMDocument();
        $dom->loadHTML($html);
        // the div with the wanted class/id still has to be selected from $dom
    }

You can use getElementsByTagName to get all divs and then check each one's class attribute. A better and easier way is to use a library that does this for you, e.g. Simple HTML DOM Parser: http://simplehtmldom.sourceforge.net/
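A minimal sketch of the getElementsByTagName approach, assuming the page has already been fetched into a string (the sample markup below stands in for it). The helper name find_divs_by_class is hypothetical. It splits each div's class attribute on whitespace, so a div with class="box className" also counts as having the class className.

```php
<?php
// Hypothetical helper: return every <div> whose class list contains $className.
function find_divs_by_class(string $html, string $className): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid; silence parser warnings
    $dom->loadHTML($html);
    libxml_clear_errors();

    $matches = [];
    foreach ($dom->getElementsByTagName('div') as $div) {
        // class="a b c" holds several classes separated by whitespace
        $classes = preg_split('/\s+/', trim($div->getAttribute('class')));
        if (in_array($className, $classes, true)) {
            $matches[] = $div;
        }
    }
    return $matches;
}

$html = '<html><body><div class="box className">First</div><div class="other">Skip</div><div class="className">Second</div></body></html>';
foreach (find_divs_by_class($html, 'className') as $div) {
    echo $div->textContent, "\n"; // prints "First", then "Second"
}
```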

You can use DOMXPath to get the tags that have a specified class. For example:

$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXPath($dom);
$myClassName = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' myClassName ')]");

Then you can iterate over $myClassName like any DOMNodeList.
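A short sketch of that iteration, assuming $html already holds the page source; the sample markup and the class name myClassName are placeholders. DOMXPath::query() returns a DOMNodeList (or false for an invalid expression), which can be walked with foreach. Padding the class attribute with spaces on both sides is what keeps a class such as myClassNameX from matching.

```php
<?php
$html = '<div class="intro myClassName">Hello</div><p class="myClassName">World</p><div class="myClassNameX">No</div>';

libxml_use_internal_errors(true); // ignore warnings from imperfect HTML
$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXPath($dom);

// " intro myClassName " contains " myClassName ", but " myClassNameX " does not.
$nodes = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' myClassName ')]");

$texts = [];
foreach ($nodes as $node) {
    $texts[] = trim($node->textContent);
}
echo implode(', ', $texts), "\n"; // prints "Hello, World"
```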

I use DOMXPath to select specific elements, like this:

$dom = new DOMDocument();
@$dom->loadHTML( $html );
$xpath = new DOMXPath( $dom );

To get all divs, I would use:

$divs = $xpath->query( '//div' );

To get all divs whose class attribute is exactly "className", I use:

$divs = $xpath->query( '//div[@class="className"]' );

To get the contents of the first match, use it this way:

$content = $divs->item( 0 )->nodeValue;
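A slightly more defensive sketch that puts these pieces together. The function name first_match_text is hypothetical, and fetching is left to the caller (for example file_get_contents($url), or cURL as in the question). Because item(0) returns null on an empty list, reading ->nodeValue directly would raise a warning when nothing matches; this version returns null instead. The same pattern covers ids via //div[@id="..."].

```php
<?php
// Hypothetical helper: text of the first node matched by $expression, or null.
function first_match_text(string $html, string $expression): ?string
{
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query($expression);
    if ($nodes === false || $nodes->length === 0) {
        return null; // invalid expression or no match
    }
    return trim($nodes->item(0)->nodeValue);
}

$html = '<body><div class="className">By class</div><div id="main">By id</div></body>';
echo first_match_text($html, '//div[@class="className"]'), "\n"; // prints "By class"
echo first_match_text($html, '//div[@id="main"]'), "\n";         // prints "By id"
var_dump(first_match_text($html, '//div[@class="missing"]'));    // prints NULL
```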
