
Cannot find element using simple_html_dom.php

This is the script I am using:

<?php

    include_once("simple_html_dom.php");

    $html = file_get_html("http://www.amazon.com/gp/product/B000VS8CTM");
    $title = $html->find('#title');
    echo count($title);

?>

count($title) returns 0.

The web page does contain this line:

<h1 id="title" class="a-size-large a-spacing-none">Folding Helping Hand Long-Reach Pick-Up Gripper - 26" Aluminum</h1>

but the simple_html_dom script cannot find it.

I have also tried

$title = $html->find('h1[id=title]');

but count($title) still returns 0.

When I run

echo $html->plaintext;

the title is in the output.

I have no idea what the problem is.

Any help is appreciated!


Edit:

I noticed that Stack Overflow somehow changes my URL after I save the post.

This is the correct function call: file_get_html("http://www.amazon.com/gp/product/B000VS8CTM").

You can do it this way, using a foreach() loop:

include_once("simple_html_dom.php");

$html = file_get_html("http://rads.stackoverflow.com/amzn/click/B000VS8CTM");
foreach($html->find('h1') as $element) 
{
    echo $element->plaintext;
}

Try this:

<?php
$url = "http://www.amazon.com/gp/product/B000VS8CTM";

include_once("simple_html_dom.php");

$_curl = curl_init();
curl_setopt($_curl, CURLOPT_SSL_VERIFYHOST, 2); // recent libcurl versions no longer accept 1; use 2 (or 0 to disable)
curl_setopt($_curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($_curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($_curl, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; InfoPath.1)');
curl_setopt($_curl, CURLOPT_URL, $url);
$html = curl_exec( $_curl );

$_htmlDom = new simple_html_dom();
$_htmlDom->load(  $html  );
$productTitle = $_htmlDom->find('h1#title', 0)->innertext;
$str = $_htmlDom->save();
var_dump($str); //return string length: 400946, refer to Remark 1
$_htmlDom->clear();

var_dump($productTitle);
?>

Remark 1:

I also tested with the following code. Something must differ between the two approaches, but I did not trace the details.

Summary of results:

  • with cURL, CURLOPT_RETURNTRANSFER must be set so that curl_exec() returns the page as a string
  • $_htmlDom->load_file() sometimes returns a page with content missing

Code:

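<?php
include_once("simple_html_dom.php");

$url = "http://www.amazon.com/gp/product/B000VS8CTM";

$_htmlDom = new simple_html_dom();
$_htmlDom->load_file(  $url  ); // simple_html_dom fetches the page itself (no cURL)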
$productTitle = $_htmlDom->find('h1#title', 0)->innertext;
var_dump($productTitle); //return NULL
$str = $_htmlDom->save();
var_dump($str); //return string length: 283459
$_htmlDom->clear();
?>
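
The two string lengths above suggest that the server returned different markup to the two fetch methods. Before blaming the selector, it can help to check the raw HTML string for the id you are looking for. The following is only a sketch: summarize_html() is a hypothetical helper, not part of simple_html_dom, the attribute check is a regex heuristic rather than a real parse, and the two sample strings are made-up stand-ins for the real responses.

```php
<?php
/**
 * Hypothetical helper: report the size of an HTML string and whether it
 * contains an element with the given id attribute.
 * Uses a simple regex heuristic, not a real HTML parser.
 */
function summarize_html(string $html, string $id): array
{
    // Match id="title" or id='title', case-insensitively.
    $pattern = '/\bid\s*=\s*["\']' . preg_quote($id, '/') . '["\']/i';

    return [
        'length' => strlen($html),
        'has_id' => preg_match($pattern, $html) === 1,
    ];
}

// Stand-in strings; in practice pass the result of curl_exec() or $dom->save().
$full    = '<html><body><h1 id="title" class="a-size-large">Gripper</h1></body></html>';
$reduced = '<html><body><h1><span id="btAsinTitle">Gripper</span></h1></body></html>';

$fullSummary    = summarize_html($full, 'title');
$reducedSummary = summarize_html($reduced, 'title');

var_dump($fullSummary['has_id']);    // the first sample contains id="title"
var_dump($reducedSummary['has_id']); // the second sample only has id="btAsinTitle"
```

If the id is missing from the string PHP received, no selector will find it, and the fix lies in how the page is requested rather than in the find() call.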

This gives you the title. Try:

<?php
    include_once("simple_html_dom.php");

    $html = new simple_html_dom();
    $html->load_file("http://rads.stackoverflow.com/amzn/click/B000VS8CTM");
    $title = $html->find('h1',0);
    $title = $title->find('#btAsinTitle',0);
    echo $title->innertext;
?>
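
The answers on this page use different selectors (h1#title, #btAsinTitle), which suggests the markup varies with what the server decides to send. One defensive pattern is to try several selectors in order and stop at the first hit. This is a sketch under stated assumptions: find_first() is a hypothetical helper, and it relies only on the object having simple_html_dom's find($selector, $index) method, which returns null when nothing matches at that index. The stub class stands in for a loaded document so the example runs without the library or a network call.

```php
<?php
/**
 * Hypothetical helper: return the first element matched by any of the
 * given selectors, or null if none match.
 * $dom is expected to expose find($selector, $index) like simple_html_dom.
 */
function find_first($dom, array $selectors)
{
    foreach ($selectors as $selector) {
        $element = $dom->find($selector, 0);
        if ($element) { // a miss yields null, which is falsy
            return $element;
        }
    }
    return null;
}

// Stub standing in for a simple_html_dom document (illustration only):
// it "contains" just one element, with id btAsinTitle.
$stubDom = new class {
    public function find($selector, $index)
    {
        if ($selector === '#btAsinTitle' && $index === 0) {
            $element = new stdClass();
            $element->innertext = 'Folding Helping Hand Long-Reach Pick-Up Gripper';
            return $element;
        }
        return null;
    }
};

$title = find_first($stubDom, ['h1#title', '#btAsinTitle']);

// Check for null before reading a property, so a miss does not raise an error.
echo $title !== null ? $title->innertext : 'title not found';
```

With a real document, pass the simple_html_dom object in place of $stubDom.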

I just fixed a similar issue by putting this in the file:

ini_set('user_agent', 
  'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3');

Credit to this site: http://www.electrictoolbox.com/php-change-user-agent-string/
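
Note that ini_set('user_agent', ...) changes the user agent for every request PHP's HTTP stream wrapper makes in the script. If that is broader than you want, a stream context scopes the setting to a single call. The sketch below is an illustration, not the answerer's method: make_ua_context() is a hypothetical helper, the timeout is an arbitrary value, and the commented-out usage assumes simple_html_dom's str_get_html() is available.

```php
<?php
/**
 * Hypothetical helper: build a stream context that sends the given
 * User-Agent header on requests made through PHP's http wrapper.
 */
function make_ua_context(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'user_agent' => $userAgent,
            'timeout'    => 15, // seconds; arbitrary value for this sketch
        ],
    ]);
}

$userAgent = 'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3';
$context   = make_ua_context($userAgent);

// Inspect what the context will send.
$options = stream_context_get_options($context);
echo $options['http']['user_agent'], PHP_EOL;

// Usage with simple_html_dom (commented out: needs the library and network access):
// include_once("simple_html_dom.php");
// $raw  = file_get_contents("http://www.amazon.com/gp/product/B000VS8CTM", false, $context);
// $html = ($raw !== false) ? str_get_html($raw) : false;
```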

