This is the script I am using:
<?php
include_once("simple_html_dom.php");
$html = file_get_html("http://www.amazon.com/gp/product/B000VS8CTM");
$title = $html->find('#title');
echo count($title);
?>
count($title) returns 0.
There is indeed a line in the web page
<h1 id="title" class="a-size-large a-spacing-none">Folding Helping Hand Long-Reach Pick-Up Gripper - 26" Aluminum</h1>
but the simple_html_dom script cannot find it.
I have also tried
$title = $html->find('h1[id=title]');
but count($title) still returns 0.
I run
echo $html->plaintext;
and the title is there.
I have no idea what the problem is.
Any help is appreciated!
Edit:
I noticed that Stack Overflow somehow changes my URL after I save the post.
This is the correct function call: file_get_html(" http://www.amazon.com/gp/product/B000VS8CTM ").
You can do it this way, using a foreach() loop:
include_once("simple_html_dom.php");
$html = file_get_html("http://rads.stackoverflow.com/amzn/click/B000VS8CTM");
foreach($html->find('h1') as $element)
{
echo $element->plaintext;
}
Try this:
<?php
$url = "http://www.amazon.com/gp/product/B000VS8CTM";
include_once("simple_html_dom.php");
$_curl = curl_init();
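curl_setopt($_curl, CURLOPT_SSL_VERIFYHOST, 2); // 2 = check the host name; the value 1 is no longer supported by libcurl
curl_setopt($_curl, CURLOPT_SSL_VERIFYPEER, false); // skips certificate verification; acceptable for a quick test only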
curl_setopt($_curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($_curl, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; InfoPath.1)');
curl_setopt($_curl, CURLOPT_URL, $url);
$html = curl_exec( $_curl );
$_htmlDom = new simple_html_dom();
$_htmlDom->load( $html );
$productTitle = $_htmlDom->find('h1#title', 0)->innertext;
$str = $_htmlDom->save();
var_dump($str); //return string length: 400946, refer to Remark 1
$_htmlDom->clear();
var_dump($productTitle);
?>
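Remark 1:
I also tested with the following code. The two approaches must be receiving different pages, but I did not trace the details.
Summary of results: the cURL version above returns 400946 characters and finds the title; the load_file() version below returns 283459 characters and the title comes back as NULL.
Code: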
<?php
$_htmlDom = new simple_html_dom();
$_htmlDom->load_file( $url ); // or get HTML from SimpleHtmlDom
$productTitle = $_htmlDom->find('h1#title', 0)->innertext;
var_dump($productTitle); //return NULL
$str = $_htmlDom->save();
var_dump($str); //return string length: 283459
$_htmlDom->clear();
?>
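Since the two fetches return documents of different lengths, it may help to check whether the raw response contains the element at all before blaming the selector. Here is a minimal sketch using PHP's built-in DOM extension rather than simple_html_dom. The helper name html_has_id is made up for illustration, and the two sample strings are stand-ins, not real responses:

```php
<?php
// Hypothetical helper: report whether an HTML string contains an element
// with the given id. Uses PHP's bundled DOM extension, not simple_html_dom.
function html_has_id(string $html, string $id): bool
{
    if ($html === '') {
        return false; // nothing was fetched
    }
    // Real-world markup is rarely valid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Assumes $id contains no double quote.
    return $xpath->query('//*[@id="' . $id . '"]')->length > 0;
}

// Stand-in strings for the two responses being compared.
$withTitle    = '<html><body><h1 id="title">Some product</h1></body></html>';
$withoutTitle = '<html><body><p>Some other page</p></body></html>';

var_dump(html_has_id($withTitle, 'title'));    // bool(true)
var_dump(html_has_id($withoutTitle, 'title')); // bool(false)
```

If the check fails on the string returned by load_file() but passes on the cURL response, the server is probably sending a different page to each client, and the selector itself is fine.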
This gives you the title. Try:
<?php
include_once("simple_html_dom.php");
$html = new simple_html_dom();
$html->load_file("http://rads.stackoverflow.com/amzn/click/B000VS8CTM");
$title = $html->find('h1',0);
$title = $title->find('#btAsinTitle',0);
echo $title->innertext;
?>
I just fixed a similar issue by putting this in the file:
ini_set('user_agent',
'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3');
Credit to this site: http://www.electrictoolbox.com/php-change-user-agent-string/
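Note that ini_set('user_agent', ...) affects every HTTP stream request made by the script. If that is broader than you want, the same header can be sent for a single request through a stream context, which file_get_html() accepts as its third argument. A rough sketch; the helper name browser_context is hypothetical:

```php
<?php
// Hypothetical helper: build a stream context that sends a browser-like
// User-Agent for one request instead of changing the setting globally.
function browser_context(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'method'     => 'GET',
            'user_agent' => $userAgent,
        ],
    ]);
}

$context = browser_context(
    'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3'
);

// With simple_html_dom.php included, pass the context as the third argument:
// $html = file_get_html('http://www.amazon.com/gp/product/B000VS8CTM', false, $context);
```

The file_get_html() call is left commented out so the snippet runs without the library or network access.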