Cannot find element using simple_html_dom.php
This is the script I am using:
<?php
include_once("simple_html_dom.php");
$html = file_get_html("http://www.amazon.com/gp/product/B000VS8CTM");
$title = $html->find('#title');
echo count($title);
?>
count($title) returns 0.
The web page does contain this line:
<h1 id="title" class="a-size-large a-spacing-none">Folding Helping Hand Long-Reach Pick-Up Gripper - 26" Aluminum</h1>
but the simple_html_dom script cannot find it.
I have also tried
$title = $html->find('h1[id=title]');
but count($title) still returns 0.
I ran
echo $html->plaintext;
and the title is there in the output.
I have no idea what the problem is.
Any help is appreciated!
Edit:
I noticed that Stack Overflow somehow changes my URL after I save the post.
This is the correct function call: file_get_html("http://www.amazon.com/gp/product/B000VS8CTM").
You can do it this way, using a foreach() loop:
include_once("simple_html_dom.php");
$html = file_get_html("http://rads.stackoverflow.com/amzn/click/B000VS8CTM");
foreach($html->find('h1') as $element)
{
echo $element->plaintext;
}
Try this:
<?php
$url = "http://www.amazon.com/gp/product/B000VS8CTM";
include_once("simple_html_dom.php");
$_curl = curl_init();
curl_setopt($_curl, CURLOPT_SSL_VERIFYHOST, 2); // current libcurl no longer accepts 1; 2 is the supported "verify host" value
curl_setopt($_curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($_curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($_curl, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; InfoPath.1)');
curl_setopt($_curl, CURLOPT_URL, $url);
$html = curl_exec( $_curl );
$_htmlDom = new simple_html_dom();
$_htmlDom->load( $html );
$productTitle = $_htmlDom->find('h1#title', 0)->innertext;
$str = $_htmlDom->save();
var_dump($str); //return string length: 400946, refer to Remark 1
$_htmlDom->clear();
var_dump($productTitle);
?>
Remark 1:
I also tested with the following code. Something must be different, but I did not trace the details.
Summary Result:
Code:
<?php
$_htmlDom = new simple_html_dom();
$_htmlDom->load_file( $url ); // or get HTML from SimpleHtmlDom
$productTitle = $_htmlDom->find('h1#title', 0)->innertext;
var_dump($productTitle); //return NULL
$str = $_htmlDom->save();
var_dump($str); //return string length: 283459
$_htmlDom->clear();
?>
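The two string lengths reported above (400946 with cURL plus a browser User-Agent, 283459 with load_file) suggest the server returned different HTML for the two requests. A minimal sketch for checking that directly, before involving the parser at all: fetch the raw body and test whether the marker `id="title"` is present. The function names summarizeBody and probeUrl are hypothetical, and the marker string is an assumption based on the h1 line quoted in the question.

```php
<?php
// Hypothetical helper: report the size of an HTML body and whether a marker occurs in it.
function summarizeBody(string $body, string $marker = 'id="title"'): array
{
    return [
        'length'    => strlen($body),
        'hasMarker' => strpos($body, $marker) !== false,
    ];
}

// Hypothetical helper: fetch a URL with an optional User-Agent and summarize the response.
// A failed request is summarized as an empty body.
function probeUrl(string $url, ?string $userAgent = null): array
{
    $http = ['timeout' => 15];
    if ($userAgent !== null) {
        $http['user_agent'] = $userAgent;
    }
    $body = @file_get_contents($url, false, stream_context_create(['http' => $http]));
    return summarizeBody($body === false ? '' : $body);
}

// Usage (performs network requests; the result depends on what the server sends back):
// $url = "http://www.amazon.com/gp/product/B000VS8CTM";
// var_dump(probeUrl($url));                        // no explicit User-Agent
// var_dump(probeUrl($url, 'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3'));
```

If hasMarker is false for the request without a User-Agent, the element is missing from the downloaded page itself, so no selector could find it.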
This gives you the title. Try:
<?php
include_once("simple_html_dom.php");
$html = new simple_html_dom();
$html->load_file("http://rads.stackoverflow.com/amzn/click/B000VS8CTM");
$title = $html->find('h1',0);
$title = $title->find('#btAsinTitle',0);
echo $title->innertext;
?>
I just fixed a similar issue by putting this in the file:
ini_set('user_agent',
'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3');
Credit to this site: http://www.electrictoolbox.com/php-change-user-agent-string/
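ini_set('user_agent', ...) changes the User-Agent for every later HTTP request made through PHP's stream wrappers in the script. If you would rather set it for one request only, a stream context is an alternative. This is a sketch, not a tested fix for the Amazon page; makeUserAgentContext is a hypothetical name, and passing the context as the third argument of file_get_html assumes your copy of simple_html_dom.php has that parameter.

```php
<?php
// Hypothetical helper: build a stream context that sends the given User-Agent
// with HTTP requests made through PHP's stream wrappers.
function makeUserAgentContext(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'user_agent' => $userAgent,
            'timeout'    => 15, // seconds; an arbitrary choice for this sketch
        ],
    ]);
}

// Usage (requires simple_html_dom.php and network access):
// include_once("simple_html_dom.php");
// $context = makeUserAgentContext('Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3');
// $html = file_get_html("http://www.amazon.com/gp/product/B000VS8CTM", false, $context);
// $title = $html ? $html->find('#title', 0) : null;  // find(..., 0) yields null when nothing matches
// echo $title ? trim($title->plaintext) : "no #title element in the response";
```

Checking $title before reading ->plaintext avoids an error when the server still returns a page without the element.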