Cannot find element using simple_html_dom.php

This is the script I am using:

<?php

    include_once("simple_html_dom.php");

    $html = file_get_html("http://www.amazon.com/gp/product/B000VS8CTM");
    $title = $html->find('#title');
    echo count($title);

?>

count($title) returns 0.

The web page does contain this line:

<h1 id="title" class="a-size-large a-spacing-none">Folding Helping Hand Long-Reach Pick-Up Gripper - 26" Aluminum</h1>

but the simple_html_dom script cannot find it.

I have also tried

$title = $html->find('h1[id=title]');

but count($title) still returns 0.

I ran

echo $html->plaintext;

and the title is there.

I have no idea what the problem is.

Any help is appreciated!


Edit:

I noticed that Stack Overflow somehow changes my URL after I save the post.

This is the correct function call: file_get_html("http://www.amazon.com/gp/product/B000VS8CTM").

You can do it this way, using a foreach() loop:

include_once("simple_html_dom.php");

$html = file_get_html("http://rads.stackoverflow.com/amzn/click/B000VS8CTM");
foreach($html->find('h1') as $element) 
{
    echo $element->plaintext;
}

Try this:

<?php
$url = "http://www.amazon.com/gp/product/B000VS8CTM";

include_once("simple_html_dom.php");

$_curl = curl_init();
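curl_setopt($_curl, CURLOPT_SSL_VERIFYHOST, 0); // value 1 is no longer accepted by libcurl; use 0 (off) or 2 (on)
curl_setopt($_curl, CURLOPT_SSL_VERIFYPEER, false); // certificate checks are disabled here; avoid this in production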
curl_setopt($_curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($_curl, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; InfoPath.1)');
curl_setopt($_curl, CURLOPT_URL, $url);
$html = curl_exec( $_curl );

$_htmlDom = new simple_html_dom();
$_htmlDom->load(  $html  );
$productTitle = $_htmlDom->find('h1#title', 0)->innertext;
$str = $_htmlDom->save();
var_dump($str); // string length: 400946 (see Remark 1)
$_htmlDom->clear();

var_dump($productTitle);
?>
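
A variation on the snippet above, offered only as a sketch: it wraps the same cURL options in a function and returns null when the request fails, so the caller can tell a failed download apart from a page that lacks the element. The function name fetch_html and the timeout value are assumptions made for this illustration; they are not part of simple_html_dom.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): fetches a URL with cURL
// and returns the response body as a string, or null on failure.
function fetch_html(string $url, string $userAgent): ?string
{
    if (!function_exists('curl_init')) {
        return null; // cURL extension is not available
    }

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow HTTP redirects
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);          // assumed timeout, in seconds

    $body = curl_exec($ch); // false on failure, string on success

    return is_string($body) ? $body : null;
}
```

If the call returns a string, it can be passed to $_htmlDom->load() as in the snippet above; if it returns null, the download itself failed and there is nothing to parse.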

Remark 1:

I also tested with the following code. The result differs, so something must be going on, but I did not trace the details.

Summary Result:

  • With cURL, CURLOPT_RETURNTRANSFER must be set so that curl_exec() returns the page as a string
  • $_htmlDom->load_file() sometimes returns incomplete content

Code:

<?php
$_htmlDom = new simple_html_dom();
$_htmlDom->load_file(  $url  ); // fetch and parse via simple_html_dom itself ($url as in the previous snippet)
$productTitle = $_htmlDom->find('h1#title', 0)->innertext;
var_dump($productTitle); // NULL
$str = $_htmlDom->save();
var_dump($str); // string length: 283459
$_htmlDom->clear();
?>
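
The two string lengths above suggest that the server returned different markup for the two requests. One rough way to confirm this, sketched here under assumptions, is to search the raw HTML string for the id attribute before handing it to the parser. The function name raw_html_has_id is hypothetical, and the regular expression is a coarse check (it would also match attributes such as data-id), not a real HTML parse.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): reports whether the raw
// markup contains an id="..." attribute with the given value.
function raw_html_has_id(string $html, string $id): bool
{
    // Matches id="value" or id='value', allowing whitespace around "=".
    $pattern = '/\bid\s*=\s*(["\'])' . preg_quote($id, '/') . '\1/i';

    return preg_match($pattern, $html) === 1;
}

// Sample markup standing in for a fetched page.
$withTitle    = '<h1 id="title" class="a-size-large a-spacing-none">Example</h1>';
$withoutTitle = '<h1 class="a-size-large">Example</h1>';

var_dump(raw_html_has_id($withTitle, 'title'));    // element is in the markup
var_dump(raw_html_has_id($withoutTitle, 'title')); // element is not in the markup
```

If the check fails on the downloaded string, the element was never sent by the server, and the selector is not the problem.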

This gives you the title. Try:

<?php
    include_once("simple_html_dom.php");

    $html = new simple_html_dom();
    $html->load_file("http://rads.stackoverflow.com/amzn/click/B000VS8CTM");
    $title = $html->find('h1',0);
    $title = $title->find('#btAsinTitle',0);
    echo $title->innertext;
?>
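
Because find() with an index returns null when nothing matches, reading ->innertext directly can fail on pages where the element is missing. A minimal guard might look like the sketch below. The helper name first_innertext is hypothetical, and $dom is assumed to be a simple_html_dom object or anything exposing a compatible find() method.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): returns the inner HTML of
// the first element matching $selector, or null when nothing matches.
function first_innertext($dom, string $selector): ?string
{
    $node = $dom->find($selector, 0); // null when there is no match

    return $node === null ? null : $node->innertext;
}
```

With the snippet above, first_innertext($html, '#btAsinTitle') would then yield either the title markup or null, and the caller can branch on the null case.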

I just fixed a similar issue by putting this in the file:

ini_set('user_agent', 
  'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3');

Credit to this site: http://www.electrictoolbox.com/php-change-user-agent-string/
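
As a sketch of the same idea, the user agent can be set globally with ini_set() or per request with a stream context. Both are standard PHP features. The variable names are chosen for this illustration only, and the commented-out request line uses the product URL from the question.

```php
<?php
// Browser-like user agent string, taken from the answer above.
$agent = 'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3';

// Option 1: global setting, picked up by PHP's http:// stream wrapper.
ini_set('user_agent', $agent);
var_dump(ini_get('user_agent') === $agent); // confirms the setting was applied

// Option 2: per-request setting through a stream context.
$context = stream_context_create(['http' => ['user_agent' => $agent]]);
$options = stream_context_get_options($context);
var_dump($options['http']['user_agent'] === $agent);

// The context would then be passed to the request, for example:
// $raw = file_get_contents('http://www.amazon.com/gp/product/B000VS8CTM', false, $context);
```

The per-request form keeps the change local to one download, which may be preferable when other code in the same script relies on the default user agent.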
