
How do I use simple_html_dom with PhantomJS?

I'm trying to get these two libraries to work together. My current code looks like this:

phantomjs.js

var page = require('webpage').create();
var system = require('system');
var address = system.args[1]; 
page.open(address, function () {
    var content = page.content;
    console.log(content);
    phantom.exit();
});

scraper.php

exec('phantomjs assets/phantomjs.js '.$page, $output);
$html2 = str_get_html($output);

What I get is:

Warning: strlen() expects parameter 1 to be string, array given in D:\XAMPP\htdocs\assets\php\simple_html_dom.php on line 91

Warning: strlen() expects parameter 1 to be string, array given in D:\XAMPP\htdocs\assets\php\simple_html_dom.php on line 1139

Warning: strlen() expects parameter 1 to be string, array given in D:\XAMPP\htdocs\assets\php\simple_html_dom.php on line 1149

Warning: preg_match_all() expects parameter 2 to be string, array given in D:\XAMPP\htdocs\assets\php\simple_html_dom.php on line 1620

Warning: strlen() expects parameter 1 to be string, array given in D:\XAMPP\htdocs\assets\php\simple_html_dom.php on line 1632

Warning: preg_match_all() expects parameter 2 to be string, array given in D:\XAMPP\htdocs\assets\php\simple_html_dom.php on line 1620

Warning: strlen() expects parameter 1 to be string, array given in D:\XAMPP\htdocs\assets\php\simple_html_dom.php on line 1632

Warning: preg_match_all() expects parameter 2 to be string, array given in D:\XAMPP\htdocs\assets\php\simple_html_dom.php on line 1620

Warning: strlen() expects parameter 1 to be string, array given in D:\XAMPP\htdocs\assets\php\simple_html_dom.php on line 1632

Warning: preg_match_all() expects parameter 2 to be string, array given in D:\XAMPP\htdocs\assets\php\simple_html_dom.php on line 1620

Warning: strlen() expects parameter 1 to be string, array given in D:\XAMPP\htdocs\assets\php\simple_html_dom.php on line 1632

Warning: preg_match_all() expects parameter 2 to be string, array given in D:\XAMPP\htdocs\assets\php\simple_html_dom.php on line 1620

Warning: strlen() expects parameter 1 to be string, array given in D:\XAMPP\htdocs\assets\php\simple_html_dom.php on line 1632

Warning: preg_match_all() expects parameter 2 to be string, array given in D:\XAMPP\htdocs\assets\php\simple_html_dom.php on line 1620

Warning: strlen() expects parameter 1 to be string, array given in D:\XAMPP\htdocs\assets\php\simple_html_dom.php on line 1632

Warning: preg_match_all() expects parameter 2 to be string, array given in D:\XAMPP\htdocs\assets\php\simple_html_dom.php on line 1620

Warning: strlen() expects parameter 1 to be string, array given in D:\XAMPP\htdocs\assets\php\simple_html_dom.php on line 1632

Warning: preg_match_all() expects parameter 2 to be string, array given in D:\XAMPP\htdocs\assets\php\simple_html_dom.php on line 1620

Warning: strlen() expects parameter 1 to be string, array given in D:\XAMPP\htdocs\assets\php\simple_html_dom.php on line 1632

Warning: preg_match_all() expects parameter 2 to be string, array given in D:\XAMPP\htdocs\assets\php\simple_html_dom.php on line 1620

Warning: strlen() expects parameter 1 to be string, array given in D:\XAMPP\htdocs\assets\php\simple_html_dom.php on line 1632

The simple_html_dom function that raises the warnings:

// remove noise from html content
// save the noise in the $this->noise array.
protected function remove_noise($pattern, $remove_tag=false)
{
    global $debugObject;
    if (is_object($debugObject)) { $debugObject->debugLogEntry(1); }

    $count = preg_match_all($pattern, $this->doc, $matches, PREG_SET_ORDER|PREG_OFFSET_CAPTURE);

    for ($i=$count-1; $i>-1; --$i)
    {
        $key = '___noise___'.sprintf('% 5d', count($this->noise)+1000);
        if (is_object($debugObject)) { $debugObject->debugLog(2, 'key is: ' . $key); }
        $idx = ($remove_tag) ? 0 : 1;
        $this->noise[$key] = $matches[$i][$idx][0];
        $this->doc = substr_replace($this->doc, $key, $matches[$i][$idx][1], strlen($matches[$i][$idx][0]));
    }

    // reset the length of content
    $this->size = strlen($this->doc);
    if ($this->size>0)
    {
        $this->char = $this->doc[0];
    }
}

When I use var_dump($output), I get the site's HTML, so I know the command is running, but simple_html_dom doesn't seem to accept it!

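The problem is this: $output is an array, but str_get_html expects a string as its argument. exec() fills its second argument with one array element per line of the command's output, which is why every strlen() and preg_match_all() call inside simple_html_dom complains about "array given". So make sure you convert $output to a string before parsing it.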

Something like:

$body = `phantomjs myscript.js`;
$doc = str_get_html($body);
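
If you would rather keep exec() as in the question, a minimal sketch of the conversion could look like the following. The helper names join_output_lines and fetch_rendered_html are hypothetical (they are not part of simple_html_dom or PhantomJS), and the script path is assumed to be the one from the question:

```php
<?php
// Sketch only: both helper names are hypothetical, not part of simple_html_dom.

/**
 * Join the line array that exec() produces into one HTML string.
 */
function join_output_lines(array $lines): string
{
    // exec() drops the trailing newline of each line, so put newlines back.
    return implode("\n", $lines);
}

/**
 * Run the PhantomJS script from the question and return the page HTML as a string.
 */
function fetch_rendered_html(string $url, string $script = 'assets/phantomjs.js'): string
{
    $lines = [];
    $status = 0;
    // escapeshellarg() keeps characters such as & or ? in the URL away from the shell.
    exec('phantomjs ' . escapeshellarg($script) . ' ' . escapeshellarg($url), $lines, $status);
    if ($status !== 0) {
        throw new RuntimeException("phantomjs exited with status $status");
    }
    return join_output_lines($lines);
}

// Usage (assumes simple_html_dom.php is included and phantomjs is on the PATH):
// $html2 = str_get_html(fetch_rendered_html($page));
```

The backtick form above already returns a single string, so the implode() step is only needed when exec() collects the output.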

You may also need to use setTimeout in the PhantomJS script if the DOM takes a while to load.

