How to scrape dynamic data with PHP Simple HTML DOM Parser

Question

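First, let me say that I have read many "scraping" threads here, but none of them helped. I have also searched the internet for several days, and now I am down to the wire, hoping someone can help me out.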

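I am using PHP Simple HTML DOM Parser to scrape some data from a page. The URL I am using serves dynamic content, and nothing I try seems to pull that content out. I need the rows from <tr id="0" class="ui-widget-content jqgrow ui-row-ltr" role="row"> through <tr id="9" class="ui-widget-content jqgrow ui-row-ltr" role="row">, and I figure that once I get those working, I can get the others. Because this information is not actually in the page when it loads, and is only filled in after the page has loaded, I am at a loss.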

That said, here is what I have tried:

echo file_get_html('http://sheriffclevelandcounty.com/p2c/jailinmates.aspx')->plaintext;

The above shows me everything on the page except the information I need.

I also tried the example from an IMDb plugin, modified for my needs, which is this:

    // Defining the basic cURL function
    function curl($url) {
        // Assigning cURL options to an array
        $options = Array(
            CURLOPT_RETURNTRANSFER => TRUE,  // Setting cURL's option to return the webpage data
            CURLOPT_FOLLOWLOCATION => TRUE,  // Setting cURL to follow 'location' HTTP headers
            CURLOPT_AUTOREFERER => TRUE, // Automatically setting the referer when following 'location' HTTP headers
            CURLOPT_CONNECTTIMEOUT => 120,   // Setting the maximum time (in seconds) to wait while connecting
            CURLOPT_TIMEOUT => 120,  // Setting the maximum time (in seconds) for the whole cURL request
            CURLOPT_MAXREDIRS => 10, // Setting the maximum number of redirections to follow
            CURLOPT_USERAGENT => "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1a2pre) Gecko/2008073000 Shredder/3.0a2pre ThunderBrowse/3.2.1.8",  // Setting the useragent
            CURLOPT_URL => $url, // Setting cURL's URL option with the $url variable passed into the function
        );

        $ch = curl_init();  // Initialising cURL
        curl_setopt_array($ch, $options);   // Setting cURL's options using the previously assigned array data in $options
        $data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
        curl_close($ch);    // Closing cURL
        return $data;   // Returning the data from the function
    }

    // Defining the basic scraping function
    function scrape_between($data, $start, $end) {
        $data = stristr($data, $start); // Stripping all data from before $start
        if ($data === false) {
            return '';  // $start was not found in the page
        }
        $data = substr($data, strlen($start));  // Stripping $start
        $stop = stripos($data, $end);   // Getting the position of the $end of the data to scrape
        if ($stop === false) {
            return '';  // $end was not found after $start
        }
        $data = substr($data, 0, $stop);    // Stripping all data from after and including the $end of the data to scrape
        return $data;   // Returning the scraped data from the function
    }

    $scraped_page = curl("http://sheriffclevelandcounty.com/p2c/jailinmates.aspx");    // Downloading the jail inmates page to variable $scraped_page
    $scraped_data = scrape_between($scraped_page, '<table id="tblII" class="ui-jqgrid-btable" cellspacing="0" cellpadding="0" border="0" role="grid" aria-multiselectable="false" aria-labelledby="gbox_tblII" style="width: 456px;">', '</table>');   // Scraping downloaded data in $scraped_page for content between the opening <table id="tblII"> tag and </table>

    echo $scraped_data; // Echoing $scraped_data, which should contain the rows of the inmate table

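Of course, neither of these works, so my question is: how can I use PHP Simple HTML DOM Parser to get dynamic content that is loaded after the page loads? Is it possible, or am I on completely the wrong track?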

Answer 1

As I understand it, you need the dynamic data shown in the jqGrid. For that, you can send a request to the POST URL that provides the data.

$ch = curl_init();
// The POST URL that provides the grid data
curl_setopt($ch, CURLOPT_URL, "http://sheriffclevelandcounty.com/p2c/jqHandler.ashx?op=s");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // Return the response instead of printing it
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, array(
    'rows' => 10000, // Here you can specify how many records you want
    't' => 'ii'
));
$output = curl_exec($ch);
curl_close($ch);
echo "<pre>";
print_r(json_decode($output));
echo "</pre>";
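
Once the JSON comes back, you will probably want the individual rows rather than a print_r() dump. Here is a minimal sketch of that step. It assumes the response is an object with a "rows" array, which is jqGrid's default JSON layout; the actual shape of the jqHandler.ashx response is not shown in this thread, so check the print_r() output first. The function name extract_rows, the sample payload, and the "id" and "name" fields are all hypothetical and only there for illustration.

```php
<?php
/**
 * Turn a jqGrid-style JSON response into a list of rows.
 *
 * ASSUMPTION: the response is an object with a "rows" array. The real
 * layout of the handler's response may differ, so adjust the key names.
 */
function extract_rows(string $json): array
{
    $decoded = json_decode($json, true); // true => decode into associative arrays
    if (!is_array($decoded) || !isset($decoded['rows']) || !is_array($decoded['rows'])) {
        return []; // invalid JSON or an unexpected layout
    }
    return $decoded['rows'];
}

// Hypothetical sample payload, for illustration only (not real data).
$sample = '{"records": 2, "rows": [{"id": 0, "name": "EXAMPLE A"}, {"id": 1, "name": "EXAMPLE B"}]}';

foreach (extract_rows($sample) as $row) {
    echo $row['id'], ': ', $row['name'], PHP_EOL;
}
```

In the cURL code above, you would pass $output to extract_rows() in place of $sample. Returning an empty array on bad input keeps the foreach loop safe when the request fails or the layout is not what was assumed.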
