
PHP: How to scrape the content of a website that relies on JavaScript

I'm trying to get the content of this website using the PHP simplehtmldom library:

http://www.immigration.govt.nz/migrant/stream/work/workingholiday/czechwhs.htm

It is not working, so I tried using cURL:

// Fetch a URL with cURL and return the response body, or FALSE on failure.
function curl_get_file_contents($URL)
{
    $c = curl_init();
    curl_setopt($c, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($c, CURLOPT_URL, $URL);
    $contents = curl_exec($c);
    curl_close($c);

    // curl_exec() returns FALSE on error; an empty body is also treated as failure here.
    if ($contents) return $contents;
    else return FALSE;
}

But I always get a response containing only some JS code and this content:

<noscript>Please enable JavaScript to view the page content.</noscript>

Is there any way to solve this using PHP? I must use PHP in this case, so I need to simulate a JS-capable browser.

Many thanks for any advice.

> I must use PHP in this case, so I need to simulate a JS-capable browser.

I'd recommend two approaches:

  1. Use the v8js PHP extension to execute the site's JS while scraping. See a usage example here.
  2. Simulate a JS-capable browser using Selenium, iMacros, or the webRobots.io Chrome extension. In that case, however, you leave PHP scripting behind.
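Approach 1 can be sketched roughly as follows. This is a minimal, hypothetical example (not the linked one): it assumes the PECL v8js extension is installed, and `run_page_js` is a name invented here, not part of any library.

```php
<?php
// Hedged sketch: requires the PECL "v8js" extension, which embeds
// Google's V8 engine in PHP. Returns NULL when v8js is unavailable
// or the script throws, so the caller can fall back gracefully.
function run_page_js(string $js): ?string
{
    if (!class_exists('V8Js')) {
        return null; // v8js extension not installed
    }
    $v8 = new V8Js();
    try {
        // executeString() returns the value of the last JS expression.
        return (string) $v8->executeString($js);
    } catch (V8JsException $e) {
        return null;
    }
}

// E.g. evaluate the kind of string-building JS such pages use:
$result = run_page_js("var page = 'czechwhs'; page + '.htm';");
var_dump($result);
```

Note that v8js only runs JavaScript; it provides no DOM, so scripts that touch `document` or `window` would need those objects stubbed out. For pages whose content is built by full DOM manipulation, the browser-driven route in approach 2 is usually the practical one.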
