
How can I use cURL to fetch this URL?

I am trying to get an <a> tag from a website using cURL, but it does not work. The same code works fine with other websites, just not with this one:

sbplay1.com

How can I make it work?

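<?php
//$url = "https://google.com";
$url = "https://sbplay1.com";

$ch = curl_init($url);
curl_setopt_array($ch, [
    CURLOPT_COOKIE         => 'viewport=1040; _flashVersion=1',
    // A GET request has no body, so Content-Type is dropped; "Accept: *" is not a valid media range.
    CURLOPT_HTTPHEADER     => ['Accept: */*'],
    CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36',
    // Disables certificate verification; acceptable only for a quick test.
    CURLOPT_SSL_VERIFYPEER => false,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FOLLOWLOCATION => true,
]);

$html = curl_exec($ch);
if ($html === false || $html === '') {
    // Report the failure instead of passing an unusable value to the parser.
    die('cURL error: ' . curl_error($ch));
}

$dom = new DOMDocument();
libxml_use_internal_errors(true); // real-world HTML is rarely valid; keep parser warnings quiet
$dom->loadHTML($html);
libxml_clear_errors();

// item(3) is the fourth <a>; it is null when the raw HTML has fewer than four links.
$node = $dom->getElementsByTagName('a')->item(3);
if ($node === null) {
    die('Fewer than four <a> elements in the raw HTML.');
}
echo $node->getAttribute('href');
?>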

This is most likely because the URL you are requesting serves a single-page application (SPA). Such pages run JavaScript in the browser to render the content you are looking for. cURL is not a browser and cannot execute JavaScript, so the HTML it returns does not contain that content. To read the page after the JavaScript has run, you can use a browser automation tool such as Selenium.
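One way to confirm this diagnosis is to count the links in the raw HTML, before any JavaScript runs. The sketch below is only an illustration: `extractLinks`, `$shell`, and `$rendered` are hypothetical names and sample strings, not taken from the site in question. It parses two hand-written pages, one that builds its link from a script and one that already contains the link in its markup.

```php
<?php
// Return the href of every <a> element present in a raw HTML string.
function extractLinks(string $html): array
{
    if (trim($html) === '') {
        return []; // nothing to parse
    }

    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            $links[] = $a->getAttribute('href');
        }
    }
    return $links;
}

// Hypothetical SPA shell: the link exists only after the script has run in a browser.
$shell = '<html><body><div id="app"></div><script>'
       . 'var a = document.createElement("a"); a.href = "/video/1";'
       . 'document.getElementById("app").appendChild(a);'
       . '</script></body></html>';

// Hypothetical server-rendered page: the link is already in the markup.
$rendered = '<html><body><div id="app"><a href="/video/1">Video</a></div></body></html>';

echo count(extractLinks($shell)), " link(s) in the SPA shell\n";
echo count(extractLinks($rendered)), " link(s) in the rendered page\n";
```

If passing the string returned by `curl_exec()` to a function like this yields few or no links, while the browser's developer tools show many, the links are probably being created by JavaScript and cURL alone will not see them.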

A popular crawler that I've used in the past to read SPA pages in PHP is spatie/crawler:

https://github.com/spatie/crawler

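You can tell the crawler to crawl all pages and render them as a browser would. It does not execute JavaScript by default; its documentation describes an executeJavaScript() option that renders each page in headless Chrome before the links are extracted.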


 