I have been assigned a task to scrape data from a password-protected site. I did the login through cURL, but now I want to get a link inside the HTML returned by cURL, go to that link, and grab data from there. I passed the response of cURL into file_get_contents(), but it is not working. Here is my cURL code:
$ckfile = tempnam("/tmp", "CURLCOOKIE");
$useragent = 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.2 (KHTML, like Gecko) Chrome/5.0.342.3 Safari/533.2';
$username = "XXXXXX";
$password = "XXXXXX";
$f = fopen('log.txt', 'w'); // file that receives cURL's verbose log (request headers etc.) for debugging
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_COOKIEJAR, $ckfile);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
$html = curl_exec($ch);
curl_close($ch);
preg_match('~<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="(.*?)" />~', $html, $viewstate);
preg_match('~<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="(.*?)" />~', $html, $eventValidation);
$viewstate = $viewstate[1];
$eventValidation = $eventValidation[1];
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, false);
curl_setopt($ch, CURLOPT_COOKIEJAR, $ckfile);
curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_REFERER, $url);
curl_setopt($ch, CURLOPT_VERBOSE, 1);
curl_setopt($ch, CURLOPT_STDERR, $f);
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
// Collecting all POST fields
$postfields = array();
$postfields['__EVENTTARGET'] = "";
$postfields['__EVENTARGUMENT'] = "";
$postfields['__VIEWSTATE'] = $viewstate;
$postfields['__EVENTVALIDATION'] = $eventValidation;
$postfields['ctl00$LoginPopup1$Login1$UserName'] = $username;
$postfields['ctl00$LoginPopup1$Login1$Password'] = $password;
$postfields['ctl00$LoginPopup1$Login1$LoginButton'] = 'Log In';
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postfields);
$ret = curl_exec($ch); // Get result after login page.
Here is my Simple HTML DOM code:
$html = file_get_contents($ret);
This is the error I am getting:
Warning: file_get_contents(1): failed to open stream: No such file or directory
Any other suggestions on how to do this would be appreciated. Thanks.
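If you want the HTML output of the page you are sending the request to, set CURLOPT_RETURNTRANSFER to true; $ret will then contain the HTML of the page returned by the request. With it set to false, as in your code, curl_exec() prints the response and returns true on success, and true converted to a string is "1", which is why file_get_contents() is trying to open a file named 1.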
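A minimal sketch of both points: why the warning mentions 1, and a hypothetical fetchBody() helper (not part of PHP or cURL) that returns a page body as a string while reusing the cookie file from the login, so a link found on the logged-in page can be requested with the same session. It assumes the curl extension is loaded and that the site keeps the login session in cookies.

```php
<?php
// With CURLOPT_RETURNTRANSFER left false, curl_exec() prints the body and
// returns true on success; true cast to a string is "1", the "file name"
// that file_get_contents() was given.
var_dump((string) true); // string(1) "1"

// Hypothetical helper (not part of PHP or cURL): fetch a URL and return the
// body as a string, reading and writing the same cookie file as the login
// request so the session cookie is sent along.
function fetchBody(string $url, string $cookieFile): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);    // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);    // follow redirects
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // send cookies saved earlier
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);  // save any updated cookies
    $body = curl_exec($ch);
    if ($body === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("cURL request failed: $error");
    }
    curl_close($ch);
    return $body;
}
```

For example, fetchBody($linkUrl, $ckfile) after the login would return the linked page as a string, where $linkUrl stands for whatever absolute URL was pulled out of the logged-in page. That string is what goes to str_get_html() or a regex; file_get_contents() expects a file name or URL, not HTML.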
I wouldn't use DOMDocument to parse the response, as the HTML from the page may not be correctly formatted and DOMDocument will complain.
If you are just looking for links, you could use preg_match_all on the HTML.
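A minimal sketch of the preg_match_all approach, with two hypothetical helpers: extractLinks() pulls href values out of <a> tags, and resolveUrl() turns relative hrefs into absolute URLs so they can be passed to the next cURL request. The page URL is a made-up example.com address, and the resolver is deliberately simplified (it does not handle "../" segments or query-only links).

```php
<?php
// Hypothetical helper: find <a href="..."> values with preg_match_all().
// A regex is not a full HTML parser, so unusual markup may be missed.
function extractLinks(string $html): array
{
    // Group 1 is the quote character, group 2 the href value
    preg_match_all('~<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1~is', $html, $matches);
    // Decode entities such as &amp; so the URL can be requested as-is
    return array_map('html_entity_decode', $matches[2]);
}

// Hypothetical, simplified resolver: handles absolute, protocol-relative,
// root-relative and plain relative hrefs.
function resolveUrl(string $base, string $href): string
{
    if ($href === '') {
        return $base;
    }
    if (parse_url($href, PHP_URL_SCHEME) !== null) {
        return $href; // already absolute
    }
    $parts = parse_url($base);
    if (substr($href, 0, 2) === '//') {
        return $parts['scheme'] . ':' . $href;
    }
    $origin = $parts['scheme'] . '://' . $parts['host']
        . (isset($parts['port']) ? ':' . $parts['port'] : '');
    if ($href[0] === '/') {
        return $origin . $href;
    }
    // Relative to the directory of the current page
    $path = $parts['path'] ?? '/';
    return $origin . substr($path, 0, strrpos($path, '/') + 1) . $href;
}

$pageUrl = 'https://example.com/members/Default.aspx';
$html = '<a href="Report.aspx?id=1&amp;page=2">Report</a> <A class="x" HREF=\'/logout\'>Log out</A>';
foreach (extractLinks($html) as $href) {
    echo resolveUrl($pageUrl, $href), "\n";
}
// prints:
// https://example.com/members/Report.aspx?id=1&page=2
// https://example.com/logout
```

Each resolved URL can then be fetched with the same cookie file that was used for the login request.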
As MajorCaiger says, you need to set CURLOPT_RETURNTRANSFER to true, and then load the result with str_get_html:
$html = curl_exec($ch);
$doc = str_get_html($html);
Even so, I don't think you have much chance of success with this; those ASP.NET forms are very tricky.
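One fragile part is the hidden state: the __VIEWSTATE and __EVENTVALIDATION regexes in the question only match if the attributes appear in exactly that order and spacing. A minimal sketch of a hypothetical helper (not from either answer) that collects every hidden input as name => value regardless of attribute order, assuming the values contain no ">" character, which holds for base64-encoded view state.

```php
<?php
// Hypothetical helper: collect every hidden <input> as name => value,
// without depending on attribute order or spacing.
function hiddenFields(string $html): array
{
    $fields = [];
    preg_match_all('~<input\b[^>]*>~i', $html, $tags);
    foreach ($tags[0] as $tag) {
        // Skip inputs that are not type="hidden"
        if (!preg_match('~\btype\s*=\s*["\']?hidden\b~i', $tag)) {
            continue;
        }
        // A field without a name cannot be posted back
        if (!preg_match('~\bname\s*=\s*(["\'])(.*?)\1~i', $tag, $name)) {
            continue;
        }
        // Missing value attribute is treated as an empty string
        $value = preg_match('~\bvalue\s*=\s*(["\'])(.*?)\1~is', $tag, $val) ? $val[2] : '';
        $fields[html_entity_decode($name[2])] = html_entity_decode($value);
    }
    return $fields;
}

$html = '<input id="__VIEWSTATE" value="dDw+abc=" type="hidden" name="__VIEWSTATE" />'
    . '<input type="text" name="q" value="x">'
    . '<input type=\'hidden\' name=\'__EVENTVALIDATION\' value=\'ev/1\'>';
print_r(hiddenFields($html));
```

Merging the returned array with the username, password and login button fields would replace the two separate preg_match() calls and also pick up any other hidden fields the form expects.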