
PHP curl, preserve session

I'm making an app that scrapes data off a site, formats it as needed, and displays it to the user. Now, the site doesn't allow cross-site script requests, so I'm using PHP's curl to retrieve the page.

  • With a browser, the site gives you a cookie when you first visit, asking you to log in, and on subsequent requests will give you the actual page you requested.

  • With PHP's curl, the site just gives me the page asking me to log in and, I presume, gives my PHP server a cookie.

How can I save this cookie and present it on subsequent requests?

Use a few curl_setopt calls so that curl saves the cookies it receives and sends them back on later requests.

Example:

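$ch = curl_init();
// Read previously saved cookies from this file and send them with the request
curl_setopt($ch, CURLOPT_COOKIEFILE, "c:/cookies/cookie.txt");
// Write received cookies to the same file when the handle is closed ("-" would only print them to stdout)
curl_setopt($ch, CURLOPT_COOKIEJAR, "c:/cookies/cookie.txt");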
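To make the idea concrete, here is a minimal sketch of sharing one cookie file between separate requests. The helper names `session_options` and `fetch` are made up for illustration, and it assumes the cookie file path is writable by PHP.

```php
<?php
// Hypothetical helpers for illustration; not taken from the answers on this page.

// Build the options that make a handle both send and save cookies.
function session_options(string $cookieFile): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_COOKIEFILE     => $cookieFile, // cookies to send with the request
        CURLOPT_COOKIEJAR      => $cookieFile, // where cookies are saved afterwards
    ];
}

// Fetch one URL using the shared cookie file.
function fetch(string $url, string $cookieFile)
{
    $ch = curl_init($url);
    if ($ch === false) {
        return false;
    }
    curl_setopt_array($ch, session_options($cookieFile));
    $body = curl_exec($ch);
    // Releasing the handle is what flushes the cookies to the jar file.
    unset($ch);
    return $body; // string on success, false on failure
}
```

Calling `fetch()` once for the page that hands out the cookie and then again for the page you actually want, with the same `$cookieFile` both times, should behave like the browser does.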

I modified nabab's code, tried it and it worked perfectly as I wanted:

$loginData = array('username'=>'myuser', 'password'=>'mypassword');
$postData = array('url'=>'http://stackoverflow.com');
$loginURL = "http://stackoverflow.com/login.php";
$addURL = "http://stackoverflow.com/addUrl.php";

$curl_options = array(
    CURLOPT_RETURNTRANSFER => true,     /* return web page */
    CURLOPT_HEADER         => false,    /* don't return headers */
    CURLOPT_FOLLOWLOCATION => true,     /* follow redirects */
    CURLOPT_ENCODING       => "",       /* handle all encodings */
    CURLOPT_AUTOREFERER    => true,     /* set referer on redirect */
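    CURLOPT_CONNECTTIMEOUT => 120,      /* seconds to wait while connecting */
    CURLOPT_TIMEOUT        => 120,      /* seconds allowed for the whole transfer */
    CURLOPT_MAXREDIRS      => 10,       /* stop after 10 redirects */
    CURLOPT_SSL_VERIFYHOST => 0,        /* insecure: skips the certificate host name check */
    CURLOPT_SSL_VERIFYPEER => 0         /* insecure: skips certificate verification */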
);

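$cookie = "cookie.txt";
if ( $ch = curl_init() )
{
    curl_setopt_array($ch, $curl_options);
    if ( $cookie )
    {
        /* save cookies to, and read them back from, the same file */
        curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
        curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);

        /* first request: log in, which sets the session cookie */
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_URL, $loginURL);
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($loginData));
        $r = curl_exec($ch);

        /* second request: same handle, so the session cookie is sent along */
        curl_setopt($ch, CURLOPT_URL, $addURL);
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($postData));
        $r = curl_exec($ch);
    }
    curl_close($ch);
}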
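If the second request still lands on the login page, it can help to check what actually ended up in the cookie file. curl writes it in the Netscape cookie file format (tab-separated fields, one cookie per line). A rough sketch of a reader, where `read_cookie_jar` is a hypothetical helper name and the path is whatever you passed to CURLOPT_COOKIEJAR:

```php
<?php
// Hypothetical helper for illustration; not part of the code above.
// Returns cookie name => value pairs found in a Netscape-format cookie file.
function read_cookie_jar(string $path): array
{
    $cookies = [];
    $lines = @file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return $cookies; // file missing or unreadable
    }
    foreach ($lines as $line) {
        if (strpos($line, '#HttpOnly_') === 0) {
            // HttpOnly cookies are stored with this prefix on the domain field
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line[0] === '#') {
            continue; // ordinary comment line
        }
        // Fields: domain, subdomain flag, path, secure, expiry, name, value
        $fields = explode("\t", $line);
        if (count($fields) === 7) {
            $cookies[$fields[5]] = $fields[6];
        }
    }
    return $cookies;
}
```

For example, `print_r(read_cookie_jar("cookie.txt"));` after the script has finished should list the session cookie if the login request set one. Keep in mind the file is only written once the curl handle has been closed or freed.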

You have to use the cookie. This is how I do it (I return an array with the HTML content and the encoding, which can be useful for scraping):

$curl_options = array(
    CURLOPT_RETURNTRANSFER => true,     /* return web page */
    CURLOPT_HEADER         => false,    /* don't return headers */
    CURLOPT_FOLLOWLOCATION => true,     /* follow redirects */
    CURLOPT_ENCODING       => "",       /* handle all encodings */
    CURLOPT_AUTOREFERER    => true,     /* set referer on redirect */
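    CURLOPT_CONNECTTIMEOUT => 120,      /* seconds to wait while connecting */
    CURLOPT_TIMEOUT        => 120,      /* seconds allowed for the whole transfer */
    CURLOPT_MAXREDIRS      => 10,       /* stop after 10 redirects */
    CURLOPT_SSL_VERIFYHOST => 0,        /* insecure: skips the certificate host name check */
    CURLOPT_SSL_VERIFYPEER => 0         /* insecure: skips certificate verification */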
);
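/* $url and $cookie (path to the cookie file) are expected to be set by the calling code */
if ( $ch = curl_init($url) )
{
    curl_setopt_array($ch, $curl_options);
    if ( $cookie )
    {
        /* save cookies to, and read them back from, the same file */
        curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
        curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
    }
    $r = curl_exec($ch);
    curl_close($ch);
}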
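The snippet stops before the part that returns the HTML and the encoding, so here is one possible way that step could look. This is only a guess at the shape, not the answerer's actual code: the names `get_page` and `charset_from_content_type` are invented, and reading the encoding from the Content-Type response header is an assumption.

```php
<?php
// Hypothetical helpers; a sketch under the assumptions stated above.

// Pull the charset out of a Content-Type value such as "text/html; charset=UTF-8".
function charset_from_content_type(?string $contentType): ?string
{
    if ($contentType !== null
        && preg_match('/charset\s*=\s*"?([^";\s]+)/i', $contentType, $m)) {
        return strtolower($m[1]);
    }
    return null; // header missing or no charset declared
}

// Fetch a page with a persistent cookie file and return its HTML and encoding.
function get_page(string $url, string $cookieFile): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEFILE     => $cookieFile, // send saved cookies
        CURLOPT_COOKIEJAR      => $cookieFile, // save new cookies
    ]);
    $html = curl_exec($ch);                          // false on failure
    $type = curl_getinfo($ch, CURLINFO_CONTENT_TYPE); // may be null or false
    unset($ch); // freeing the handle writes the cookie file
    return [
        'html'     => $html,
        'encoding' => charset_from_content_type($type ?: null),
    ];
}
```

The encoding can come back as null when the server does not declare a charset, so the caller would need some fallback (for instance, looking at the page's meta tags).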


 