Get content of page using PHP cURL

I want to use PHP's cURL to visit a page on an external site and get the whole HTML content of the page.

When I visit the site, it redirects me to another page on the same site. I also have to set the user agent; I want one for Chrome on a Windows 7 PC and one for an iPhone 4S. This is what I have so far:

$ch = curl_init ($url);
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, true);
$kl = curl_exec ($ch);
curl_close($ch);
echo $kl;

Notice:
I will probably run into more errors.

You might also need to handle URLs that use HTTPS:

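// The cookie options expect a file path, so create a temporary file and keep its name
$cookie = tempnam(sys_get_temp_dir(), 'cookie');
// "Windows NT 6.2" identifies Windows 8; use "Windows NT 6.1" for Windows 7
$userAgent = 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31';

$ch = curl_init($url);

$options = array(
    CURLOPT_CONNECTTIMEOUT => 20,          // give up connecting after 20 seconds
    CURLOPT_USERAGENT => $userAgent,
    CURLOPT_AUTOREFERER => true,           // set the Referer header when following redirects
    CURLOPT_FOLLOWLOCATION => true,        // follow Location: redirects
    CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
    CURLOPT_COOKIEFILE => $cookie,         // read cookies from this file
    CURLOPT_COOKIEJAR => $cookie,          // write cookies to this file when the handle closes
    CURLOPT_SSL_VERIFYPEER => 0,           // skip certificate verification (insecure, testing only)
    CURLOPT_SSL_VERIFYHOST => 0            // skip host name verification (insecure, testing only)
);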

curl_setopt_array($ch, $options);
$kl = curl_exec($ch);
curl_close($ch);
echo $kl;
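Since curl_exec() returns false when a transfer fails, it may help to check the result before echoing it. The following is only a minimal sketch: the helper names check_transfer() and fetch_checked() are hypothetical, and treating every status of 400 or above as a failure is an assumption rather than something the answer prescribes.

```php
<?php
// Hypothetical helper: returns the body, or throws if the transfer failed.
function check_transfer($body, int $errno, string $error, int $status): string
{
    if ($body === false) {
        // curl_exec() returns false on failure (DNS error, timeout, TLS problem, ...)
        throw new RuntimeException("cURL error $errno: $error");
    }
    if ($status >= 400) {
        // Assumption: any 4xx/5xx status after redirects counts as a failure
        throw new RuntimeException("HTTP status $status");
    }
    return $body;
}

// Hypothetical wrapper: runs one transfer with the given options and checks it.
function fetch_checked(string $url, array $options): string
{
    $ch = curl_init($url);
    // Force RETURNTRANSFER so curl_exec() yields the body rather than printing it
    curl_setopt_array($ch, [CURLOPT_RETURNTRANSFER => true] + $options);
    $body = curl_exec($ch);
    $errno = curl_errno($ch);
    $error = curl_error($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return check_transfer($body, $errno, $error, $status);
}
```

A call such as echo fetch_checked($url, $options); would then either print the page or raise an exception carrying the cURL error message. If the two SSL options are left at their defaults, certificate problems on HTTPS URLs show up through the same path.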

So:

  1. Look up the proper user agent strings online.
  2. Enable CURLOPT_FOLLOWLOCATION, as @TroyCheng indicated.
  3. Enable CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR.
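The three steps could be combined roughly as follows. This is a sketch under assumptions: the names USER_AGENTS, build_curl_options() and fetch_as() are hypothetical, and the two user agent strings are illustrative examples (Chrome reporting Windows 7 as "Windows NT 6.1", and Mobile Safari on iOS 5, which does not identify the iPhone 4S specifically), so look up current strings before relying on them.

```php
<?php
// Illustrative user agent strings (assumptions, not authoritative values).
const USER_AGENTS = [
    // Windows 7 reports itself as "Windows NT 6.1"
    'win7_chrome' => 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31',
    // Mobile Safari on iOS 5
    'iphone' => 'Mozilla/5.0 (iPhone; CPU iPhone OS 5_0 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A334 Safari/7534.48.3',
];

// Builds the option array for one device profile.
function build_curl_options(string $device, string $cookieFile): array
{
    if (!array_key_exists($device, USER_AGENTS)) {
        throw new InvalidArgumentException("Unknown device: $device");
    }
    return [
        CURLOPT_USERAGENT => USER_AGENTS[$device],  // step 1: user agent
        CURLOPT_FOLLOWLOCATION => true,             // step 2: follow redirects
        CURLOPT_AUTOREFERER => true,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_COOKIEFILE => $cookieFile,          // step 3: read cookies ...
        CURLOPT_COOKIEJAR => $cookieFile,           // ... and store them again
    ];
}

// Fetches $url as the given device; returns the HTML, or false on failure.
function fetch_as(string $url, string $device, string $cookieFile)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, build_curl_options($device, $cookieFile));
    $html = curl_exec($ch);
    curl_close($ch);
    return $html;
}
```

For example, fetch_as($url, 'iphone', tempnam(sys_get_temp_dir(), 'cookie')) would request the page with the mobile profile. Keeping the option array in its own function makes it easy to switch profiles without repeating the redirect and cookie settings.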

Why don't you use a library like Buzz?

$request = new Buzz\Message\Request('GET', '/', 'http://google.com');
$response = new Buzz\Message\Response();

$client = new Buzz\Client\Curl();
// do not check https validity
$client->setVerifyPeer(false);
// define your user agent
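// cURL options are identified by their constants, not by quoted names
$client->setOption(CURLOPT_USERAGENT, $userAgent);
// the cookie options expect a file path; $cookie is assumed to hold a writable one
$client->setOption(CURLOPT_COOKIEFILE, $cookie);
$client->setOption(CURLOPT_COOKIEJAR, $cookie);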
$client->send($request, $response);

if ($response->isOk())
{
  echo $response->getContent();

  // or if you want the DOM (a DOMDocument cannot be echoed directly)
  $dom = $response->toDomDocument();
}
