Tracking page headers and redirects with php-libcurl
I was writing a script to track headers, especially redirects and cookies, for a URL. Many times when I open a URL it redirects to another URL, or sometimes through more than one, and also stores some cookies. But when I ran the script against one such URL, it showed only one redirect and did not save any cookies. When I browsed the same URL in Firefox, it saved cookies, and when I inspected it with Live HTTP Headers it showed multiple GET requests. Live HTTP Headers also shows that there are Set-Cookie headers.
<?php
$url="http://en.wikipedia.org/";
$userAgent="Mozilla/5.0 (Windows NT 5.1; rv:2.0) Gecko/20100101 Firefox/4.0";
$accept="text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
$encoding="gzip, deflate";
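//CURLOPT_HTTPHEADER expects a list of complete "Name: value" strings, not an associative array
$header=array(
"Accept: ".$accept,
"Accept-Language: en-us,en;q=0.5",
"Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7",
"Connection: keep-alive",
"Keep-Alive: 115",
);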
$i=1;
$flag=1; //0 if there is no redirect, i.e. no Location header to follow; used to control the while loop below
while($flag!=0) {
$ch=curl_init();
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_USERAGENT,$userAgent);
curl_setopt($ch,CURLOPT_ENCODING,$encoding);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch,CURLOPT_FOLLOWLOCATION,0);
curl_setopt($ch,CURLOPT_HEADER,1);
curl_setopt($ch,CURLOPT_NOBODY,1);
curl_setopt($ch,CURLOPT_AUTOREFERER,true);
curl_setopt($ch, CURLOPT_COOKIEJAR, dirname(__FILE__) . "/cookie.txt");
curl_setopt($ch, CURLOPT_COOKIEFILE, dirname(__FILE__) . "/cookie.txt");
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
$pageHeader[$i]=curl_exec($ch);
curl_close($ch);
$flag=preg_match('/^Location:\s*(\S+)/mi',$pageHeader[$i],$location[$i]); //\S+ keeps the trailing \r out of the captured url
if($flag==1) { //if there is a location header
if(preg_match('@^(https?://|www\.)@i',$location[$i][1])==1) { //if it is an absolute url (http or https)
$url=$location[$i][1];
} else {
if(preg_match('@^/(.*)@',$location[$i][1],$tempurl)==1) { //if the url corresponds to url relative to server's root
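preg_match('@^(https?://|www\.)[^/]+@i',$url,$domain); //scheme and host of the current url
$url=$domain[0].$tempurl[0]; //$domain is the match array, so use the full match at index 0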
} else { //if the url is relative to current directory
$url=preg_replace('@/[^/]*$@',"/".$location[$i][1],$url); //replace the last path segment; also matches when $url ends in "/"
}
}
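$location[$i]=$url; //store the resolved url instead of the raw match array
} else {
unset($location[$i]); //no redirect: drop the empty match array left by preg_match
}
preg_match_all('/^Set-Cookie:\s*([^\r\n]*)/mi',$pageHeader[$i],$cookie[$i]); //collect every Set-Cookie header, not only the first
$i++;
} //end of while loop
$loc="";
foreach($location as $l)
$loc=$loc.$l."\n";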
$header=implode("\n\n\n",$pageHeader);
file_put_contents(dirname(__FILE__) . "/location.txt",$loc);
file_put_contents(dirname(__FILE__) . "/header.txt",$header);
?>
Here the files location.txt and header.txt are created, but cookie.txt is not. If I change the URL to google.com, location.txt shows the redirect to google.co.in and a cookie is saved in cookie.txt. But when I open google.com in Firefox, it saves three cookies. What could be wrong? I think some JavaScript on the page is setting the cookies, so curl is not able to get them.
Any suggestions for improving the code above are also welcome.
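Since Live HTTP Headers reports several Set-Cookie headers, it may help to pull every such header out of each response rather than only the first match. The following is a minimal sketch, not part of the original script; the function name parseRedirectHeaders and the shape of its return value are assumptions made for illustration. It only covers cookies sent in HTTP headers, so cookies set by JavaScript would still not appear.

```php
<?php
// Hypothetical helper (not from the original script): extracts the Location
// value and every Set-Cookie value from one raw response header block,
// such as the string curl_exec() returns with CURLOPT_HEADER enabled.
function parseRedirectHeaders(string $rawHeaders): array
{
    $result = ['location' => null, 'cookies' => []];
    // Header lines normally end in CRLF; accept bare LF or CR as well.
    foreach (preg_split('/\r\n|\n|\r/', $rawHeaders) as $line) {
        // Split "Name: value" at the first colon only.
        $parts = explode(':', $line, 2);
        if (count($parts) !== 2) {
            continue; // status line or blank line
        }
        $name = strtolower(trim($parts[0])); // header names are case-insensitive
        $value = trim($parts[1]);
        if ($name === 'location') {
            $result['location'] = $value;
        } elseif ($name === 'set-cookie') {
            $result['cookies'][] = $value;
        }
    }
    return $result;
}
```

Called on each entry of $pageHeader, this would give one list of cookie strings per hop, which can be compared against what the browser reports.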
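Your Location-following code is completely broken: as you should have seen, most HTTP redirects are relative, so you can't just use that string as the URL in the subsequent request.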
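One way to handle this is to resolve each Location value against the URL that produced it before making the next request. The sketch below is a simplified illustration under stated assumptions: the function name resolveLocation is hypothetical, and it does not normalise "../" segments or handle query-only references, which full RFC 3986 reference resolution would.

```php
<?php
// Hypothetical helper: resolves a Location header value against the URL
// of the request that returned it. Simplified: no "../" normalisation.
function resolveLocation(string $baseUrl, string $location): string
{
    // Already absolute (starts with a scheme): use it unchanged.
    if (preg_match('@^[a-z][a-z0-9+.-]*://@i', $location)) {
        return $location;
    }
    $base = parse_url($baseUrl);
    $scheme = $base['scheme'] ?? 'http';
    $origin = $scheme . '://' . ($base['host'] ?? '');
    if (isset($base['port'])) {
        $origin .= ':' . $base['port'];
    }
    // Protocol-relative ("//host/path"): inherit only the scheme.
    if (strncmp($location, '//', 2) === 0) {
        return $scheme . ':' . $location;
    }
    // Root-relative ("/path"): replace the whole path.
    if (strncmp($location, '/', 1) === 0) {
        return $origin . $location;
    }
    // Directory-relative: keep the base path up to and including its last "/".
    $path = $base['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $location;
}
```

In the loop from the question, the next request URL would then be resolveLocation($url, $location[$i][1]) instead of the raw header string.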