I have a problem with HttpWebRequest class.
I am trying to get source code of website:
http://www.filmweb.pl/film/Igrzyska+%C5%9Bmierci%3A+Kosog%C5%82os.+Cz%C4%99%C5%9B%C4%87+1-2014-626983
but I am always getting an error:
System.Net.WebException occurred
HResult=-2146233079
Message=Too many automatic redirections were attempted.
Source=System
StackTrace:
at System.Net.HttpWebRequest.GetResponse()
at ProjectName.ClassName.MethodName(String urlAddress)
InnerException:
That is my code:
Uri uri = new Uri(@"http://www.filmweb.pl/film/Igrzyska+%C5%9Bmierci%3A+Kosog%C5%82os.+Cz%C4%99%C5%9B%C4%87+1-2014-626983");
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
I have used a Fiddler Web Debugger tool to compare Firefox request with my C# .NET request, but still have no answer.
Firefox:
GET http://www.filmweb.pl/film/Igrzyska+%C5%9Bmierci%3A+Kosog%C5%82os.+Cz%C4%99%C5%9B%C4%87+1-2014-626983 HTTP/1.1
Host: www.filmweb.pl
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:43.0) Gecko/20100101 Firefox/43.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
HTTP/1.1 200 OK
Cache-Control: private, no-cache, no-store, max-age=0, must-revalidate, proxy-revalidate
Content-Type: text/html;charset=UTF-8
Content-Language: pl-PL
Transfer-Encoding: chunked
Date: Wed, 07 Oct 2015 13:36:31 GMT
X-Cache: HIT from blade110.non.3dart.com
X-Cache-Hits: 116
Server: Apache
C# .NET:
GET http://www.filmweb.pl/film/Igrzyska+%C5%9Bmierci:+Kosog%C5%82os.+Cz%C4%99%C5%9B%C4%87+1-2014-626983 HTTP/1.1
Host: www.filmweb.pl
Connection: Keep-Alive
HTTP/1.1 301 Moved Permanently
Cache-Control: private, no-cache, no-store, max-age=0, must-revalidate, proxy-revalidate
Content-Type: text/html;charset=UTF-8
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Content-Language: pl-PL
Location: /film/Igrzyska+%C5%9Bmierci%3A+Kosog%C5%82os.+Cz%C4%99%C5%9B%C4%87+1-2014-626983
Content-Length: 0
Accept-Ranges: bytes
Date: Wed, 07 Oct 2015 13:34:51 GMT
X-Cache: MISS from blade712.non.3dart.com
Server: Apache
I have read other posts and update my code by different things, eg.
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
request.TransferEncoding = "gzip, deflate";
request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:43.0) Gecko/20100101 Firefox/43.0";
request.Referer = "http://www.filmweb.pl/film/Igrzyska+%C5%9Bmierci%3A+Kosog%C5%82os.+Cz%C4%99%C5%9B%C4%87+1-2014-626983";
request.KeepAlive = true;
request.AllowAutoRedirect = true;
request.MaximumAutomaticRedirections = 250;
request.Proxy = null;
request.UseDefaultCredentials = true;
CookieContainer cookieContainer = new CookieContainer();
request.CookieContainer = cookieContainer;
but nothing works :-/
Can anybody help me with this problem?
You need to have the initial cookies when the website load before you fetch a deep-link.
The following code works for me:
// cookies
CookieContainer cookieContainer = new CookieContainer();
// make one call to the root of the website
// to get the cookies set
Uri uri = new Uri(@"http://www.filmweb.pl");
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
request.CookieContainer = cookieContainer;
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
using(var s = response.GetResponseStream())
{
using(var sr = new StreamReader(s))
{
// linqpad
sr.ReadToEnd().Dump(); // to check for errors
}
}
// we have cookies now
// do the deep link fetch
uri = new Uri(@"http://www.filmweb.pl/film/Igrzyska+%C5%9Bmierci%3A+Kosog%C5%82os.+Cz%C4%99%C5%9B%C4%87+1-2014-626983");
request = (HttpWebRequest)WebRequest.Create(uri);
request.CookieContainer = cookieContainer;
response = (HttpWebResponse)request.GetResponse();
//store the result
using(var f = File.Create("C:\\temp\\pl.txt"))
{
response.GetResponseStream().CopyTo(f);
}
Make sure that if you scrape a website that you adhere to their license and usage policies. Don't do anything that goes beyond fair use or against any copy-righted materials.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.