
C# .NET - HttpWebRequest - System.Net.WebException - Too many automatic redirections were attempted

I have a problem with the HttpWebRequest class.
I am trying to get the source code of this website:
http://www.filmweb.pl/film/Igrzyska+%C5%9Bmierci%3A+Kosog%C5%82os.+Cz%C4%99%C5%9B%C4%87+1-2014-626983
but I always get this error:

System.Net.WebException occurred
  HResult=-2146233079
  Message=Too many automatic redirections were attempted.
  Source=System
  StackTrace:
       at System.Net.HttpWebRequest.GetResponse()
       at ProjectName.ClassName.MethodName(String urlAddress)
  InnerException: 

This is my code:

Uri uri = new Uri(@"http://www.filmweb.pl/film/Igrzyska+%C5%9Bmierci%3A+Kosog%C5%82os.+Cz%C4%99%C5%9B%C4%87+1-2014-626983");
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();

I have used the Fiddler Web Debugger tool to compare the Firefox request with my C# .NET request, but I still have no answer.

Firefox:

GET http://www.filmweb.pl/film/Igrzyska+%C5%9Bmierci%3A+Kosog%C5%82os.+Cz%C4%99%C5%9B%C4%87+1-2014-626983 HTTP/1.1
Host: www.filmweb.pl
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:43.0) Gecko/20100101 Firefox/43.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive


HTTP/1.1 200 OK
Cache-Control: private, no-cache, no-store, max-age=0, must-revalidate, proxy-revalidate
Content-Type: text/html;charset=UTF-8
Content-Language: pl-PL
Transfer-Encoding: chunked
Date: Wed, 07 Oct 2015 13:36:31 GMT
X-Cache: HIT from blade110.non.3dart.com
X-Cache-Hits: 116
Server: Apache

C# .NET:

GET http://www.filmweb.pl/film/Igrzyska+%C5%9Bmierci:+Kosog%C5%82os.+Cz%C4%99%C5%9B%C4%87+1-2014-626983 HTTP/1.1
Host: www.filmweb.pl
Connection: Keep-Alive


HTTP/1.1 301 Moved Permanently
Cache-Control: private, no-cache, no-store, max-age=0, must-revalidate, proxy-revalidate
Content-Type: text/html;charset=UTF-8
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Content-Language: pl-PL
Location: /film/Igrzyska+%C5%9Bmierci%3A+Kosog%C5%82os.+Cz%C4%99%C5%9B%C4%87+1-2014-626983
Content-Length: 0
Accept-Ranges: bytes
Date: Wed, 07 Oct 2015 13:34:51 GMT
X-Cache: MISS from blade712.non.3dart.com
Server: Apache

I have read other posts and updated my code with various suggestions, e.g.:

request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
request.TransferEncoding = "gzip, deflate";
request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:43.0) Gecko/20100101 Firefox/43.0";

request.Referer = "http://www.filmweb.pl/film/Igrzyska+%C5%9Bmierci%3A+Kosog%C5%82os.+Cz%C4%99%C5%9B%C4%87+1-2014-626983";
request.KeepAlive = true;
request.AllowAutoRedirect = true;
request.MaximumAutomaticRedirections = 250;
request.Proxy = null;
request.UseDefaultCredentials = true;

CookieContainer cookieContainer = new CookieContainer();
request.CookieContainer = cookieContainer;

but nothing works :-/

Can anybody help me with this problem?

You need to have the initial cookies that the website sets when it first loads before you fetch a deep link.

The following code works for me:

//  cookies
CookieContainer cookieContainer = new CookieContainer();

// make one call to the root of the website
// to get the cookies set
Uri uri = new Uri(@"http://www.filmweb.pl");
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
request.CookieContainer = cookieContainer;
HttpWebResponse response = (HttpWebResponse)request.GetResponse();

using(var s = response.GetResponseStream())
{
   using(var sr = new StreamReader(s)) 
   {
      // .Dump() is LINQPad-specific; use Console.WriteLine outside LINQPad
      sr.ReadToEnd().Dump(); // read the root page to check for errors
   }
}

// we have cookies now
// do the deep link fetch
uri = new Uri(@"http://www.filmweb.pl/film/Igrzyska+%C5%9Bmierci%3A+Kosog%C5%82os.+Cz%C4%99%C5%9B%C4%87+1-2014-626983");
request = (HttpWebRequest)WebRequest.Create(uri);
request.CookieContainer = cookieContainer;
response = (HttpWebResponse)request.GetResponse();

// store the result
using(var f = File.Create("C:\\temp\\pl.txt"))
{
    response.GetResponseStream().CopyTo(f);
}
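
If you want to see what actually happens before the exception is thrown, you can also turn off automatic redirects and print each Location header yourself. This is only a rough diagnostic sketch (not tested against the live site), using the same URL as in the question:

// requires: using System; using System.Net;
// Diagnostic sketch: handle the redirects manually and print each Location
// header, so the loop that triggers the exception becomes visible.
string url = @"http://www.filmweb.pl/film/Igrzyska+%C5%9Bmierci%3A+Kosog%C5%82os.+Cz%C4%99%C5%9B%C4%87+1-2014-626983";

for (int hop = 0; hop < 5; hop++)
{
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
    req.AllowAutoRedirect = false;            // follow redirects ourselves

    using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
    {
        Console.WriteLine("{0} {1}", (int)resp.StatusCode, resp.Headers["Location"]);

        if ((int)resp.StatusCode < 300 || (int)resp.StatusCode >= 400)
            break;                            // not a redirect any more

        // resolve the (relative) Location header against the current URL
        url = new Uri(new Uri(url), resp.Headers["Location"]).AbsoluteUri;
    }
}

Without the cookies from the root page you should see the same 301 with the same Location on every hop, which is exactly what makes HttpWebRequest give up with "Too many automatic redirections were attempted".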

If you scrape a website, make sure you adhere to its license and usage policies. Don't do anything that goes beyond fair use or infringes on copyrighted material.
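
For completeness, the same two-step idea also works with the newer HttpClient API. This is just a sketch (assuming .NET 4.5 or later and an async context, e.g. an async method); the URLs and file path are the same as above:

// requires: using System.IO; using System.Net; using System.Net.Http;
// (run inside an async method)
// Share one CookieContainer between the root request and the deep-link request.
var cookieContainer = new CookieContainer();
var handler = new HttpClientHandler { CookieContainer = cookieContainer };

using (var client = new HttpClient(handler))
{
    // 1. hit the root of the site so it can set its cookies
    await client.GetStringAsync("http://www.filmweb.pl");

    // 2. fetch the deep link; the cookies from step 1 are sent automatically
    string html = await client.GetStringAsync(
        "http://www.filmweb.pl/film/Igrzyska+%C5%9Bmierci%3A+Kosog%C5%82os.+Cz%C4%99%C5%9B%C4%87+1-2014-626983");

    File.WriteAllText(@"C:\temp\pl.txt", html);
}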
