
Webpage download

I am having some issues downloading the source of a webpage. I can view the webpage fine in any browser, and I can also run a web spider and download the first page no problem. But whenever I run the code to grab the source of that page, I always get a 403 Forbidden error.

The 403 Forbidden error is returned as soon as the request is sent. Anyone have any ideas?

string urlAddress = "http://www.brownells.com/";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlAddress);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();

if (response.StatusCode == HttpStatusCode.OK)
{
      Stream receiveStream = response.GetResponseStream();
      StreamReader readStream = null;

.................................

      response.Close();
      readStream.Close();

If you're in a rush...

string uri =  @"http://brownells.com";

HttpWebRequest request         = (HttpWebRequest)WebRequest.Create(uri);
request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
request.UserAgent              = @"Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36";
request.Accept                 = @"text/html";

using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (Stream stream            = response.GetResponseStream())
using (StreamReader reader      = new StreamReader(stream))
{
    Console.WriteLine (reader.ReadToEnd());
}

request.AutomaticDecompression notifies the server that we, the client, support both the gzip and Deflate compression schemes, so there'll be some performance gain there. However, it isn't needed; the server only required that you have your UserAgent and Accept headers set.
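With those properties set, the request that goes out on the wire looks roughly like this (a sketch of what Fiddler would show you; header order and exact values will vary):

```
GET / HTTP/1.1
Host: brownells.com
Accept: text/html
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36
Accept-Encoding: gzip, deflate
```

The Accept-Encoding header is what AutomaticDecompression adds for you, and Host is filled in automatically from the URI.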


The tools for the job...

Remember: if you can do it in a browser, you can do it in C#. The only time you'll seriously struggle is when there's some JavaScript sorcery going on, such as the site setting cookies using JavaScript. It's rare, but it happens.

Back to the topic at hand...

  1. Download Fiddler, a web debugging proxy that's simply invaluable when debugging HTTP traffic. Install it and run it.
  2. Navigate to your website of choice.
  3. Check Fiddler to see the request your browser sent, then check what the server responded with...
  4. Replicate it using C#.
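Step 4 just means copying the headers Fiddler showed onto your request before sending it. A minimal sketch, assuming header values like the ones above (the Accept-Language value here is a made-up example, not one captured from the real site):

```csharp
using System;
using System.Net;

static class FiddlerReplica
{
    // Copy the headers Fiddler captured onto the request before sending it.
    public static HttpWebRequest BuildRequest(string uri)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
        request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;

        // Restricted headers (User-Agent, Accept, Referer, Host...) must be set
        // through their dedicated properties rather than request.Headers.
        request.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36";
        request.Accept    = "text/html";

        // Non-restricted headers from the capture can be copied directly.
        request.Headers["Accept-Language"] = "en-US,en;q=0.8";   // example value

        return request;
    }

    static void Main()
    {
        // Inspect what will go on the wire before actually calling GetResponse().
        HttpWebRequest request = BuildRequest("http://www.brownells.com/");
        Console.WriteLine(request.Headers);
    }
}
```

Building the request and printing its headers like this lets you compare them against the Fiddler capture before you ever hit the server.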



Edit

If you want to dump the page to a file, you need to write the response stream out through a StreamWriter:

using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (Stream stream            = response.GetResponseStream())
using (StreamReader reader      = new StreamReader(stream))
using (TextWriter writer        = new StreamWriter("filePath.html"))
{
    writer.Write(reader.ReadToEnd());
}
