Webpage download
I am having some issues downloading the source of a webpage. I can view the page fine in any browser, and I can also run a web spider and download the first page without a problem. However, whenever I run code to grab the source of that page, I always get a 403 Forbidden error, returned as soon as the request is sent. Does anyone have any ideas?
string urlAddress = "http://www.brownells.com/";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlAddress);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
if (response.StatusCode == HttpStatusCode.OK)
{
    Stream receiveStream = response.GetResponseStream();
    StreamReader readStream = null;
    // ...
    response.Close();
    readStream.Close();
}
string uri = @"http://brownells.com";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
request.UserAgent = @"Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36";
request.Accept = @"text/html";
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (Stream stream = response.GetResponseStream())
using (StreamReader reader = new StreamReader(stream))
{
Console.WriteLine (reader.ReadToEnd());
}
request.AutomaticDecompression notifies the server that we, the client, support both the gzip and Deflate compression schemes, so there will be some performance gain there. However, it isn't required: the server only demands that you have your UserAgent and Accept headers set.
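For comparison, the same two headers (and the same decompression option) can be set with the newer HttpClient API. This is a minimal, untested sketch, assuming .NET 4.5+; the URL and User-Agent string are the ones from the answer above:

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        // AutomaticDecompression on the handler plays the same role as
        // request.AutomaticDecompression on HttpWebRequest.
        var handler = new HttpClientHandler
        {
            AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
        };

        using (var client = new HttpClient(handler))
        {
            // The same two headers the server insists on.
            client.DefaultRequestHeaders.Add("User-Agent",
                "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36");
            client.DefaultRequestHeaders.Add("Accept", "text/html");

            string html = await client.GetStringAsync("http://brownells.com");
            Console.WriteLine(html);
        }
    }
}
```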
Remember, if you can do it in a browser, you can do it in C#. The only time you'll seriously struggle is if there's some JavaScript sorcery where the site sets cookies using JavaScript. It's rare, but it happens.
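If the site relies on ordinary (server-set, not JavaScript-set) cookies, HttpWebRequest can carry them across requests with a CookieContainer. A minimal sketch, reusing the URL and headers from above:

```csharp
using System;
using System.IO;
using System.Net;

class CookieExample
{
    static void Main()
    {
        var cookies = new CookieContainer();

        HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://brownells.com");
        request.CookieContainer = cookies; // any Set-Cookie headers land here
        request.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36";
        request.Accept = "text/html";

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            Console.WriteLine(reader.ReadToEnd());
        }

        // Later requests that are given the same CookieContainer
        // automatically send those cookies back to the server.
    }
}
```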
Back to the topic at hand...
If you want to dump the page to a file, you need to use a stream writer:
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (Stream stream = response.GetResponseStream())
using (StreamReader reader = new StreamReader(stream))
using (TextWriter writer = new StreamWriter("filePath.html"))
{
    writer.Write(reader.ReadToEnd());
}