
How to get the webpage source code using C#

I know about the WebRequest and WebResponse objects. The problem is that I do not actually want the source code of the webpage; I only want to check whether the link exists. If I use the GetResponse method, it goes and pulls the entire source code of the page.

I am creating a broken link checker with many links, and it takes quite a while to check them all. Is there a way to get MINIMAL information from a web link? Only enough to see whether the link is valid or broken (not the entire source code).

An answer (BESIDES USING ASYNCHRONOUS TRANSFER) would be greatly appreciated!

The standard way to check whether a link exists is to send a HEAD request, which makes the remote server return the headers for the requested object but not the object itself. If the object is not on the server, you get the usual 404 response; if it does exist, you get a 200 response with no data after the headers. This way almost no unneeded data goes over the wire.

 // Create is a static method declared on WebRequest; for an http:// URL it returns an HttpWebRequest.
 WebRequest request = WebRequest.Create("http://www.foo.com/");
 request.Method = "HEAD"; // Just get the document headers, not the data.

HEAD is similar to GET, except that instead of the file contents we get only the headers.
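
To round this out, here is a minimal sketch of how the HEAD request could be wrapped into a reusable check. The names `LinkChecker`, `LinkExists` and `IsSuccess`, the 5 second timeout, and the choice to count any status below 400 as "exists" are my own assumptions, not part of the original answer. Note that `GetResponse` throws a `WebException` for 4xx/5xx responses, so a broken link shows up in the `catch` block rather than as a status code on the response. Also, on .NET 6 and later `WebRequest.Create` is marked obsolete; it still works but produces a compiler warning.

```csharp
using System;
using System.Net;

public static class LinkChecker
{
    // Hypothetical rule: treat any 2xx or 3xx status as "the link exists".
    public static bool IsSuccess(HttpStatusCode status)
    {
        int code = (int)status;
        return code >= 200 && code < 400;
    }

    // Sends a HEAD request and reports whether the server answered successfully.
    public static bool LinkExists(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "HEAD";   // headers only, no body
        request.Timeout = 5000;    // assumed value, in milliseconds

        try
        {
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                return IsSuccess(response.StatusCode);
            }
        }
        catch (WebException ex)
        {
            // 4xx/5xx responses, DNS failures and timeouts all end up here.
            // ex.Response is null when no HTTP response was received at all.
            ex.Response?.Dispose();
            return false;
        }
    }
}
```

Usage would be something like `bool ok = LinkChecker.LinkExists("http://www.foo.com/");`. One caveat: some servers reject HEAD (for example with 405 Method Not Allowed), so a checker may want to retry such links with GET before reporting them as broken.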
