
Getting the HTML source of a web page using C# for different browsers

I want to get the HTML source of a web page using C#, as if it were visited with different browsers such as IE9, Chrome, and Firefox. Is there a way to do that?

You can get the HTML source in a number of ways. My preferred method is HTML Agility Pack:

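// Requires: using HtmlAgilityPack;
// HtmlDocument.Load reads a local file or stream, so use HtmlWeb to fetch a URL.
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load("http://domain.com/resource/page.html");
doc.Save("file.htm");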

The WebClient in .NET works well too.

// Requires: using System.Net; using System.Text;
// Adapted from the MSDN sample linked below. That sample does not dispose of the
// WebClient (it implements IDisposable), so it is wrapped in a using statement here.
using (WebClient myWebClient = new WebClient())
{
    // Set the User-Agent header if you need to simulate a specific browser.
    myWebClient.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
    // remoteUri is a string holding the URL of the page to download.
    byte[] myDataBuffer = myWebClient.DownloadData(remoteUri);
    // ASCII follows the MSDN sample; decode with the page's real encoding if it contains non-ASCII text.
    string download = Encoding.ASCII.GetString(myDataBuffer);
}

http://msdn.microsoft.com/en-us/library/xz398a3f.aspx

The HTML you get is what you get. A given browser decides what to make of it (unless, that is, the server renders different HTML for different user agents).

If you do need to explicitly set the user agent (to simulate different browsers), the following post shows how to do that:

http://blog.abodit.com/2010/03/a-simple-web-crawler-in-c-using-htmlagilitypack/

(this link also implements a simple web crawler using HTML Agility Pack)
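To check whether a server really does return different HTML per user agent, one option is to request the same page several times with different User-Agent headers and compare the results. The following is a minimal sketch, not a definitive implementation. The class name `UserAgentFetch`, the `Agents` table, and the User-Agent strings themselves are illustrative assumptions (real browsers send version-specific values), and the URL is the placeholder used earlier on this page.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

public static class UserAgentFetch
{
    // Illustrative User-Agent strings (assumptions, not authoritative values).
    public static readonly Dictionary<string, string> Agents = new Dictionary<string, string>
    {
        { "IE9", "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)" },
        { "Chrome", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" },
        { "Firefox", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120.0" },
    };

    // Downloads the page at url, identifying as the given user agent.
    public static string Fetch(string url, string userAgent)
    {
        using (var client = new WebClient())
        {
            client.Headers[HttpRequestHeader.UserAgent] = userAgent;
            return client.DownloadString(url);
        }
    }

    public static void Main()
    {
        // Placeholder URL; replace with the page you want to inspect.
        const string url = "http://domain.com/resource/page.html";
        foreach (KeyValuePair<string, string> agent in Agents)
        {
            string html = Fetch(url, agent.Value);
            Console.WriteLine("{0}: {1} characters", agent.Key, html.Length);
        }
    }
}
```

If the reported lengths differ, the server is probably varying its output by user agent. Note that this only changes what the server sends; it does not reproduce any script-driven changes a real browser would make to the page after loading it.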

I'm no C# expert, but assuming the HTML will be the same regardless of which "browser" visits the URL, you can use System.Net.WebClient (if you only need simple control) or HttpWebRequest (if you need more advanced control).

For WebClient, just create an instance and call one of its Download* methods:

var cli = new WebClient();
string data = cli.DownloadString("http://www.stackoverflow.com");
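For the HttpWebRequest route mentioned above, a rough sketch might look like the following. The class and method names (`HttpWebRequestFetch`, `BuildRequest`, `Fetch`), the 10-second timeout, and the User-Agent string are assumptions chosen for illustration. The response is read with StreamReader's default UTF-8 decoding, which may not match every page.

```csharp
using System;
using System.IO;
using System.Net;

public static class HttpWebRequestFetch
{
    // Builds a request with per-request settings that WebClient does not expose as directly.
    public static HttpWebRequest BuildRequest(string url, string userAgent)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.UserAgent = userAgent;
        // Let the framework transparently decompress gzip/deflate responses.
        request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        request.Timeout = 10000; // milliseconds (assumed value)
        return request;
    }

    // Sends the request and returns the response body as a string.
    public static string Fetch(string url, string userAgent)
    {
        HttpWebRequest request = BuildRequest(url, userAgent);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            // Assumes the body is UTF-8 (StreamReader's default).
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        string html = Fetch(
            "http://www.stackoverflow.com",
            "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)");
        Console.WriteLine(html.Length);
    }
}
```

Separating `BuildRequest` from `Fetch` keeps the header and timeout configuration in one place, so the same settings can be reused for several URLs.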
