How do I use C# to dump a web page's HTML to a text file?

Question

I am working on a project where I need to take a website URL (www.google.com, for example) and save its HTML to a text file so that it can be parsed separately, but I don't know how to do so.

I know there are easier ways to do this than the way I'm going about it, but the project is meant to be both useful and a learning exercise.

Answer 1

Downloading just a single URL to a file is dead easy using WebClient:

using (var client = new WebClient())
{
    client.DownloadFile(url, filename);
}

The trickier bit is that very few web pages really consist of a single piece of HTML - most go on to load JavaScript, or load more data with JavaScript, etc., so the file you download may not contain everything you see in a browser.

In .NET 4.5 and later you might want to use HttpClient instead of WebClient - although it's asynchronous and (as far as I can see) doesn't provide anything quite as convenient as DownloadFile when that's all you want to do.
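
For comparison, here is a minimal sketch of what a DownloadFile-style helper on top of HttpClient might look like. The names PageDownloader and DownloadFileAsync are made up for illustration and are not part of the framework; the sketch assumes the caller supplies a shared HttpClient instance and is in an async context.

```csharp
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

static class PageDownloader
{
    // Hypothetical helper (not a framework API): streams the response body
    // for `url` into the file at `filename`.
    public static async Task DownloadFileAsync(HttpClient client, string url, string filename)
    {
        // ResponseHeadersRead lets us start copying before the whole body is buffered.
        using (HttpResponseMessage response =
            await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
        {
            // Throws HttpRequestException on a non-success status code.
            response.EnsureSuccessStatusCode();

            using (Stream body = await response.Content.ReadAsStreamAsync())
            using (FileStream file = File.Create(filename))
            {
                await body.CopyToAsync(file);
            }
        }
    }
}
```

From async code it would be called as `await PageDownloader.DownloadFileAsync(client, url, filename);`. Like WebClient, this only saves the HTML the server returns; it does not run any JavaScript.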

Answer 2

You can try HtmlAgilityPack:

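using System.IO;
using HtmlAgilityPack;

string url = "http://something";
HtmlWeb web = new HtmlWeb();
// Fetch the page and parse it into a document
HtmlDocument doc = web.Load(url);
string contents = doc.DocumentNode.OuterHtml;
// Verbatim string (@) so the backslashes in the path are not read as escape sequences
File.WriteAllText(@"X:\abc\def.txt", contents);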
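
Since the question mentions parsing the saved file separately, here is a rough sketch of reading it back with HtmlAgilityPack and listing its links. It assumes the HtmlAgilityPack NuGet package is referenced; the class name LinkLister and the method ReadLinks are invented for this example.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

static class LinkLister
{
    // Hypothetical helper: returns the href of every <a> element in the saved file.
    public static List<string> ReadLinks(string path)
    {
        var doc = new HtmlDocument();
        doc.Load(path);

        var links = new List<string>();
        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors != null)
        {
            foreach (HtmlNode anchor in anchors)
            {
                links.Add(anchor.GetAttributeValue("href", string.Empty));
            }
        }
        return links;
    }
}
```

Any other XPath expression can be swapped in for "//a[@href]" depending on what you need to pull out of the page.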

Answer 3

The WebClient class can help you achieve this:

using System;
using System.Net;
using System.IO;

using (WebClient client = new WebClient())
{
    // Download the page as a string, then write it to a local file
    string htmlCode = client.DownloadString("http://somesite.com/default.html");
    File.WriteAllText(@"c:\YourLocalFolder\somefile.txt", htmlCode);
}

Answer 4

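If you need all the other files as well, you could use the WebBrowser control to run IE inside your application, which can run the JScript etc. on the page. You can then access the DOM from C#.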
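
A rough sketch of the WebBrowser approach, assuming a Windows desktop project that references System.Windows.Forms. The URL is a placeholder, the output file name rendered.html is made up, and treating ReadyState == Complete as "the page has finished" is an assumption: scripts that fetch data later may not have run yet.

```csharp
using System;
using System.IO;
using System.Windows.Forms;

static class RenderedPageDump
{
    // The WebBrowser control requires a single-threaded apartment.
    [STAThread]
    static void Main()
    {
        var browser = new WebBrowser { ScriptErrorsSuppressed = true };

        browser.DocumentCompleted += (sender, e) =>
        {
            // DocumentCompleted can fire once per frame; wait for the whole page.
            if (browser.ReadyState != WebBrowserReadyState.Complete)
            {
                return;
            }

            // Read the current DOM (after scripts have run so far), not the raw response.
            string html = browser.Document.GetElementsByTagName("html")[0].OuterHtml;
            File.WriteAllText("rendered.html", html);

            // Stop the message loop started by Application.Run below.
            Application.ExitThread();
        };

        browser.Navigate("http://somesite.com/default.html");

        // The control needs a running message loop to load and render the page.
        Application.Run();
    }
}
```

Unlike WebClient, this saves the DOM as the embedded browser sees it at that moment, which is closer to what a user would see.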

solution1
2 2015-02-03 07:22:13

solution2
0 2015-02-03 07:09:18

solution3
0 2015-02-03 07:29:07

solution4
0 2015-02-03 15:11:18
