
How to get the text content of a web page?

I've wasted two days finding out that there's a known memory leak in the WebBrowser control (known since 2007 or so, and they still haven't fixed it), so I've decided to simply ask here how to do what I need.

Until now (using WebBrowser), I've been visiting a site, selecting everything (Ctrl+A), and pasting it into a string, and that was all: I had the text content of the web page in my string. It worked perfectly until I found out that it takes up 1 GB of memory after some time. Is it possible to do this with HttpWebRequest, httpwebclient, or anything else?

Thanks for any replies. There wasn't any thread like this (or I haven't found one; I didn't search for long because I'm really pissed off right now :P).

FORGOT TO ADD: I don't want the HTML code; I know it's easy to get. In my case, the HTML code is useless. I need the text the user sees when opening the page in a web browser.

using (WebClient client = new WebClient())
{
    string html = client.DownloadString("http://stackoverflow.com/questions/10839877/how-to-get-a-txt-content-of-a-web-page");
}

You can use this:

// Requires: using System.IO; using System.Net;
string getHtml(string url)
{
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.Method = "GET";
    // The using blocks dispose the response and reader even if reading throws.
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (StreamReader source = new StreamReader(response.GetResponseStream()))
    {
        // Read the whole response body as a single string of HTML.
        return source.ReadToEnd();
    }
}

You still have to do some string replacement to reduce the HTML to plain text. That's not too bad if you only want the text from a certain div.
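As a rough illustration of that reduction step, here is a minimal sketch using regular expressions. The class and method names (HtmlText, ToPlainText) are made up for this example and are not from the answer above. It only approximates what a browser shows: regex-based stripping is fragile on malformed markup, and a real HTML parser would likely be sturdier.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

static class HtmlText
{
    // Hypothetical helper (an assumption for illustration): crude HTML-to-text reduction.
    public static string ToPlainText(string html)
    {
        // Drop <script> and <style> blocks together with their contents.
        string text = Regex.Replace(html, @"<(script|style)\b[^>]*>.*?</\1\s*>", " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        // Drop HTML comments.
        text = Regex.Replace(text, @"<!--.*?-->", " ", RegexOptions.Singleline);
        // Replace every remaining tag with a space so neighbouring words stay separated.
        text = Regex.Replace(text, @"<[^>]+>", " ");
        // Decode entities only after the tags are gone, so encoded text such as
        // &lt;b&gt; is kept as literal text instead of being stripped as a tag.
        text = WebUtility.HtmlDecode(text);
        // Collapse runs of whitespace into single spaces.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}

static class Program
{
    static void Main()
    {
        string html = "<html><head><style>p{color:red}</style></head>" +
                      "<body><p>Hello &amp; <b>welcome</b></p>" +
                      "<script>var x = 1;</script></body></html>";
        Console.WriteLine(HtmlText.ToPlainText(html)); // prints "Hello & welcome"
    }
}
```

In practice you would pass the string returned by getHtml (or DownloadString) to ToPlainText instead of the hard-coded sample.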

This will download the HTML content of any web page.

WebClient client = new WebClient();
string reply = client.DownloadString("http://www.google.com");

Why don't you use a free, open-source HTML scraper like NCrawler?

It is written in C#.

ncrawler.codeplex.com

You can get examples on how to use it here.
