
Need help extracting label from HTML page in C#

I want to load one label's value from a remote HTML page. So far I have done that by downloading the whole page and then running a regex over it. This gives the desired result, but it is very slow. I want to load only the label's value quickly, not the whole web page. Any suggestions?

This is what I'm doing at the moment:

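// Requires: using System.Net; using System.Text.RegularExpressions;
using (var client = new WebClient())
{
    // Download the entire page as a string.
    string result = client.DownloadString("http://web.archive.org/http://profiles.yahoo.com/italy_");
    // Match anything that looks like an email address.
    var regex = new Regex(@"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*",
                          RegexOptions.Compiled);
    foreach (Match email in regex.Matches(result))
    {
        // Console.WriteLine(email.Value);
        // Each match overwrites the label, so the last match is what stays displayed.
        label2.Text = email.Value;
    }
}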

You must load the whole page; that's the way HTTP requests generally work.

Maybe your regex could be improved? Not my area of expertise though, sorry.
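If the regex does turn out to matter, one small thing to try is building it once and stopping at the first match instead of walking every match on the page. The following is only a sketch: the class name EmailExtractor and the method FirstEmail are made up for illustration, and it assumes the first address on the page is the one you want (the loop in the question ends up showing the last one).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original code.
static class EmailExtractor
{
    // Same pattern as in the question, constructed once and reused,
    // so the cost of RegexOptions.Compiled is paid a single time.
    private static readonly Regex EmailRegex = new Regex(
        @"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*",
        RegexOptions.Compiled);

    // Returns the first email-like string in the HTML, or null if there is none.
    // Regex.Match stops scanning as soon as one match is found.
    public static string FirstEmail(string html)
    {
        Match match = EmailRegex.Match(html);
        return match.Success ? match.Value : null;
    }
}
```

With this in place, the body of the using block in the question would shrink to something like label2.Text = EmailExtractor.FirstEmail(result); plus a null check. It does not avoid the download, so it only helps if the regex is actually the slow part.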

I found the desired result, but this method is very slow. I want it to quickly load only the label's value, not the whole web page.

Couple of thoughts:

  • Archive.org is usually very slow in my experience. My guess is that's your bottleneck.

  • No, there is not a way to only make a partial request to a third-party page unless they have a response mechanism capable of returning more specific data (for example, a JSON-enabled web service that returns little snippets of HTML used on the page).

  • You will usually have better luck with parsing by loading data into some kind of HTML parser rather than using a regex.
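To check the guess about the bottleneck before changing anything, you could time the download and the regex separately. This is a rough sketch using System.Diagnostics.Stopwatch; the StepTimer class and its Time method are hypothetical names introduced here, not something from the question.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for measuring how long a single step takes.
static class StepTimer
{
    // Runs the given step, returns its result, and reports the elapsed time.
    public static T Time<T>(Func<T> step, out TimeSpan elapsed)
    {
        var stopwatch = Stopwatch.StartNew();
        T result = step();
        stopwatch.Stop();
        elapsed = stopwatch.Elapsed;
        return result;
    }
}

// Possible usage inside the using block from the question
// (client, regex and the URL are the ones defined there):
//
//   TimeSpan downloadTime, regexTime;
//   string html = StepTimer.Time(
//       () => client.DownloadString("http://web.archive.org/http://profiles.yahoo.com/italy_"),
//       out downloadTime);
//   // .Count forces all matches to be evaluated; Matches() alone is lazy.
//   int count = StepTimer.Time(() => regex.Matches(html).Count, out regexTime);
//   Console.WriteLine("download: {0}, regex: {1}", downloadTime, regexTime);
```

If the download time dominates, as suspected above, then tuning the regex or switching to an HTML parser will not make the page load noticeably faster, although a parser may still make the extraction more reliable.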
