Need help extracting label from HTML page in C#

I want to load one label's value from a remote HTML page. I have done that by loading the whole page and then using a regex. I get the desired result, but this method is very slow. I want it to quickly load only the label's value, not the whole web page. Any suggestions?

This is what I'm doing at the moment:

using (var client = new WebClient())
{
    string result = client.DownloadString("http://web.archive.org/http://profiles.yahoo.com/italy_");
    var regex = new Regex(@"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*",
                          RegexOptions.Compiled);
    var s = result;
    foreach (Match email in regex.Matches(s))
    {
        // Console.WriteLine(email.Value);
        label2.Text = email.Value;
    }
}

You must load the whole page; that's the way HTTP requests generally work.

Maybe your regex could be improved? Not my area of expertise though, sorry.
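One low-risk change on the regex side is to build the Regex once and reuse it, because RegexOptions.Compiled carries a one-time construction cost that the code in the question pays every time it runs. The following is only a sketch under the assumption that a single address is wanted. EmailFinder and FirstEmail are made-up names for illustration, and the method returns the first match, whereas the loop in the question ends up showing the last one.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailFinder
{
    // Same pattern as in the question, constructed once and shared by all calls.
    private static readonly Regex EmailRegex = new Regex(
        @"\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*",
        RegexOptions.Compiled);

    // Returns the first e-mail-like string in the text,
    // or an empty string when nothing matches.
    public static string FirstEmail(string html)
    {
        Match match = EmailRegex.Match(html);
        return match.Success ? match.Value : string.Empty;
    }
}
```

With this in place, the loop in the question could be replaced by something like label2.Text = EmailFinder.FirstEmail(result); this is unlikely to help much if the download itself is the slow part.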

I found the desired result but this method is very slow I want it to quickly load only labels value not the whole web page.

Couple of thoughts:

  • Archive.org is usually very slow in my experience. My guess is that's your bottleneck.

  • No, there is not a way to only make a partial request to a third-party page unless they have a response mechanism capable of returning more specific data (for example, a JSON-enabled web service that returns little snippets of HTML used on the page).

  • You will usually have better luck with parsing by loading data into some kind of HTML parser rather than using a regex.
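To check the bottleneck guess in the first point before changing anything else, it may help to time the download and the regex separately. Below is a rough sketch using Stopwatch. PhaseTimer and Measure are hypothetical helper names, not part of the original code.

```csharp
using System;
using System.Diagnostics;

public static class PhaseTimer
{
    // Runs the given function, prints how long it took,
    // and hands back both the function's result and the elapsed time.
    public static T Measure<T>(string label, Func<T> work, out TimeSpan elapsed)
    {
        var stopwatch = Stopwatch.StartNew();
        T result = work();
        stopwatch.Stop();
        elapsed = stopwatch.Elapsed;
        Console.WriteLine("{0}: {1} ms", label, elapsed.TotalMilliseconds);
        return result;
    }
}
```

In the code from the question, the client.DownloadString(...) call and the regex.Matches(s) call could each be wrapped in a Measure call. Regex.Matches evaluates lazily, so reading the collection's Count inside the timed function forces the matching to actually happen there. If nearly all of the time goes to the download, tuning the regex or switching parsers will not make a visible difference.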
