
How to extract the HTML tag attribute?

I'm trying to develop my first RSS news aggregator. I can easily extract the link, title, and publication date from the RSSItem object. However, I'm having a hard time extracting the image from the feed item. Unfortunately, because of my low reputation on SO I can't upload images, so instead of helping me extract the value of the src attribute of an <img> tag, can you please show me how to get the value of the href attribute of an <a> tag? Highly appreciated!

Here's the string:

<div style="text-align: center;">
    <a href="http://www.engadget.com/2011/07/10/element5s-mini-l-solarbag-brings-eco-friendly-energy-protectio/"></a>
</div>

Edit:

Maybe the whole title is wrong. Is there a way I can find the value using XPath?

Use HTMLAgilityPack, as answered in this post:

How can I get values from Html Tags?

More information:

HTML may not be well formed, so we need a parser that is more fault tolerant than the XML one supplied in .NET. That's where HTMLAgilityPack comes in.

Getting started:

  1. Create a new console application.

  2. Right-click on References / Manage NuGet Packages (install NuGet if you don't have it).

  3. Add the HtmlAgilityPack package.

A working example:

        using System;
        using System.IO;
        using System.Text;
        using HtmlAgilityPack;

        namespace ConsoleApplication4
        {
            class Program
            {
                private const string html = 
        @"<?xml version=""1.0"" encoding=""ISO-8859-1""?>
        <div class='linkProduct' id='link' anattribute='abc'/>
         <bookstore>
         <book>
           <title lang=""eng"">Harry Potter</title>
           <price>29.99</price>
         </book>
         <book>
           <title lang=""eng"">Learning XML</title>
           <price>39.95</price>
         </book>
         </bookstore>
        ";

                static void Main(string[] args)
                {
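                    // Load the markup into HtmlAgilityPack's fault-tolerant parser.
                    // (doc.LoadHtml(html) would also work for an in-memory string.)
                    var doc = new HtmlDocument();
                    byte[] byteArray = Encoding.ASCII.GetBytes(html);
                    using (var stream = new MemoryStream(byteArray))
                    {
                        doc.Load(stream);
                    }

                    // Select the top-level <div> with XPath and read one of its attributes.
                    var root = doc.DocumentNode;
                    var tag = root.SelectSingleNode("/div");
                    var attrib = tag == null ? null : tag.Attributes["anattribute"];
                    Console.WriteLine(attrib == null ? "attribute not found" : attrib.Value);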
                }
            }
        }
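
The example above reads an attribute from a `<div>`. Applied to the string in the question, the same approach might look like the minimal sketch below. It assumes the HtmlAgilityPack NuGet package is installed; `HrefExample` and `FirstHref` are hypothetical names introduced here for illustration and are not part of the library or the original answer.

```csharp
using System;
using HtmlAgilityPack;

static class HrefExample
{
    // Hypothetical helper: returns the href of the first <a> element that
    // has one, or null when the markup contains no such element.
    public static string FirstHref(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // "//a[@href]" matches an <a> element at any depth that carries an href attribute.
        HtmlNode link = doc.DocumentNode.SelectSingleNode("//a[@href]");

        // SelectSingleNode returns null when nothing matches.
        return link == null ? null : link.GetAttributeValue("href", string.Empty);
    }

    static void Main()
    {
        // The string from the question.
        const string html =
@"<div style=""text-align: center;"">
    <a href=""http://www.engadget.com/2011/07/10/element5s-mini-l-solarbag-brings-eco-friendly-energy-protectio/""></a>
</div>";

        Console.WriteLine(FirstHref(html) ?? "no link found");
    }
}
```

For the original goal of reading an image URL, the same pattern should carry over by changing the XPath to `//img[@src]` and the attribute name to `src`.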

Taking it further:

Get good at XPath. Here's a good place to start:

http://www.w3schools.com/xpath/xpath_syntax.asp
