How do I get the information in a tag with C# and HtmlAgilityPack?

I want to get some information from inside a tag using C# and HtmlAgilityPack. For example, given <a href = "https://hashcode.co.kr" description = "" ...>, I want to get the href value.

How do I do it?

The HTML Agility Pack knows a lot about HTML that other solutions lack. A regular expression, for example, may trip over some of the oddities of HTML.

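However, if you want to use a regular expression anyway, you can use the expression: href="([^"]*)"

The character class [^"] stops the match at the first closing quote. A plain .* is greedy and would run on to the last quote on the line, swallowing any attributes that follow href.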

Notes:

  1. This won't work if there are spaces around the equals sign, e.g. href = "url"
  2. This won't work if single quotes are being used, i.e. href='url'
  3. This won't work for a number of other possible HTML variations: no quotes, tabs rather than spaces, missing spaces, etc.
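Some of the variations in these notes can be covered by loosening the pattern. The following is only a sketch, not a full HTML parser: HrefExtractor and Extract are hypothetical names made up for this illustration, and the pattern assumes the first href in the input is the one you want. It allows whitespace (including tabs) around the equals sign and accepts double-quoted, single-quoted, or unquoted values.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of any library.
static class HrefExtractor
{
    // \s* tolerates spaces or tabs around '='.
    // The three alternatives handle "value", 'value', and an unquoted value.
    // .NET allows the same group name ("url") to be reused across alternatives.
    private static readonly Regex HrefPattern = new Regex(
        @"\bhref\s*=\s*(?:""(?<url>[^""]*)""|'(?<url>[^']*)'|(?<url>[^\s""'>]+))",
        RegexOptions.IgnoreCase);

    // Returns the first href value found, or null if there is none.
    public static string Extract(string html)
    {
        Match m = HrefPattern.Match(html);
        return m.Success ? m.Groups["url"].Value : null;
    }
}

class Program
{
    static void Main()
    {
        // Each of these prints https://hashcode.co.kr
        Console.WriteLine(HrefExtractor.Extract("<a href = \"https://hashcode.co.kr\" description=\"\">"));
        Console.WriteLine(HrefExtractor.Extract("<a href='https://hashcode.co.kr'>"));
        Console.WriteLine(HrefExtractor.Extract("<a href=https://hashcode.co.kr>"));
    }
}
```

Even this version can still be fooled, for instance by an href that appears inside a comment or a script block, which is the kind of oddity a real parser deals with.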

Here's a C# example:

using System;
using System.Text.RegularExpressions;

class Program {
    static void Main(string[] args) {
        // [^""]* stops at the first closing quote; .* is greedy and can run past it.
        string pattern = @"href=""([^""]*)""";
        string input = "An extraordinary day <a href=\"https://hashcode.co.kr\" description=\"example\">dawns</a> with each new day.";
        Match m = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
        if (m.Success) {
            // Group 1 holds only the URL; m.Value would be the whole href="..." match.
            Group url = m.Groups[1];
            Console.WriteLine("Found '{0}' at position {1}.", url.Value, url.Index);
        }
    }
}
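
For comparison, here is a minimal sketch of the HtmlAgilityPack approach that the question asks about. It assumes the HtmlAgilityPack NuGet package is referenced by the project, and it reuses the sample input from the regular expression example. The XPath //a[@href] is just one reasonable choice for selecting links.

```csharp
using System;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is installed

class Program
{
    static void Main()
    {
        string input = "An extraordinary day <a href=\"https://hashcode.co.kr\" description=\"example\">dawns</a> with each new day.";

        // Parse the HTML from a string.
        var doc = new HtmlDocument();
        doc.LoadHtml(input);

        // Select every <a> element that has an href attribute.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links == null)
            return;

        foreach (HtmlNode link in links)
        {
            // The second argument is the default returned if the attribute is missing.
            string href = link.GetAttributeValue("href", string.Empty);
            Console.WriteLine(href);
        }
    }
}
```

Because the parser handles quoting and whitespace itself, the same code should cope with single quotes or spaces around the equals sign without any changes to the selector.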
