
Get anchor tag HREF and VALUE

I have a string that looks like this:

<a href="http://forum.tibia.com/forum/?action=board&boardid=476">Amera</a><br><font class="ff_info">This board is for general discussions related to the game world Amera.</font>

How can I ignore/remove everything after the </a> and then get only the URL, http://forum.tibia.com/forum/?action=board&boardid=476, and the value, Amera?

So afterwards, I want two variables with their values, like:

string url = "http://forum.tibia.com/forum/?action=board&boardid=476";

and

string value = "Amera";

I tried this to get the value:

string value = System.Text.RegularExpressions.Regex.Replace(MYSTRING, "(<[a|A][^>]*>|)", "");

But it returns:

Amera</a><br><font class="ff_info">This board is for general discussions related to the game world Amera.</font>

To get the URL, maybe try this regex pattern: /href=\\"(.*)\\"/

...and to get the value between > and </a> (here, Amera), use a pattern like: >(.+?)</a>

...although this seems far from perfect...
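One reason the first pattern is far from perfect: on the sample string the greedy (.*) runs on to the last quote on the line (the one closing class="ff_info"), so the capture contains much more than the URL. Below is a minimal sketch of both patterns in C#, assuming the href value is always double-quoted. The class and method names are hypothetical, and [^"]* is swapped in for (.*) so the capture stops at the closing quote.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class AnchorPatterns
{
    // Returns the first double-quoted href value, or "" if there is none.
    public static string FirstHref(string html)
    {
        // [^"]* cannot cross the closing quote, unlike the greedy (.*)
        var m = Regex.Match(html, "href=\"([^\"]*)\"", RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : "";
    }

    // Returns the text between the first '>' and the following </a>, or "".
    public static string FirstAnchorText(string html)
    {
        // Same pattern as suggested above: lazy match up to </a>
        var m = Regex.Match(html, ">(.+?)</a>", RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : "";
    }
}
```

With the string from the question, FirstHref should give the forum URL and FirstAnchorText should give Amera. The text pattern only works here because the a tag is the first tag in the string.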

If the a tag won't contain any other attributes, you can use just this to get the URL:

\bhref="(.*?)"

And a slightly more complex one for both the URL and the text:

<a\b[^>]*?\bhref="([^"]*?)"[^>]*?>(.*?)<\/a>

So in C# code (quotation marks need to be escaped!):

// requires: using System.Text.RegularExpressions;
var html = "<a href=\"http://forum.tibia.com/forum/?action=board&boardid=476\">Amera</a><br><font class=\"ff_info\">This board is for general discussions related to the game world Amera.</font>";
var match = Regex.Match(html, "<a\\b[^>]*?\\bhref=\"([^\"]*?)\"[^>]*?>(.*?)<\\/a>", RegexOptions.IgnoreCase);
if (match.Success) {
    // .Value turns the captured Group into a string
    var url = match.Groups[1].Value;  // http://forum.tibia.com/forum/?action=board&boardid=476
    var text = match.Groups[2].Value; // Amera
}
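If the string can contain more than one link, the same idea extends to Regex.Matches. The following is only a sketch: the class name, method name, and the url/text group names are made up for illustration, and RegexOptions.Singleline is an added assumption so that link text spanning several lines is still captured.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class AnchorExtractor
{
    // Same shape as the pattern above, with named groups instead of numbered ones.
    private static readonly Regex AnchorRegex = new Regex(
        "<a\\b[^>]*?\\bhref=\"(?<url>[^\"]*)\"[^>]*>(?<text>.*?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns one (url, text) pair per <a href="..."> element found.
    public static List<KeyValuePair<string, string>> ExtractLinks(string html)
    {
        var links = new List<KeyValuePair<string, string>>();
        foreach (Match m in AnchorRegex.Matches(html))
        {
            links.Add(new KeyValuePair<string, string>(
                m.Groups["url"].Value, m.Groups["text"].Value));
        }
        return links;
    }
}
```

Each pair's Key holds the URL and its Value holds the link text. Like the original pattern, this assumes double-quoted href values and no nested a tags.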

Try this:

// requires the HtmlAgilityPack NuGet package and: using HtmlAgilityPack;
var dc = new HtmlDocument();
dc.LoadHtml("<a href='http://forum.tibia.com/forum/?action=board&boardid=476'>Amera</a><br><font class='ff_info'>This board is for general discussions related to the game world Amera.</font>");
// SelectNodes returns null when nothing matches, so check before iterating
var links = dc.DocumentNode.SelectNodes("//a[@href]");
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        string url = link.Attributes["href"].Value; // http://forum.tibia.com/forum/?action=board&boardid=476
        string value = link.InnerText; // Amera
    }
}
