Get the href value from an HTML anchor tag in C#
How can I get only the href value from an HTML anchor tag using C#? Thanks.
string html = "<a href=\"http://www.google.com\"></a>";
// I want to get the following result
// from the string above:
// http://www.google.com
You can use an HTML parsing library such as Html Agility Pack. For example:
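using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<a href=\"http://www.google.com\"></a>");
        // "//a[@href]" finds anchors with an href anywhere in the document,
        // not only direct children of the document node.
        var nodes = doc.DocumentNode.SelectNodes("//a[@href]");
        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (nodes == null) return;
        foreach (var node in nodes)
        {
            Console.WriteLine(node.Attributes["href"].Value);
        }
    }
}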
If you want to do this without HtmlAgilityPack, you can use a regular expression:
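// Requires: using System.Linq; using System.Text.RegularExpressions;
string html = @"<a href=""http://www.google.com"">test</a>";
// The quoted alternatives are grouped so that "|" does not split the whole pattern.
var regex = new Regex(@"<a\b[^>]*?\shref\s*=\s*(?:'(?<href>[^']*)'|""(?<href>[^""]*)"")", RegexOptions.IgnoreCase);
// FirstOrDefault does not throw when the input contains more than one anchor.
var url = regex.Matches(html).OfType<Match>().Select(m => m.Groups["href"].Value).FirstOrDefault();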
Hope this helps.
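If the input may contain several anchors, the regex approach above can be wrapped in a small helper that returns every href. This is only a sketch: the class name `HrefExtractor` and the method `GetHrefs` are made up for illustration, and the pattern also accepts unquoted values, which the answer above does not cover. A regex is not a full HTML parser, so unusual markup can still slip through.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefExtractor
{
    // Hypothetical helper: matches href values that are double-quoted,
    // single-quoted, or unquoted, inside an <a ...> start tag.
    private static readonly Regex HrefPattern = new Regex(
        @"<a\b[^>]*?\shref\s*=\s*(?:""(?<href>[^""]*)""|'(?<href>[^']*)'|(?<href>[^\s>""']+))",
        RegexOptions.IgnoreCase);

    // Returns the href of every anchor found, in document order.
    public static List<string> GetHrefs(string html)
    {
        var result = new List<string>();
        foreach (Match m in HrefPattern.Matches(html))
        {
            result.Add(m.Groups["href"].Value);
        }
        return result;
    }
}
```

Calling `HrefExtractor.GetHrefs(html)` on the sample string from the question should give a list with the single entry `http://www.google.com`.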
Use HtmlAgilityPack.
// Quotes inside a verbatim string must be doubled.
var url = @"<a href=""http://stackoverflow.com"" ></a>";
HtmlDocument document = new HtmlDocument();
document.LoadHtml(url);
var tempValue = document.DocumentNode.SelectSingleNode("//a");
var link = tempValue.Attributes["href"].Value;
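If you need the anchor text as well as the link, you can use the following function. It returns a list of strings, one per anchor in the HTML string, each in the form "URL;text".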
// Requires: System, System.Collections.Generic, System.Text.RegularExpressions, System.Web
public static List<string> ExtractLinks(string htmlString)
{
    List<string> list = new List<string>();
    string anchorStart = "<a";
    string anchorEnd = "</a>";
    // Group 1 captures the href value; the surrounding quotes are optional.
    Regex regex = new Regex("href=[\"']?(.*?)[\"'>]+", RegexOptions.Singleline | RegexOptions.CultureInvariant);
    foreach (Match match in regex.Matches(htmlString))
    {
        string strURL = match.Groups[1].Value; // should contain the href URL
        string anchorText = string.Empty;      // reset so a failed parse does not reuse the previous text
        try
        {
            // Start index of the current URL (the match position, not the first occurrence of the same URL).
            int baseIndex = match.Groups[1].Index;
            // Search backwards from baseIndex for the first "<a", which should be the start of the anchor.
            int anchorStartIndex = htmlString.LastIndexOf(anchorStart, baseIndex, StringComparison.OrdinalIgnoreCase);
            // Find the end index of the anchor.
            int anchorEndIndex = htmlString.IndexOf(anchorEnd, anchorStartIndex, StringComparison.OrdinalIgnoreCase);
            // The anchor text sits between ">" and "</a>", so find the index of that ">".
            int indexOfAnchorTextStart = htmlString.LastIndexOf('>', anchorEndIndex);
            // Take the substring between ">" and "</a>".
            anchorText = htmlString.Substring(indexOfAnchorTextStart + 1, anchorEndIndex - indexOfAnchorTextStart - 1);
            anchorText = HttpUtility.HtmlDecode(anchorText);
            // To get the full anchor from start to end:
            // string fullAnchor = htmlString.Substring(anchorStartIndex, anchorEndIndex - anchorStartIndex + anchorEnd.Length);
        }
        catch (Exception)
        {
            // Log the exception raised while parsing the anchor text.
        }
        string entry = strURL + ";" + anchorText; // URL and text joined with a semicolon as separator
        if (!list.Contains(entry))
        {
            list.Add(entry);
        }
    }
    return list;
}
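A semicolon-joined string is awkward to split again when the URL itself contains a semicolon. As a rough alternative, the sketch below returns URL/text pairs instead. It is not the function above: `AnchorExtractor` and `ExtractAnchors` are names invented for this example, it only handles quoted href values, and it uses `System.Net.WebUtility.HtmlDecode` in place of `HttpUtility.HtmlDecode` so that no reference to System.Web is needed.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class AnchorExtractor
{
    // Hypothetical pattern: captures a quoted href value and the raw
    // content between the start tag and the closing </a>.
    private static readonly Regex AnchorPattern = new Regex(
        @"<a\b[^>]*?\shref\s*=\s*(?:""(?<href>[^""]*)""|'(?<href>[^']*)')[^>]*>(?<text>.*?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Strips nested tags such as <b> from the anchor content.
    private static readonly Regex TagPattern = new Regex("<[^>]+>");

    // Returns one (URL, text) pair per anchor, in document order.
    public static List<KeyValuePair<string, string>> ExtractAnchors(string html)
    {
        var anchors = new List<KeyValuePair<string, string>>();
        foreach (Match m in AnchorPattern.Matches(html))
        {
            string url = WebUtility.HtmlDecode(m.Groups["href"].Value);
            string inner = TagPattern.Replace(m.Groups["text"].Value, string.Empty);
            string text = WebUtility.HtmlDecode(inner).Trim();
            anchors.Add(new KeyValuePair<string, string>(url, text));
        }
        return anchors;
    }
}
```

Each pair exposes the URL as `Key` and the visible text as `Value`, so callers do not need to parse a separator. For anything beyond simple, well-formed markup, an HTML parser such as HtmlAgilityPack remains the safer choice.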