Regex: get URL values from hyperlinks

Question

I have a string that contains HTML. Using C#, I want to get all the href values from its hyperlinks.
Target string:
<a href="~/abc/cde" rel="new">Link1</a> <a href="~/abc/ghq">Link2</a>
I want to get the values "~/abc/cde" and "~/abc/ghq".

Answer 1

Use the HTML Agility Pack to parse the HTML. Their examples page includes one that parses some HTML to get the href values:

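 using System.Linq;
 using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

 var doc = new HtmlDocument();
 doc.LoadHtml(inputHtml);
 // SelectNodes returns null (not an empty collection) when nothing matches
 foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]") ?? Enumerable.Empty<HtmlNode>())
 {
    HtmlAttribute att = link.Attributes["href"];
    // Do stuff with the attribute value, e.g. att.Value
 }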

Answer 2

Using regular expressions to parse HTML is not recommended (think of links that appear inside comments, and so on).
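To make that caveat concrete, here is a minimal sketch. The class `NaiveHrefDemo` and its deliberately simple pattern are hypothetical, chosen only for illustration: a regex has no notion of HTML structure, so a link that exists only inside an HTML comment is returned like any other.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo: a naive href regex also "finds" links inside HTML comments.
public static class NaiveHrefDemo
{
    // Deliberately simple pattern, equivalent to: href="([^"]*)"
    private static readonly Regex NaiveHref =
        new Regex(@"href=""([^""]*)""", RegexOptions.IgnoreCase);

    public static List<string> ExtractHrefs(string html)
    {
        var hrefs = new List<string>();
        foreach (Match m in NaiveHref.Matches(html))
        {
            hrefs.Add(m.Groups[1].Value);
        }
        return hrefs;
    }

    public static void Main()
    {
        string html = "<!-- <a href=\"~/old/link\">Old</a> -->\n<a href=\"~/abc/cde\">Link1</a>";
        // Prints ~/old/link and then ~/abc/cde: the commented-out link is not skipped.
        foreach (string href in ExtractHrefs(html))
        {
            Console.WriteLine(href);
        }
    }
}
```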

That said, the following regex does the trick, and if you need it, it also captures the link's inner HTML:

using System;
using System.Text.RegularExpressions;

// "href" captures the quoted URL; "linkHtml" captures the HTML between <a ...> and </a>
Regex regex = new Regex(@"\<a\s[^\<\>]*?href=(?<quote>['""])(?<href>((?!\k<quote>).)*)\k<quote>[^\>]*\>(?<linkHtml>((?!\</a\s*\>).)*)\</a\s*\>", RegexOptions.IgnoreCase|RegexOptions.ExplicitCapture);
for (Match match = regex.Match(inputHtml); match.Success; match=match.NextMatch()) {
  Console.WriteLine(match.Groups["href"].Value);
}
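As a sketch of how this answer's regex behaves on the question's target string, the self-contained program below runs the same pattern and collects both named groups. The class `AnchorRegexDemo` and the method `ExtractLinks` are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo: applies the regex from this answer to the question's input.
public static class AnchorRegexDemo
{
    private static readonly Regex AnchorRegex = new Regex(
        @"\<a\s[^\<\>]*?href=(?<quote>['""])(?<href>((?!\k<quote>).)*)\k<quote>[^\>]*\>(?<linkHtml>((?!\</a\s*\>).)*)\</a\s*\>",
        RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);

    // Returns (href, linkHtml) pairs in document order.
    public static List<(string Href, string LinkHtml)> ExtractLinks(string html)
    {
        var links = new List<(string Href, string LinkHtml)>();
        for (Match match = AnchorRegex.Match(html); match.Success; match = match.NextMatch())
        {
            links.Add((match.Groups["href"].Value, match.Groups["linkHtml"].Value));
        }
        return links;
    }

    public static void Main()
    {
        string html = "<a href=\"~/abc/cde\" rel=\"new\">Link1</a> <a href=\"~/abc/ghq\">Link2</a>";
        foreach (var (href, linkHtml) in ExtractLinks(html))
        {
            // Prints "~/abc/cde -> Link1", then "~/abc/ghq -> Link2"
            Console.WriteLine($"{href} -> {linkHtml}");
        }
    }
}
```

One design note: the `linkHtml` group is built from `.`, which does not match a newline unless `RegexOptions.Singleline` is also set, so link text that spans several lines will not be matched as written.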

Answer 3

Here is the regex snippet (use it with the RegexOptions.IgnorePatternWhitespace option):

(?:<)(?<Tag>[^\s/>]+)       # Extract the tag name.
(?![/>])                    # Stop if /> is found
# -- Extract Attributes Key Value Pairs  --

((?:\s+)             # One to many spaces start the attribute
 (?<Key>[^=]+)       # Name/key of the attribute
 (?:=)               # Equals sign needs to be matched, but not captured.

(?([\x22\x27])              # If quotes are found
  (?:[\x22\x27])
  (?<Value>[^\x22\x27]+)    # Place the value into named Capture
  (?:[\x22\x27])
 |                          # Else no quotes
   (?<Value>[^\s/>]*)       # Place the value into named Capture
 )
)+                  # -- One to many attributes found!

This gives you every tag; from there you can filter for the tags you need and pick out the attributes you want.
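The filtering step could look roughly like the sketch below, which reuses the pattern above with `RegexOptions.IgnorePatternWhitespace`, keeps only `a` tags, and reads `href` out of the `Key`/`Value` capture collections. It assumes that each repetition of the attribute group adds exactly one `Key` capture and one `Value` capture, so the two collections line up by index. `TagAttributeDemo` and `ExtractHrefs` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo: uses the tag/attribute pattern, then filters for <a href="...">.
public static class TagAttributeDemo
{
    private static readonly Regex TagRegex = new Regex(@"
        (?:<)(?<Tag>[^\s/>]+)       # Extract the tag name.
        (?![/>])                    # Stop if /> is found
        # -- Extract Attributes Key Value Pairs  --
        ((?:\s+)                    # One to many spaces start the attribute
         (?<Key>[^=]+)              # Name/key of the attribute
         (?:=)                      # Equals sign needs to be matched, but not captured.
         (?([\x22\x27])             # If quotes are found
           (?:[\x22\x27])
           (?<Value>[^\x22\x27]+)   # Place the value into named Capture
           (?:[\x22\x27])
          |                         # Else no quotes
           (?<Value>[^\s/>]*)       # Place the value into named Capture
         )
        )+                          # -- One to many attributes found!
        ", RegexOptions.IgnorePatternWhitespace);

    public static List<string> ExtractHrefs(string html)
    {
        var hrefs = new List<string>();
        foreach (Match match in TagRegex.Matches(html))
        {
            // Keep only anchor tags.
            if (!match.Groups["Tag"].Value.Equals("a", StringComparison.OrdinalIgnoreCase))
                continue;

            // One Key capture and one Value capture per attribute, aligned by index.
            CaptureCollection keys = match.Groups["Key"].Captures;
            CaptureCollection values = match.Groups["Value"].Captures;
            for (int i = 0; i < keys.Count; i++)
            {
                if (keys[i].Value.Trim().Equals("href", StringComparison.OrdinalIgnoreCase))
                    hrefs.Add(values[i].Value);
            }
        }
        return hrefs;
    }

    public static void Main()
    {
        string html = "<a href=\"~/abc/cde\" rel=\"new\">Link1</a> <a href=\"~/abc/ghq\">Link2</a>";
        foreach (string href in ExtractHrefs(html))
        {
            Console.WriteLine(href);
        }
    }
}
```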

I've covered this in more detail on my blog (C# Regex Linq: Extracting HTML nodes with attributes of varying types).

3 answers

Answer 1: 4 votes, 2010-04-12 16:56:06
Answer 2: 2 votes, accepted, 2010-04-12 17:00:06
Answer 3: 1 vote, 2010-04-12 17:18:44