How to parse a URL from an anchor tag

Question

I want to retrieve the URL from an <a> tag in an HTML document. Here is the tag:

<a href="index.php?option=com_remository&amp;Itemid=43&amp;func=fileinfo&amp;id=49"><img src="http://dziekanat.wzim.sggw.pl/components/com_remository/images/file_icons/New.gif" width="16" height="16" border="0" align="middle" alt="file_icons/New.gif"/><b>&nbsp;Plan STAC lato 2014_15</b></a>

After parsing I should get:

index.php?option=com_remository&Itemid=43&func=fileinfo&id=49

What regex pattern should I use?

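I want to do this with a regex because the HTML document itself is quite old and has no IDs to refer to. Because of that, I can't use any more sophisticated tool (like Html Agility Pack) to do it.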

The whole document can be found here: http://dziekanat.wzim.sggw.pl/index.php?option=com_remository&Itemid=43&func=select&id=2

Answer 1

"Because of that, I can't use any more sophisticated tool (like Html Agility Pack) to do it."

Why not? This works for me:

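// Requires: using System.Linq; using System.Net; plus the HtmlAgilityPack package.
var html = new WebClient().DownloadString("http://dziekanat.wzim.sggw.pl/index.php?option=com_remository&Itemid=43&func=select&id=2");
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);

// Collect the href of every anchor, skipping anchors that have no href attribute.
var links = doc.DocumentNode.Descendants("a")
            .Where(a => a.Attributes["href"] != null)
            .Select(a => a.Attributes["href"].Value)
            .ToList();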

This XPath returns your link:

var link = doc.DocumentNode.SelectSingleNode("//table[@class='sectiontableentry1']//a")
            .Attributes["href"].Value;

Answer 2

Here you go:

string Pattern = @"<a[^>]*?href\s*=\s*[""']?([^'"" >]+?)[ '""][^>]*?>";
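The pattern captures the raw attribute text in group 1, so the result still contains &amp; rather than the & the question asks for. A minimal sketch of how it might be applied, assuming the value is decoded with WebUtility.HtmlDecode from System.Net; the AnchorHrefs class and its Extract method are hypothetical names used only for illustration, and the sample tag is a shortened version of the one in the question:

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
static class AnchorHrefs
{
    // Same pattern as in the answer; group 1 captures the raw href value.
    const string Pattern = @"<a[^>]*?href\s*=\s*[""']?([^'"" >]+?)[ '""][^>]*?>";

    public static List<string> Extract(string html)
    {
        var urls = new List<string>();
        foreach (Match m in Regex.Matches(html, Pattern, RegexOptions.IgnoreCase))
        {
            // The attribute text is still HTML-encoded (&amp;), so decode it.
            urls.Add(WebUtility.HtmlDecode(m.Groups[1].Value));
        }
        return urls;
    }
}

static class Program
{
    static void Main()
    {
        // Shortened version of the tag from the question.
        string html = "<a href=\"index.php?option=com_remository&amp;Itemid=43&amp;func=fileinfo&amp;id=49\"><b>&nbsp;Plan STAC lato 2014_15</b></a>";

        foreach (string url in AnchorHrefs.Extract(html))
        {
            Console.WriteLine(url);
        }
    }
}
```

Note that the pattern expects a quote or a space right after the value, so an unquoted href that is immediately followed by > would not be matched.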

Answer 1: score 2, accepted, 2015-08-28 15:30:56
Answer 2: score -1, 2015-08-28 15:38:49
