
asp.net regex to find anchor tags and replace their url

I'm trying to find all the anchor tags and append a variable to each href value. For example:

<a href="/page.aspx">link</a> will become <a href="/page.aspx?id=2">link</a>
<A hRef='http://www.google.com'><img src='pic.jpg'></a> will become <A hRef='http://www.google.com?id=2'><img src='pic.jpg'></a>

I'm able to match all the anchor tags and href values using regex, and then I manually replace the values using string.Replace. However, I don't think that's an efficient way to do this. Is there a solution where I can use something like regex.replace(html, newurlvalue)?

Yes, you can. The standard warning applies: regular expressions are not powerful enough to parse HTML reliably. In other words, this may work for you in the most straightforward and controlled cases, but there are many cases where it will fail.

However, if you already have the regular expression written, paste it into Regex Hero along with your HTML, click the "Replace" tab, and type in your replacement string.

Once you've verified that it's working, click Tools > Generate .NET Code and you'll have your answer.

UPDATE: So here's an imperfect example of this in action using groups:

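// Matches the value of a double-quoted href attribute and captures it in the named group "url".
// Single-quoted hrefs (like the second sample link) are not matched by this pattern.
string strRegex = @"(?<=href="")(?<url>[^""]+)(?="")";
RegexOptions myRegexOptions = RegexOptions.IgnoreCase;
Regex myRegex = new Regex(strRegex, myRegexOptions);
string strTargetString = @"<a href=""/page.aspx"">link</a> will become <a href=""/page.aspx?id=2"">" + (char)10 + "<A hRef='http://www.google.com'><img src='pic.jpg'></a> will become <A hRef='http://www.google.com?id=2'><img src='pic.jpg'></a>";
// ${url} re-inserts the captured value; here every matched URL is prefixed with the site root.
string strReplace = "http://mysite.com${url}";

// Inside a method that returns string:
return myRegex.Replace(strTargetString, strReplace);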

http://regexhero.net/tester/?id=e993fbf1-acb7-4f59-af87-94253a6e8221

The (?<url>[^"]+) part is a named group that can be referenced in the replacement string as ${url}.
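To get closer to what the question asks (appending id=2 to both double-quoted and single-quoted hrefs), the same named-group idea can be combined with a MatchEvaluator. This is only a rough sketch: the class name HrefRewriter, the method AppendQuery, and the group names pre and q are made up for illustration, and it still shares all the usual regex-on-HTML limitations (for example, it does not handle #fragments or unquoted attribute values).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class HrefRewriter
{
    // pre = everything from "<a" up to and including the opening quote,
    // q   = the quote character that was used (" or '),
    // url = the attribute value, ending just before the matching quote.
    private static readonly Regex AnchorHref = new Regex(
        @"(?<pre><a\b[^>]*?\bhref\s*=\s*(?<q>[""']))(?<url>.*?)(?=\k<q>)",
        RegexOptions.IgnoreCase);

    public static string AppendQuery(string html, string query)
    {
        return AnchorHref.Replace(html, m =>
        {
            string url = m.Groups["url"].Value;
            // Use '&' when the URL already has a query string, '?' otherwise.
            string separator = url.Contains("?") ? "&" : "?";
            // The closing quote is only looked ahead at, so it stays in place.
            return m.Groups["pre"].Value + url + separator + query;
        });
    }
}
```

Calling HrefRewriter.AppendQuery(html, "id=2") on the two sample links from the question should give /page.aspx?id=2 and http://www.google.com?id=2, with the original quoting and casing of the tags left alone.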

UPDATE #2:

So to match only the URLs without a question mark, you'd do this:

(?<=href=")(?![^"]*\?)(?<url>[^"]+)(?=")

The (?![^"]*\?) part is a negative lookahead that does the trick: it rejects the match if a question mark appears anywhere before the closing quote.
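As a quick way to see the lookahead in action, here is a small sketch that appends ?id=2 only to double-quoted hrefs that have no query string yet. The class and method names (LookaheadDemo, AppendIdWhenNoQuery) are invented for the example; the pattern is the one above, written as a C# verbatim string.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, for illustration only.
public static class LookaheadDemo
{
    // Same pattern as above; "" is an escaped quote inside a verbatim string.
    private static readonly Regex NoQueryHref = new Regex(
        @"(?<=href="")(?![^""]*\?)(?<url>[^""]+)(?="")",
        RegexOptions.IgnoreCase);

    public static string AppendIdWhenNoQuery(string html)
    {
        // ${url} re-inserts the captured URL, then the query string is appended.
        return NoQueryHref.Replace(html, "${url}?id=2");
    }
}
```

URLs that already contain a question mark are skipped entirely, so a link such as /b.aspx?x=1 comes back unchanged.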

If you're going to parse HTML with regex, the standard advice is to use the HTML Agility Pack instead.
