Regex replace matched keyword outside HTML tags and anchor (a) tag text

Question

I am developing an ASP.NET application. I want to add a keyword linking system.

I want to make the keyword a hyperlink to another page. However, I should not link the keyword if it is already linked (to any page). For example:

it is a <a href="http://www.somesite.com">linked keyword</a> and it should be a linked keyword.

should convert to:

it is a <a href="http://www.somesite.com">linked keyword</a> and it should be a linked <a href="http://newlycreatedLink.com">keyword</a>.

As you can see, the first keyword should be left intact.

Could you please help me solve this problem?

I've found this link in the ASP.NET forums, but I need to tune the answer to exclude keywords that are already linked. I've searched everywhere but found nothing.

Answer 1

To check whether the keyword is "outside" a tag, use a lookahead:

(?= opens the lookahead: after the keyword there must be an opening <tag or the end of the string ($)
[^<>]* matches any number of characters that are NOT < or >
(?:<\w|$) must follow, where \w is shorthand for word characters such as [a-zA-Z_0-9]

So the pattern could look like:

String pattern = @"(?i)\bkeyword\b(?=[^<>]*(?:<\w|$))";

String replacement = @"<a href=""http://newlycreatedLink.com"">$0</a>";

The keyword is wrapped in word boundaries \b, and the (?i) modifier makes the match case-insensitive. Note that in a C# verbatim string a double quote is escaped as "", and a .NET replacement string refers to the whole match as $0.

So this only replaces a keyword that is followed, with no < or > in between, by an opening tag or the end of the string.
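As a quick check, the pattern can be run against the example from the question. This is a minimal sketch rather than a definitive implementation: the helper name LinkKeyword and its parameters are hypothetical (they do not appear in the answer), and it assumes a C# 9+ top-level program.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original answer): wraps every occurrence of
// `keyword` that the lookahead treats as "outside" a tag in a link to `url`.
static string LinkKeyword(string html, string keyword, string url)
{
    // Regex.Escape keeps metacharacters in the keyword from changing the pattern.
    string pattern = @"(?i)\b" + Regex.Escape(keyword) + @"\b(?=[^<>]*(?:<\w|$))";
    // "$$" makes a literal "$" in the URL survive the replacement; "$0" is the whole match.
    string replacement = "<a href=\"" + url.Replace("$", "$$") + "\">$0</a>";
    return Regex.Replace(html, pattern, replacement);
}

string input = "it is a <a href=\"http://www.somesite.com\">linked keyword</a> and it should be a linked keyword.";
Console.WriteLine(LinkKeyword(input, "keyword", "http://newlycreatedLink.com"));
// prints: it is a <a href="http://www.somesite.com">linked keyword</a> and it should be a linked <a href="http://newlycreatedLink.com">keyword</a>.
```

The first keyword is skipped because it is directly followed by </a, which matches neither <\w nor $. For the same reason, a keyword followed by any other closing tag (for example </p>) is also skipped by this version of the pattern.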

UPDATE: To also replace the keyword "inside" tags, as long as it is not followed by </a, add |<\/[^a]:

String pattern = @"(?i)\bkeyword\b(?=[^<>]*(?:<\w|<\/[^a]|$))";
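The difference between the two patterns shows up when the keyword sits inside a non-anchor element. The sketch below compares them on a sample string that is an assumption made for illustration (it is not from the question), written as a C# 9+ top-level program.

```csharp
using System;
using System.Text.RegularExpressions;

// Both patterns are taken from the answer; only the sample input is assumed.
string original = @"(?i)\bkeyword\b(?=[^<>]*(?:<\w|$))";
string updated = @"(?i)\bkeyword\b(?=[^<>]*(?:<\w|<\/[^a]|$))";
string replacement = @"<a href=""http://newlycreatedLink.com"">$0</a>";

// Assumed sample: one keyword inside <b>, one inside an existing link.
string html = @"<b>keyword</b> and <a href=""#"">keyword</a>";

string withOriginal = Regex.Replace(html, original, replacement);
string withUpdated = Regex.Replace(html, updated, replacement);

Console.WriteLine(withOriginal);
// prints: <b>keyword</b> and <a href="#">keyword</a>
Console.WriteLine(withUpdated);
// prints: <b><a href="http://newlycreatedLink.com">keyword</a></b> and <a href="#">keyword</a>
```

Two limitations follow directly from the pattern. [^a] rejects any closing tag whose name starts with "a", so a keyword right before </abbr> or </article> is not linked. And a keyword nested deeper inside a link, as in <a href="#"><b>keyword</b></a>, is followed by </b rather than </a, so it would be linked again.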

Answer 2

Don't use regular expressions for sophisticated HTML parsing like this. Use a proper HTML parser instead; here's why.

2 answers

Answer 1: score 2, accepted, 2014-01-25 11:00:06

Answer 2: score 1, 2014-01-25 10:49:13
