
Regular expression to replace matched keywords outside HTML tags and anchor (a) tag text

I am developing an ASP.NET application and want to add a keyword linking system.

I want to turn the keyword into a hyperlink to another page. However, I should not link the keyword if it is already linked (to any page). For example:

it is a <a href="http://www.somesite.com">linked keyword</a> and it should be a linked keyword.

should convert to:

it is a <a href="http://www.somesite.com">linked keyword</a> and it should be a linked <a href="http://newlycreatedLink.com">keyword</a>.

As you can see, the first keyword should be left intact.

Could you please help me solve this problem?

I've found this link in the ASP.NET forums, but I need to tune that answer so it excludes keywords that are already linked. I've searched everywhere but found nothing.

To check whether the keyword is "outside", use a lookahead:

  • (?= opens a lookahead asserting that the keyword is followed by an opening <tag or the $ end
  • [^<>]* matches any number of characters that are NOT > or <
  • followed by (?:<\w|$) where \w is shorthand for word characters, roughly [a-zA-Z_0-9] (in .NET it also matches Unicode letters and digits unless RegexOptions.ECMAScript is set)

So the pattern could look like:

String pattern = @"(?i)\bkeyword\b(?=[^<>]*(?:<\w|$))";

String replacement = @"<a href=""http://newlycreatedLink.com"">$0</a>";

The keyword is wrapped in word boundaries \b, and the (?i) modifier makes the match case-insensitive. Note that inside a C# verbatim string a double quote is written as "" and the whole match is referenced as $0 in the replacement.

So this replaces a keyword only when it is followed by an opening tag or the end of the input.
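To see the pattern at work on the example from the question, here is a minimal sketch. The class and method names (KeywordLinker, LinkKeyword) are made up for illustration; only Regex.Replace and the pattern itself come from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, named for illustration only.
public static class KeywordLinker
{
    // Matches "keyword" only when the text after it runs up to an
    // opening tag (<\w) or to the end of the input ($).
    private const string Pattern = @"(?i)\bkeyword\b(?=[^<>]*(?:<\w|$))";

    // In a verbatim string "" is a literal quote; $0 is the whole match.
    private const string Replacement = @"<a href=""http://newlycreatedLink.com"">$0</a>";

    public static string LinkKeyword(string html)
    {
        return Regex.Replace(html, Pattern, Replacement);
    }
}

public static class KeywordLinkerDemo
{
    public static void Main()
    {
        string input = "it is a <a href=\"http://www.somesite.com\">linked keyword</a> and it should be a linked keyword.";
        // The keyword before </a> is skipped; the trailing one is wrapped.
        Console.WriteLine(KeywordLinker.LinkKeyword(input));
    }
}
```

The first occurrence is skipped because the next tag after it is the closing </a, which matches neither <\w nor $. A keyword inside an attribute value is skipped as well, since the next angle bracket after it is a >.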


UPDATE: To also replace a keyword that sits "inside" other tags, as long as the next closing tag is not </a, add |<\/[^a] :

String pattern = @"(?i)\bkeyword\b(?=[^<>]*(?:<\w|<\/[^a]|$))";
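As a rough check of the updated pattern, the sketch below runs it over a keyword inside a <b> element and a keyword inside a link. The class name UpdatedKeywordLinker and the sample inputs are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, named for illustration only.
public static class UpdatedKeywordLinker
{
    // Same as before, plus <\/[^a]: a closing tag whose name does not
    // start with "a" (or "A", because of the (?i) modifier).
    private const string Pattern = @"(?i)\bkeyword\b(?=[^<>]*(?:<\w|<\/[^a]|$))";
    private const string Replacement = @"<a href=""http://newlycreatedLink.com"">$0</a>";

    public static string LinkKeyword(string html)
    {
        return Regex.Replace(html, Pattern, Replacement);
    }
}

public static class UpdatedKeywordLinkerDemo
{
    public static void Main()
    {
        // The keyword inside <b> is wrapped; the one inside <a> is left alone.
        Console.WriteLine(UpdatedKeywordLinker.LinkKeyword(
            "a <b>keyword</b> and <a href=\"x\">keyword</a>"));
    }
}
```

Two limitations follow directly from how the lookahead works. A keyword nested deeper inside a link, as in <a href="x"><b>keyword</b></a>, is still wrapped, because only the next tag is inspected. And [^a] also rules out closing tags such as </abbr>, so a keyword directly before one of those is not linked.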

Don't use regular expressions for sophisticated HTML parsing like this. Use a proper HTML parser instead.
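As one possible illustration of the parser-based route, the sketch below uses LINQ to XML from the .NET standard library. This is a swapped-in stand-in, not a real HTML parser: it assumes the fragment is well-formed XML (XHTML-style), which real-world HTML often is not, in which case a dedicated HTML parsing library would be needed. The names XmlKeywordLinker and LinkKeyword are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper; ASSUMES the fragment is well-formed XML.
public static class XmlKeywordLinker
{
    public static string LinkKeyword(string fragment, string keyword, string url)
    {
        // Wrap in a dummy root so a fragment with several top-level nodes parses.
        XElement root = XElement.Parse("<root>" + fragment + "</root>",
                                       LoadOptions.PreserveWhitespace);
        var pattern = new Regex(@"\b" + Regex.Escape(keyword) + @"\b",
                                RegexOptions.IgnoreCase);

        // Snapshot the text nodes that have no <a> ancestor at any depth;
        // the tree is modified inside the loop below.
        List<XText> textNodes = root.DescendantNodes()
            .OfType<XText>()
            .Where(t => !t.Ancestors().Any(e => string.Equals(
                e.Name.LocalName, "a", StringComparison.OrdinalIgnoreCase)))
            .ToList();

        foreach (XText text in textNodes)
        {
            string value = text.Value;
            var pieces = new List<object>();
            int last = 0;
            foreach (Match m in pattern.Matches(value))
            {
                // Keep the text before the match, then add the new link element.
                if (m.Index > last)
                    pieces.Add(new XText(value.Substring(last, m.Index - last)));
                pieces.Add(new XElement("a", new XAttribute("href", url), m.Value));
                last = m.Index + m.Length;
            }
            if (pieces.Count == 0) continue; // no keyword in this text node
            if (last < value.Length)
                pieces.Add(new XText(value.Substring(last)));
            text.ReplaceWith(pieces.ToArray());
        }

        // Serialize the children of the dummy root without re-indenting.
        return string.Concat(root.Nodes()
            .Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }
}
```

Because the check walks all ancestors of each text node, a keyword nested anywhere inside an existing link is left alone, which is the case the lookahead patterns cannot cover.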
