I am developing an ASP.NET application and want to add a keyword linking system.
I want to turn a keyword into a hyperlink to another page, but I must not link the keyword if it is already linked (to any page). For example:
it is a <a href="http://www.somesite.com">linked keyword</a> and it should be a linked keyword.
should be converted to:
it is a <a href="http://www.somesite.com">linked keyword</a> and it should be a linked <a href="http://newlycreatedLink.com">keyword</a>.
As you can see, the first keyword should be left intact.
Could you please help me solve this problem?
I found a related answer in the ASP.NET forums, but I need to tune it to exclude keywords that are already linked. I've searched everywhere but found nothing.
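To check whether the keyword is "outside" a tag, use a lookahead (?= ... ) that succeeds only if the keyword is followed by an opening <tag or by the end of the string ($):
[^<>]* matches any number of characters that are neither < nor >.
(?:<\w|$) matches either < followed by a word character or the end of the string, where \w is shorthand for the word characters [a-zA-Z_0-9] (in .NET it also matches other Unicode letters and digits by default).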
So the pattern could look like:
String pattern = @"(?i)\bkeyword\b(?=[^<>]*(?:<\w|$))";
String replacement = @"<a href=""http://newlycreatedLink.com"">$0</a>";
The keyword is wrapped in word boundaries (\b), and the (?i) inline modifier makes the match case-insensitive.
So this replaces only a keyword that is followed by an opening tag or by the end of the string.
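As a quick check of the pattern above, here is a minimal sketch that applies it to the sample sentence from the question. The class name and the LinkKeyword helper are hypothetical and not part of the original answer. It assumes .NET's $0 substitution for the whole match and escapes the keyword with Regex.Escape so that special characters in it are taken literally.

```csharp
using System;
using System.Text.RegularExpressions;

static class KeywordLinkerDemo
{
    // Hypothetical helper: wraps every occurrence of the keyword that is
    // NOT directly followed by a closing tag or the inside of a tag.
    static string LinkKeyword(string html, string keyword, string url)
    {
        // Same pattern as in the answer, with the keyword escaped.
        string pattern = @"(?i)\b" + Regex.Escape(keyword) + @"\b(?=[^<>]*(?:<\w|$))";

        // "$0" inserts the whole match; "$$" keeps a literal "$" in the URL.
        string replacement = "<a href=\"" + url.Replace("$", "$$") + "\">$0</a>";

        return Regex.Replace(html, pattern, replacement);
    }

    static void Main()
    {
        string input =
            "it is a <a href=\"http://www.somesite.com\">linked keyword</a> " +
            "and it should be a linked keyword.";

        // Prints:
        // it is a <a href="http://www.somesite.com">linked keyword</a> and it should be a linked <a href="http://newlycreatedLink.com">keyword</a>.
        Console.WriteLine(LinkKeyword(input, "keyword", "http://newlycreatedLink.com"));
    }
}
```

The first occurrence is skipped because the text after it runs straight into </a, which is neither an opening tag nor the end of the string.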
UPDATE: To also replace a keyword "inside" tags whose closing tag does not start with </a, add |<\/[^a] to the alternation:
String pattern = @"(?i)\bkeyword\b(?=[^<>]*(?:<\w|<\/[^a]|$))";
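The following sketch runs the updated pattern against three hand-written samples (the sample strings and the class name are made up for illustration). It shows the intended behaviour and also one case the lookahead cannot handle: a keyword nested one level deeper inside an existing link.

```csharp
using System;
using System.Text.RegularExpressions;

static class UpdatedPatternDemo
{
    static void Main()
    {
        string pattern = @"(?i)\bkeyword\b(?=[^<>]*(?:<\w|<\/[^a]|$))";
        string replacement = @"<a href=""http://newlycreatedLink.com"">$0</a>";

        string[] samples =
        {
            // Followed by </p: now linked.
            "<p>a keyword in a paragraph</p>",
            // Followed by </a: left intact.
            "<a href=\"http://www.somesite.com\">linked keyword</a>",
            // Followed by </b, although still inside <a>: linked anyway,
            // which produces a nested link.
            "<a href=\"http://www.somesite.com\"><b>keyword</b></a>",
        };

        foreach (string sample in samples)
        {
            Console.WriteLine(Regex.Replace(sample, pattern, replacement));
        }
    }
}
```

The third sample is the reason the lookahead is only an approximation: it inspects the next tag after the keyword, not the full chain of enclosing elements. Note also that [^a] rejects any closing tag whose name starts with "a", such as </abbr.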
Don't use regular expressions for sophisticated HTML parsing like this. Use a proper HTML parser instead.
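One way to follow that advice is sketched below. It assumes the third-party HtmlAgilityPack NuGet package is installed; the answer itself does not name a parser, so this choice, the class name, and the LinkKeyword helper are all assumptions. The idea is to select only text nodes that have no <a> ancestor and run a plain keyword regex on those.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack; // third-party package, assumed to be installed

static class ParserBasedLinker
{
    // Hypothetical helper: links the keyword only in text that is not
    // already inside an <a> element, at any nesting depth.
    static string LinkKeyword(string html, string keyword, string url)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Text nodes without an <a> ancestor. SelectNodes returns null
        // when nothing matches.
        var textNodes = doc.DocumentNode.SelectNodes("//text()[not(ancestor::a)]");
        if (textNodes == null) return html;

        string pattern = @"(?i)\b" + Regex.Escape(keyword) + @"\b";
        string replacement = "<a href=\"" + url.Replace("$", "$$") + "\">$0</a>";

        // Copy to a list first because the loop modifies the tree.
        foreach (HtmlNode node in textNodes.ToList())
        {
            string original = node.InnerHtml;
            string linked = Regex.Replace(original, pattern, replacement);
            if (linked == original) continue;

            // CreateNode returns a single node, so the mixed text and <a>
            // markup is wrapped in a <span> (a simplification of this sketch).
            HtmlNode wrapper = HtmlNode.CreateNode("<span>" + linked + "</span>");
            node.ParentNode.ReplaceChild(wrapper, node);
        }

        return doc.DocumentNode.OuterHtml;
    }

    static void Main()
    {
        string input =
            "it is a <a href=\"http://www.somesite.com\"><b>linked keyword</b></a> " +
            "and it should be a linked keyword.";
        Console.WriteLine(LinkKeyword(input, "keyword", "http://newlycreatedLink.com"));
    }
}
```

Because the check is made on the parsed tree, a keyword inside <a><b>...</b></a> is skipped regardless of which closing tag follows it. The extra <span> wrapper and the fact that text inside elements such as <script> is not excluded are limitations of this sketch that real code would need to address.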