JavaScript: Use a regex to replace content outside of HTML tags only

I am trying to write a regular expression in JavaScript to replace strings that are outside of HTML tags, and to ignore the strings within HTML tags.

Here's my JavaScript code:

var content = 'Hi, my <span user="John">name</span> is &nbsp;John';
var user = 'John';
var regex = new RegExp('(&nbsp;)?' + user, 'g');
// Keep "&nbsp;John" as-is; replace a bare "John" with the image tag.
content = content.replace(regex, function($0, $1) {
    return $1 ? $0 : '<img src="images/user.png">';
});

My regex is "(&nbsp;)?John" .

The pattern works the way I want it to, but it also matches inside tags (for example in attribute values), which I don't want.

So the idea is to ignore everything between < and >, and also to ignore &nbsp;John.

Can it be done?

Description

This regex matches John only when it is preceded by whitespace, &nbsp;, or the start of the string, and followed by whitespace or the end of the string.

Regex to match John: (?:\s|&nbsp;|^)(John)(?=\s|\r|\n|$)
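To see which occurrences this boundary pattern accepts, here is a small sketch. The sample string and the variable names are made up for illustration; only the pattern itself comes from the answer. Note that under this pattern a John preceded by &nbsp; still counts as a match, since &nbsp; is listed as a valid left boundary.

```javascript
// Hypothetical sample text, chosen to exercise each boundary case.
const sample = "John John &nbsp;John DoeJohn JohnDoe";

// The boundary pattern from the answer, written as a regex literal.
const boundaryRe = /(?:\s|&nbsp;|^)(John)(?=\s|\r|\n|$)/g;

// Record where each captured John starts.
const hits = [];
for (const m of sample.matchAll(boundaryRe)) {
    // m.index is the start of the whole match; the captured name sits at its end.
    hits.push(m.index + m[0].length - m[1].length);
}

console.log(hits); // [ 0, 5, 16 ]
```

The John inside DoeJohn and the one starting JohnDoe are skipped because they lack a boundary on one side.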

This regex builds on the previous one and also matches all HTML tags and plain-text URLs. The order of the alternatives matters: tags and URLs are consumed first, so the capturing group (John) only matches a John that is outside an HTML tag and not embedded in a URL.

Regex: https?:\/\/[^\s]*|<\/?\w+\b(?=\s|>)(?:='[^']*'|="[^"]*"|=[^'"][^\s>]*|[^>])*>|\&nbsp;John|(John)

If you pass this last regex to replace() with a callback function, only the Johns outside tags and URLs are replaced, because only those populate the capture group.

JavaScript Example

Working example: http://repl.it/J4T

Code

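var content = "<span name=\"John\" funnytag:John>John John &nbsp;John DoeJohn JohnDoe Mr.JohnDoe http://cool.guy.john/LikesKittens</span>";
// Alternatives, in order: URL | HTML tag | &nbsp;John | (John). Only the last one captures.
var rePattern = /https?:\/\/[^\s]*|<\/?\w+\b(?=\s|>)(?:='[^']*'|="[^"]*"|=[^'"][^\s>]*|[^>])*>|\&nbsp;John|(John)/gi;

// replace() returns a new string, so keep the result.
var result = content.replace(rePattern, function(match, capture) {
    // capture is set only for a bare John outside tags and URLs.
    return capture ? "<img src=\"images/user.png\">" : match;
});
console.log(result);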

Output

<span name="John" funnytag:John><img src="images/user.png"> <img src="images/user.png"> &nbsp;John Doe<img src="images/user.png"> <img src="images/user.png">Doe Mr.<img src="images/user.png">Doe http://cool.guy.john/LikesKittens</span>
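Since the question builds the pattern from a variable user name, the same skip-or-capture idea can be wrapped in a function. This is only a sketch: escapeRegExp and replaceOutsideTags are hypothetical helper names, not part of the answer, and the tag and URL sub-patterns are copied from the regex above.

```javascript
// Hypothetical helper: escape regex metacharacters in a user-supplied name.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: replace `user` only outside HTML tags and URLs,
// and leave "&nbsp;<user>" untouched, as in the answer's regex.
function replaceOutsideTags(content, user, replacement) {
    const name = escapeRegExp(user);
    // String.raw keeps the backslashes exactly as written.
    const url = String.raw`https?:\/\/[^\s]*`;
    const tag = String.raw`<\/?\w+\b(?=\s|>)(?:='[^']*'|="[^"]*"|=[^'"][^\s>]*|[^>])*>`;
    const pattern = new RegExp(
        url + "|" + tag + "|&nbsp;" + name + "|(" + name + ")",
        "gi"
    );
    // Only the last alternative captures, so `capture` marks a replaceable hit.
    return content.replace(pattern, (match, capture) =>
        capture ? replacement : match
    );
}

const html = 'Hi, my <span user="John">name</span> is &nbsp;John and John';
console.log(replaceOutsideTags(html, "John", '<img src="images/user.png">'));
```

Escaping the name matters as soon as it can contain characters such as a dot, which would otherwise match any character.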

If I understand correctly, you want to replace anything matching the regex as long as it is not inside a tag. That is, John (optionally preceded by a non-breaking space) should be replaced with the return value of function($0,$1) unless it appears inside an HTML tag?

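If so, you could add this negative lookbehind assertion to the beginning of your regex: (?<!<[^>]*?). It rejects a match when, reading backwards from the match position, a < is encountered before any >, i.e. when the match sits inside a tag that has not been closed yet. Note that lookbehind assertions require a JavaScript engine with ES2018 regex support.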

This would be your code:

var regex = new RegExp('(?<!<[^>]*?)(&nbsp;)?' + user,'g');
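For completeness, a sketch of the lookbehind version in use. The input string is an assumption (the question's sample with an extra bare John appended), and the callback is the one from the question.

```javascript
const user = "John"; // same sample name as in the question
const content = 'Hi, my <span user="John">name</span> is &nbsp;John and John';

// (?<!<[^>]*?) fails when an unclosed "<" precedes the match position.
const regex = new RegExp("(?<!<[^>]*?)(&nbsp;)?" + user, "g");

// Keep "&nbsp;John" unchanged, replace a bare "John" with the image tag.
const result = content.replace(regex, ($0, $1) =>
    $1 ? $0 : '<img src="images/user.png">'
);

console.log(result);
// Hi, my <span user="John">name</span> is &nbsp;John and <img src="images/user.png">
```

One limitation of this approach: a literal < in ordinary text (as in "a < b John") also counts as an unclosed tag, so a John after it would not be replaced.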
