
JavaScript regex: wrap words and spaces in tags

I've been trying to achieve this: I want to wrap each word in a <w> tag and each run of spaces (there may be several in a row) in an <s> tag. The original text can contain HTML tags, which should not be touched.

This is   <b>very bold</b> word. 

should be converted to:

<w>This</w><s> </s><w>is</w><s>   </s><b><w>very</w><s> </s><w>bold</w></b><s> </s><w>word</w>

What is the right regex to achieve that?

You should use two replacements:

s.replace(/([^\s<>]+)(?:(?=\s)|$)/g, '<w>$1</w>').replace(/(\s+)/g, '<s>$1</s>')

Check this demo.


EDIT:

For more complex inputs (based on your comment below), use:

s.replace(/([^\s<>]+)(?![^<>]*>)(?:(?=[<\s])|$)/g, '<w>$1</w>').replace(/(\s+)(?![^<>]*>)/g, '<s>$1</s>');

Check this demo.

Regular expressions are not suited for every task. If your string can contain arbitrary HTML, then it is not possible to handle all cases with regular expressions, because HTML is a context-free language and regular expressions cover only a subset of such languages. Before messing around with loops and a load of code to handle this, let me suggest the following:

If you are in a browser environment or have access to a DOM library, you could put this string inside a temporary DOM element, then work on the text nodes and then read the string back.

Here's an example using a library I wrote some months ago and have just updated, called Linguigi:

var element = document.createElement('div');
element.innerHTML = 'This is   <b>very bold</b> word.';

var ling = new Linguigi(element);

ling.eachWord(true, function(text) {
    return '<w>' + text + '</w>';
});

ling.eachToken(/ +/g, true, function(text) {
    return '<s>' + text + '</s>';
});

alert(element.innerHTML);

Example: http://prinzhorn.github.com/Linguigi/ (hit the Stackoverflow 12758422 button)
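
If you would rather not pull in a library, the same idea (parse into a temporary element, rewrite only the text nodes, read the markup back) can be sketched with the plain DOM API. This is an illustrative sketch, not how Linguigi works internally; the names tokenize and wrapTextNodes are made up here, and whitespace is matched with \s+ rather than the / +/ used above.

```javascript
// Split plain text into alternating whitespace and non-whitespace runs.
// Kept free of DOM calls so it can be checked outside a browser.
function tokenize(text) {
  const tokens = [];
  const re = /(\s+)|(\S+)/g;
  let m;
  while ((m = re.exec(text)) !== null) {
    // Group 1 matched: a whitespace run ('s'); otherwise a word ('w').
    tokens.push({ tag: m[1] !== undefined ? 's' : 'w', text: m[0] });
  }
  return tokens;
}

// Replace every text node under `root` with <w>/<s> elements.
// Existing elements such as <b> are never touched, only their text.
function wrapTextNodes(root) {
  const doc = root.ownerDocument;
  const walker = doc.createTreeWalker(root, 4); // 4 === NodeFilter.SHOW_TEXT
  const textNodes = [];
  // Collect first, then mutate, so the walker is not disturbed.
  while (walker.nextNode()) textNodes.push(walker.currentNode);

  for (const node of textNodes) {
    const frag = doc.createDocumentFragment();
    for (const { tag, text } of tokenize(node.nodeValue)) {
      const el = doc.createElement(tag);
      el.textContent = text;
      frag.appendChild(el);
    }
    node.parentNode.replaceChild(frag, node);
  }
}

// Usage, only run where a DOM is available (e.g. in a browser).
if (typeof document !== 'undefined') {
  const element = document.createElement('div');
  element.innerHTML = 'This is   <b>very bold</b> word.';
  wrapTextNodes(element);
  console.log(element.innerHTML);
}
```

Because the browser's own parser decides what is a tag and what is text, attributes, comments and stray angle brackets do not need any special handling in the wrapping code.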
