
Word boundary regexp in JavaScript

Let's suppose I have the following string:

bla bla "some" bla bla some bla bla something

I would like to replace all occurrences of 'some' that are bounded by non-word symbols with '<some>'. I wrote a regular expression for this purpose:

/^|[^0-9a-zа-я](some)[^0-9a-zа-я]|$/gi

How I use it:

'bla bla "some" bla bla some bla bla something'.replace(/^|[^0-9a-zа-я](some)[^0-9a-zа-я]|$/gi, '<$1>')

And its result is

<>bla bla <some> bla bla<some>bla bla something<>

But I expected

bla bla "<some>" bla bla <some> bla bla something

How can I fix this regex? As far as I know, JavaScript's regular expressions don't support named groups.

Note: I cannot use \b because the words I want to match contain Cyrillic symbols, and \b in JavaScript's regex engine doesn't work properly with non-Latin letters.
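To illustrate the limitation mentioned in the note, here is a minimal check. The Cyrillic sample word is only an illustrative assumption; the point is that \b is defined in terms of \w, which covers just [A-Za-z0-9_].

```javascript
// \b marks a transition between a \w character and a non-\w character.
// Cyrillic letters are not \w, so there is no boundary between a space and "с".
const latin = /\bsome\b/.test('bla some bla');       // true
const cyrillic = /\bслово\b/.test('это слово тут');  // false
console.log(latin, cyrillic);
```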

You could use something along these lines:

yourString.replace(/(^|[^0-9a-zа-я])(some)(?![0-9a-zа-я])/gi, '$1<$2>')

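If the word to wrap is not fixed, the same pattern can be built at run time. This is only a sketch: the helper name wrapWord is hypothetical, and the letter class (including ё) is an assumption that should be adjusted to the alphabet you actually need.

```javascript
// Hypothetical helper: wraps every standalone occurrence of `word` in <...>.
function wrapWord(text, word) {
  // Escape regex metacharacters so `word` is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Assumed set of "word" characters: digits, Latin and basic Cyrillic letters.
  const letters = '0-9a-zа-яё';
  const regex = new RegExp(
    '(^|[^' + letters + '])(' + escaped + ')(?![' + letters + '])',
    'gi'
  );
  // $1 puts back the separator that was consumed before the word.
  return text.replace(regex, '$1<$2>');
}

console.log(wrapWord('bla bla "some" bla bla some bla bla something', 'some'));
// bla bla "<some>" bla bla <some> bla bla something
console.log(wrapWord('это слово, словом', 'слово'));
// это <слово>, словом
```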

Note that, as Wiktor Stribiżew comments on another answer, your character class only matches the basic Cyrillic alphabet and would miss other Cyrillic characters. An alternative would be to stop using a negated character class and instead match the characters you expect as word separators, if they are easier to enumerate. In that case, ["\s] would appear to be a good start:

yourString.replace(/(^|[\s"])(some)(?![^\s"])/gi, '$1<$2>')

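Another possible way around the limited character class, assuming your runtime supports Unicode property escapes (ES2018, with the u flag), is to describe letters and digits as \p{L} and \p{N}. The sample string below is made up for illustration; it starts with the Ukrainian letter "і", which lies outside the а-я range.

```javascript
// Class from the answer above: only digits, Latin and basic Cyrillic letters.
const basic = /(^|[^0-9a-zа-я])(some)(?![0-9a-zа-я])/gi;
// Assumed alternative: any Unicode letter or number counts as a word character.
const unicode = /(^|[^\p{L}\p{N}])(some)(?![\p{L}\p{N}])/giu;

const sample = 'іsome some'; // first character is U+0456, not in а-я

const withBasic = sample.replace(basic, '$1<$2>');
const withUnicode = sample.replace(unicode, '$1<$2>');
console.log(withBasic);   // і<some> <some>  ("і" is wrongly treated as a separator)
console.log(withUnicode); // іsome <some>
```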

Group and capture the opening and closing alternatives and include these captures in the replacement string too:

var regex = /(^|[^0-9a-zа-яё])(some)([^0-9a-zа-яё]|$)/gi;
var output = 'bla bla "some" bla bla some bla bla something'.replace(regex, '$1<$2>$3');
console.log(output);
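One thing to watch for with this variant: the closing group consumes the separator, so that character is no longer available as the opening separator of the next match. A small comparison with a lookahead-based pattern, using a made-up input where two matches share a single space:

```javascript
// Trailing separator is captured (and therefore consumed).
const consuming = /(^|[^0-9a-zа-яё])(some)([^0-9a-zа-яё]|$)/gi;
// Trailing separator is only checked with a negative lookahead, not consumed.
const lookahead = /(^|[^0-9a-zа-яё])(some)(?![0-9a-zа-яё])/gi;

const adjacent = 'some some';

const consumed = adjacent.replace(consuming, '$1<$2>$3');
const looked = adjacent.replace(lookahead, '$1<$2>');
console.log(consumed); // <some> some
console.log(looked);   // <some> <some>
```

For the string in the question both patterns give the same result, since no two occurrences share a single separator there.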
