
/(\S)\1(\1)+/g matching all occurrences of three equal non-whitespace characters following each other

It's given that /(\S)\1(\1)+/g matches all occurrences of three equal non-whitespace characters following each other.

I don't understand why there are parentheses around \S and the second \1, but not around the first \1. Can anyone explain how the above regex works?

src: http://www.javascriptkit.com/javatutors/redev2.shtml

Thanks in advance.

The \S needs parentheses to capture its value, so you can refer back to the captured value with \1. \1 means "match the same text which capturing group #1 matched".
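To see the capture and the backreference on their own, here is a minimal sketch using a shortened pattern, /(\S)\1/, which is not the regex from the question. The sample strings and variable names are made up for illustration.

```javascript
// Group 1 captures one non-whitespace character;
// \1 must then match exactly the same text again.
const doubledChar = /(\S)\1/;

const hit = doubledChar.exec('balloon');
console.log(hit[0]); // "ll" (the whole match)
console.log(hit[1]); // "l"  (what group 1 captured)

// No character is immediately repeated here, so \1 can never succeed.
const hasDouble = doubledChar.test('abc');
console.log(hasDouble); // false
```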

I believe there is a problem with this regex. You said you want to match "three equal non-whitespace characters", but the + makes it match three or more equal, consecutive non-whitespace characters.

The g on the end means "apply this regex over the entire input string", or globally.

The second set of parentheses is not necessary. It needlessly captures the repeated character a second time, while matching the same strings as this regex:

/(\S)\1\1+/g
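A quick way to check this, assuming an arbitrary sample string of my own: both patterns find the same substrings, and the only difference is the extra capture that the second set of parentheses produces.

```javascript
const withExtraGroup = /(\S)\1(\1)+/g;
const withoutExtraGroup = /(\S)\1\1+/g;

// Hypothetical sample input, chosen only for illustration.
const sample = 'aaa bb cccc d';

const foundWith = sample.match(withExtraGroup);
const foundWithout = sample.match(withoutExtraGroup);
console.log(foundWith);    // [ 'aaa', 'cccc' ]
console.log(foundWithout); // [ 'aaa', 'cccc' ]

// Without the g flag, exec() exposes the captures:
// the extra parentheses only add a second captured copy of the character.
const capturesWith = /(\S)\1(\1)+/.exec('cccc');
const capturesWithout = /(\S)\1\1+/.exec('cccc');
console.log(capturesWith.length);    // 3 (match, group 1, group 2)
console.log(capturesWithout.length); // 2 (match, group 1)
```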

Also, as @AlexD pointed out, the description should say that it matches at least three characters. Suppose you replaced every match of that regex with BONK in the string fooxxxxxxbar:

'fooxxxxxxbar'.replace(/(\S)\1\1+/g, 'BONK')

...you might expect the result to be fooBONKBONKbar from their description, because there are two sets of three 'x's. But in fact the result would be fooBONKbar; the first \1 matches the second 'x', and the \1+ matches the third 'x' and any 'x's that follow it. If they wanted to match just three characters, they should have left the + off.
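Running both variants side by side shows the difference. This is just a sketch; the variable names are mine.

```javascript
const input = 'fooxxxxxxbar';

// With the +, one match swallows all six 'x's.
const atLeastThree = input.replace(/(\S)\1\1+/g, 'BONK');
console.log(atLeastThree); // "fooBONKbar"

// Without the +, each match is exactly three characters,
// so the six 'x's are replaced as two separate matches.
const exactlyThree = input.replace(/(\S)\1\1/g, 'BONK');
console.log(exactlyThree); // "fooBONKBONKbar"
```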

I noticed several other sloppy descriptions like that, plus at least one outright error: \B is equivalent to (?!\b) (a position that's not a word boundary), not [^\b] (a character that's not a backspace). For that matter, their description of word boundaries--"the position between a word and a space"--is wrong, too. A word boundary isn't defined by any particular character, like a space--in fact, it can just as well be the absence of any character that creates one. The string:

Word

...starts with a word boundary because 'W' is a word character and, being first, it's not preceded by another word character. Similarly, the 'd' is not followed by another word character, so the end of the string is also a word boundary.

Also, a regex knows nothing about words, only word characters. The definition of a word character can vary depending on the regex flavor and Unicode or locale settings, but it always includes [A-Za-z0-9_] (ASCII letters and digits plus the underscore). A word boundary is simply a position that's between one of those characters and any other character (or no other character, as I explained earlier).

If you want to learn about regexes, I suggest you forget that site and start here instead: regular-expressions.info.
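One way to make these positions visible in JavaScript is to insert a marker at every zero-width match. This is only an illustrative sketch; the '|' marker and the 'a-b' sample are my own choices.

```javascript
// \b matches positions, not characters: here, the start and end of the string.
const boundaries = 'Word'.replace(/\b/g, '|');
console.log(boundaries); // "|Word|"

// No space is needed: a hyphen is not a word character either.
const hyphenated = 'a-b'.replace(/\b/g, '|');
console.log(hyphenated); // "|a|-|b|"

// \B and (?!\b) both match every position that is NOT a word boundary.
const nonBoundaries = 'Word'.replace(/\B/g, '|');
const viaLookahead = 'Word'.replace(/(?!\b)/g, '|');
console.log(nonBoundaries); // "W|o|r|d"
console.log(viaLookahead);  // "W|o|r|d"

// [^\b] is different: inside a character class \b means backspace,
// so this consumes each character that is not a backspace.
const notBackspace = 'Word'.replace(/[^\b]/g, '|');
console.log(notBackspace); // "||||"
```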
