
/(\S)\1(\1)+/g matching all occurrences of three equal non-whitespace characters following each other

It's given: /(\S)\1(\1)+/g matches all occurrences of three equal non-whitespace characters following each other.

I don't understand why there are parentheses around (\S) and the second (\1), but not around the first \1. Can anyone help explain how the above regex works?

src: http://www.javascriptkit.com/javatutors/redev2.shtml

Thanks in advance.

The \S needs parentheses to capture its value, so you can refer back to the captured value with \1. \1 means "match the same text which capturing group #1 matched".
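To see the capture and the backreference working together, here is a minimal sketch using a shortened pattern, /(\S)\1/, which is my own illustration and not the regex from the tutorial. The variable names are made up for the example.

```javascript
// (\S) captures one non-whitespace character as group 1;
// \1 then has to match that same character again.
const pair = /(\S)\1/;

const hit = pair.exec('hello');
console.log(hit[0]); // "ll" (the whole match)
console.log(hit[1]); // "l"  (what group 1 captured)

// No character in "world" is repeated, so \1 never finds its twin.
console.log(pair.test('world')); // false
```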

I believe there is a problem with this regex. You said you want to match "three equal non-whitespace characters". But the + will make this match 3 or more equal, consecutive non-whitespace characters.
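A quick way to check the "3 or more" behaviour is to run the regex from the question against runs of different lengths. This is only a sketch; the test strings and the variable name are mine.

```javascript
const original = /(\S)\1(\1)+/g;

console.log('aaa'.match(original));   // [ 'aaa' ]
console.log('aaaaa'.match(original)); // [ 'aaaaa' ], one match of five characters
console.log('aa'.match(original));    // null, two in a row is not enough
```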

The g on the end means "apply this regex over the entire input string, or globally": the search keeps going after the first match instead of stopping there.
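As a rough illustration of what the g flag changes, the sketch below runs the regex from the question with and without it. The input string and the '#' replacement are arbitrary choices of mine.

```javascript
const input = 'aaa bbb';

// Without g, match() returns only the first match (plus its capture groups).
const first = input.match(/(\S)\1(\1)+/);
console.log(first[0]); // "aaa"

// With g, match() returns every match in the string.
const all = input.match(/(\S)\1(\1)+/g);
console.log(all); // [ 'aaa', 'bbb' ]

// replace() follows the same rule.
console.log(input.replace(/(\S)\1(\1)+/, '#'));  // "# bbb"
console.log(input.replace(/(\S)\1(\1)+/g, '#')); // "# #"
```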

The second set of parentheses is not necessary. It needlessly captures the repeated character a second time, while matching the same strings as this regex:

/(\S)\1\1+/g
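Here is a small comparison of the two patterns, as a sketch on a sample string I made up. Both find the same matches; the only visible difference is the extra entry in the match array.

```javascript
const withGroup = /(\S)\1(\1)+/g;
const withoutGroup = /(\S)\1\1+/g;
const sample = 'aaa bb cccc d!!!!!';

console.log(sample.match(withGroup));    // [ 'aaa', 'cccc', '!!!!!' ]
console.log(sample.match(withoutGroup)); // [ 'aaa', 'cccc', '!!!!!' ]

// The extra parentheses only add a second capture group to the result.
const two = /(\S)\1(\1)+/.exec('cccc');
const one = /(\S)\1\1+/.exec('cccc');
console.log(two.length, two[2]); // 3 "c"
console.log(one.length);         // 2
```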

Also, as @AlexD pointed out, the description should say that it matches at least three characters. If you replaced every match of that regex with BONK in the string fooxxxxxxbar:

'fooxxxxxxbar'.replace(/(\S)\1\1+/g, 'BONK')

...you might expect the result to be fooBONKBONKbar from their description, because there are two sets of three 'x's. But in fact the result would be fooBONKbar; the first \1 matches the second 'x', and the \1+ matches the third 'x' and any 'x's that follow it. If they wanted to match just three characters, they should have left the + off.
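Both variants side by side, as a short sketch (the variable names are mine):

```javascript
const atLeastThree = /(\S)\1\1+/g; // with the +
const threeAtATime = /(\S)\1\1/g;  // with the + left off

console.log('fooxxxxxxbar'.replace(atLeastThree, 'BONK')); // "fooBONKbar"
console.log('fooxxxxxxbar'.replace(threeAtATime, 'BONK')); // "fooBONKBONKbar"
```

Note that the second pattern consumes three characters at a time; it does not require the run to be exactly three long.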

I noticed several other sloppy descriptions like that, plus at least one outright error: \B is equivalent to (?!\b) (a position that's not a word boundary), not [^\b] (a character that's not a backspace). For that matter, their description of word boundaries, "the position between a word and a space", is wrong too. A word boundary isn't defined by any particular character, like a space; in fact, the absence of any character can create one just as well. The string:

Word

...starts with a word boundary because 'W' is a word character and, being first, it's not preceded by another word character. Similarly, the 'd' is not followed by another word character, so the end of the string is also a word boundary.
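One way to make these positions visible is to replace each zero-width match with a marker. The sketch below does that for \b and \B, then shows what [^\b] does instead; the '|' marker is just my choice.

```javascript
// \b matches at both ends of "Word", although the string contains no space.
console.log('Word'.replace(/\b/g, '|')); // "|Word|"

// \B matches the remaining positions, the ones that are not boundaries.
console.log('Word'.replace(/\B/g, '|')); // "W|o|r|d"

// Inside a character class, \b means backspace, so [^\b] consumes one
// character that is not a backspace. It is not a position at all.
console.log('Word'.replace(/[^\b]/g, '|')); // "||||"
```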

Also, a regex knows nothing about words, only word characters. The definition of a word character can vary depending on the regex flavor and Unicode or locale settings, but it always includes [A-Za-z0-9_] (ASCII letters and digits plus the underscore). A word boundary is simply a position between one of those characters and any character that isn't one (or no character at all, as I explained earlier).
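The same marker trick shows how word characters decide where the boundaries fall. This is a sketch for JavaScript specifically, where \w without extra flags covers only the ASCII set above; other flavors may differ. The sample strings are mine.

```javascript
// Underscore and digits are word characters, so "foo_bar42" has no
// boundary inside it.
console.log('foo_bar42 baz'.replace(/\b/g, '|')); // "|foo_bar42| |baz|"

// A hyphen is not a word character, so boundaries appear on both sides
// of it without any space being involved.
console.log('well-known'.replace(/\b/g, '|')); // "|well|-|known|"

// In JavaScript, \w is ASCII-only by default, so an accented letter
// does not count as a word character.
console.log(/^\w$/.test('é')); // false
```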

If you want to learn about regexes, I suggest you forget that site and start here instead: regular-expressions.info.
