
JavaScript regex whitespace characters

I have done some searching, but I couldn't find a definitive list of the whitespace characters included in \s in JavaScript's regex.

I know that I can rely on space, line feed, carriage return, and tab being whitespace, but since JavaScript was traditionally only for the browser, I thought that maybe URL-encoded whitespace and things like &nbsp; and %20 would be supported as well.

What exactly does JavaScript's regex compiler consider whitespace? If there are differences between browsers, I only really care about WebKit browsers, but it would be nice to know of any differences. Also, what about Node.js?

A simple test:

for (let i = 0; i <= 0x10FFFF; i++) {
    // Log every code point that \s matches entirely (0x10FFFF is the last code point)
    if (String.fromCodePoint(i).replace(/\s+/, "") === "") console.log(i);
}

The char codes (Chrome):

9
10
11
12
13
32
160
5760
8192
8193
8194
8195
8196
8197
8198
8199
8200
8201
8202
8232
8233
8239
8287
12288
65279
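If you would rather spell the set out, the 25 code points above can be written as one explicit character class. Below is a minimal sketch, assuming a current V8/Node engine; WHITESPACE_CLASS and findMismatches are hypothetical names used only for illustration. It checks the explicit class against \s over the whole 16-bit range:

```javascript
// Hypothetical explicit equivalent of \s, built from the Chrome list above:
// TAB, LF, VT, FF, CR, SPACE, NBSP, U+1680, U+2000..U+200A,
// U+2028, U+2029, U+202F, U+205F, U+3000 and U+FEFF.
const WHITESPACE_CLASS =
    /[\t\n\v\f\r \u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]/;

// Collect every 16-bit code point where \s and the explicit class disagree.
function findMismatches() {
    const mismatches = [];
    for (let cp = 0; cp <= 0xffff; cp++) {
        const ch = String.fromCharCode(cp);
        if (/\s/.test(ch) !== WHITESPACE_CLASS.test(ch)) mismatches.push(cp);
    }
    return mismatches;
}

// Prints an empty array if the class and \s agree on this engine.
console.log(findMismatches());
```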

For Mozilla, it's:

 [ \f\n\r\t\v\u00A0\u2028\u2029]
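The class quoted above is narrower than what \s matches in current engines. A quick sketch to see the gap; OLD_MDN_CLASS and extraWhitespace are illustrative names, and the result depends on the engine's Unicode data:

```javascript
// The character class quoted above for Mozilla.
const OLD_MDN_CLASS = /[ \f\n\r\t\v\u00A0\u2028\u2029]/;

// 16-bit code points matched by \s but not by the quoted class, as hex strings.
function extraWhitespace() {
    const extra = [];
    for (let cp = 0; cp <= 0xffff; cp++) {
        const ch = String.fromCharCode(cp);
        if (/\s/.test(ch) && !OLD_MDN_CLASS.test(ch)) {
            extra.push(cp.toString(16).padStart(4, "0"));
        }
    }
    return extra;
}

console.log(extraWhitespace());
```

In V8 this should list U+1680, U+2000 through U+200A, U+202F, U+205F, U+3000 and U+FEFF.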

HTML != JavaScript. JavaScript is completely literal: %20 is just the characters %, 2 and 0, and &nbsp; is the string of characters &, nbsp and ;. For character classes, I consider nearly everything that works in Perl regex to be applicable in JS (though at the time you couldn't use named groups, etc.; named capture groups were added in ES2018).
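To make this concrete, a small sketch: \s only ever sees the literal characters, so URL-encoded or HTML-escaped text has to be decoded first. decodeURIComponent is a standard global; ECMAScript itself has no HTML-entity decoder, so the &nbsp; entity is replaced by hand here purely for illustration (hasWhitespace is a hypothetical helper):

```javascript
const hasWhitespace = (s) => /\s/.test(s);

// Literal text: "%20" is the characters %, 2, 0 and "&nbsp;" is six characters.
console.log(hasWhitespace("%20"));    // false
console.log(hasWhitespace("&nbsp;")); // false

// Decoded: %20 becomes U+0020 (space); &nbsp; stands for U+00A0 (no-break space).
console.log(hasWhitespace(decodeURIComponent("%20")));              // true
console.log(hasWhitespace("&nbsp;".replace(/&nbsp;/g, "\u00a0"))); // true
```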

http://www.regular-expressions.info/javascript.html is the reference I use.

Here's an expansion of primvdb's answer, covering the entire 16-bit space, including Unicode code point values and a comparison with str.trim(). I tried to edit the answer to improve it, but my edit was rejected, so I had to post this new one.

Identify all 16-bit characters (the Basic Multilingual Plane) that are matched by the whitespace regex \s or removed by String.prototype.trim():

const regexList = [];
const trimList = [];
// Test every 16-bit code point against \s and against trim()
for (let codePoint = 0; codePoint < 2 ** 16; codePoint += 1) {
    const str = String.fromCodePoint(codePoint);
    const unicode = codePoint.toString(16).padStart(4, '0');
    if (str.replace(/\s/, '') === '') regexList.push([codePoint, unicode]);
    if (str.trim() === '') trimList.push([codePoint, unicode]);
}
// true if \s and trim() treat exactly the same characters as whitespace
const identical = JSON.stringify(regexList) === JSON.stringify(trimList);
const list = regexList.reduce((str, [codePoint, unicode]) => `${str}${unicode} ${codePoint}\n`, '');
console.log({identical});
console.log(list);

The list (in V8):

0009 9
000a 10
000b 11
000c 12
000d 13
0020 32
00a0 160
1680 5760
2000 8192
2001 8193
2002 8194
2003 8195
2004 8196
2005 8197
2006 8198
2007 8199
2008 8200
2009 8201
200a 8202
2028 8232
2029 8233
202f 8239
205f 8287
3000 12288
feff 65279
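Scanning only the 16-bit range loses nothing: \s is defined as a handful of fixed characters plus the Unicode Zs (space separator) category, and Zs currently has no code points above U+FFFF. A sketch that confirms this by scanning all of Unicode; countWhitespaceCodePoints is a hypothetical helper, and the u flag makes each astral character count as a single code point:

```javascript
// Count code points across the whole Unicode range that form a single \s match,
// and how many of those lie above U+FFFF.
function countWhitespaceCodePoints() {
    let count = 0;
    let astral = 0;
    for (let cp = 0; cp <= 0x10ffff; cp++) {
        if (/^\s$/u.test(String.fromCodePoint(cp))) {
            count++;
            if (cp > 0xffff) astral++;
        }
    }
    return { count, astral };
}

console.log(countWhitespaceCodePoints());
```

In V8 this should report the same 25 code points as the list above, none of them astral. It takes noticeably longer than the 16-bit scan.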

In Firefox, \s matches a single white space character, including space, tab, form feed and line feed. It is equivalent to [ \f\n\r\t\v\u00A0\u2028\u2029].

For example, /\s\w*/ matches ' bar' in "foo bar.".
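The example can be checked directly. A small sketch that also uses the g flag with matchAll (available since ES2020) to pick up every whitespace-led word, including one preceded by a no-break space:

```javascript
// First match only: the space before "bar" plus the word characters after it.
console.log("foo bar.".match(/\s\w*/)[0]); // " bar"

// With the g flag, matchAll yields every whitespace-prefixed word.
// U+00A0 (no-break space) counts as \s, so "\u00a0baz" is found too.
const words = [..."foo bar\u00a0baz".matchAll(/\s\w*/g)].map((m) => m[0]);
console.log(words);
```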

https://developer.mozilla.org/en/JavaScript/Guide/Regular_Expressions

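Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.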

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM