简体   繁体   English

JavaScript RegExp匹配二进制数据

[英]JavaScript RegExp matches binary data

I'm using RegExp to match a series of bytes other than 0x1b from the sequence [0xcb, 0x98, 0x1b] : 我正在使用RegExp来匹配序列[0xcb, 0x98, 0x1b]0x1b以外的一系列字节:

var r = /[^\x1b]+/g;
r.exec(new Buffer([0xcb, 0x98, 0x1b]));
console.log(r.lastIndex);

I expected that the pattern would match 0xcb, 0x98 and r.lastIndex == 2 , but it matches 0xcb only and r.lastIndex == 1 . 我期望该模式将匹配0xcb, 0x98r.lastIndex == 2 ,但它仅匹配0xcbr.lastIndex == 1 Is that a bug or something? 那是臭虫吗?

regexp.exec() implicitly converts its argument to a string and Buffer 's default encoding for toString() is UTF-8. regexp.exec()隐式将其参数转换为字符串,并且BuffertoString()缺省编码为UTF-8。 So you will not be able to see the individual bytes any longer with that encoding. 因此,使用该编码,您将再也看不到各个字节。 Instead, you will need to explicitly use the 'latin1' encoding (eg Buffer.from([0xcb, 0x98, 0x1b]).toString('latin1') ), which is a single-byte encoding (making the result the equivalent of '\\xcb\\x98\\x1b' ). 相反,您将需要显式使用'latin1'编码(例如Buffer.from([0xcb, 0x98, 0x1b]).toString('latin1') ),这是单字节编码(使结果等于'\\xcb\\x98\\x1b' )。

RegExp.prototype.exec works on strings. RegExp.prototype.exec适用于字符串。 This means that the Buffer is being implicitly cast to a string by the toString method. 这意味着toString方法toString Buffer隐式转换为字符串。

In doing so, the bytes are read as a UTF-8 string, as UTF-8 is the default encoding. 这样做时,将字节作为UTF-8字符串读取,因为UTF-8是默认编码。 From Node's Buffer documentation : Node的Buffer文档中

 buf.toString([encoding[, start[, end]]]) 

encoding <string> The character encoding to decode to. encoding <string>要解码为的字符编码。 Default: 'utf8' 默认值: 'utf8'

... ...

0xcb and 0x98 are read as a single UTF-8 character ( ˘ ), thus the third byte ends up being at the 1 index, not the 2 index. 0xcb0x98作为单个UTF-8字符( ˘ )读取,因此第三个字节最终位于1索引,而不是2索引。

One option might be to explicitly call the toString method with a different encoding, but I'm thinking regex is probably not the best option here. 一种选择可能是使用不同的编码显式调用toString方法,但我认为正则表达式可能不是此处的最佳选择。

您可以使用Array.prototype.filter()返回包含不等于0x1b值的数组

new Buffer([0xcb, 0x98, 0x1b].filter(byte => byte !== 0x1b))

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM