The most performant way to check if a string is blank (i.e. only contains whitespace) in JavaScript?

I need to write a function that tests whether a given string is "blank", in the sense that it contains only whitespace characters. Whitespace characters are the following:

'\u0009',
'\u000A',
'\u000B',
'\u000C',
'\u000D',
' ',
'\u0085',
'\u00A0',
'\u1680',
'\u180E',
'\u2000',
'\u2001',
'\u2002',
'\u2003',
'\u2004',
'\u2005',
'\u2006',
'\u2007',
'\u2008',
'\u2009',
'\u200A',
'\u2028',
'\u2029',
'\u202F',
'\u205F',
'\u3000'

The function will be called many times, so it must be very fast, but it shouldn't take too much memory (for example, by mapping every character to true/false in an array). Things I've tried so far:

  • regexp - not very performant
  • trim and check if length is 0 - not very performant, and it also uses additional memory to hold the trimmed string
  • checking every string character against a hash set containing whitespace characters ( if (!whitespaceCharactersMap[str[index]]) ... ) - works well enough
  • my current solution uses hardcoded comparisons:

function(str) {
    var length = str.length;
    if (!length) { return true; }
    for (var index = 0; index < length; index++) {
        var c = str[index];
        if (c === ' ') {
            // skip
        } else if (c > '\u000D' && c < '\u0085') {
            return false;
        } else if (c < '\u00A0') {
            if (c < '\u0009') { return false; }
            else if (c > '\u0085') { return false; }
        } else if (c > '\u00A0') {
            if (c < '\u2028') {
                if (c < '\u180E') {
                    if (c < '\u1680') { return false; }
                    else if (c > '\u1680') { return false; }
                } else if (c > '\u180E') {
                    if (c < '\u2000') { return false; }
                    else if (c > '\u200A') { return false; }
                }
            } else if (c > '\u2029') {
                if (c < '\u205F') {
                    if (c < '\u202F') { return false; }
                    else if (c > '\u202F') { return false; }
                } else if (c > '\u205F') {
                    if (c < '\u3000') { return false; }
                    else if (c > '\u3000') { return false; }
                }
            }
        }
    }
    return true;
}

This seems to work 50-100% faster than the hash set (tested in Chrome).

Does anybody see or know of further options?

Update 1

I'll answer some of the comments here:

  • It's not just checking user input for emptiness. I have to parse a certain data format where whitespace must be handled separately.
  • It is worth optimizing. I've profiled the code before, and checking for blank strings seems to be an issue. As we saw, the difference in performance between approaches can be up to 10 times, so it's definitely worth the effort.
  • Generally, I find this "hash set vs. regex vs. switch vs. branching" challenge very educational.
  • I need the same functionality for browsers as well as Node.js.

Now here's my take on performance tests:

http://jsperf.com/hash-with-comparisons/6

I'd be grateful if you guys ran these tests a couple of times.

Preliminary conclusions:

  • branchlessTest ( a^9*a^10*a^11... ) is extremely fast in Chrome and Firefox, but not in Safari. It is probably the best choice for Node.js from a performance perspective.
  • switchTest is also quite fast in Chrome and Firefox but, surprisingly, the slowest in Safari and Opera.
  • Regexps with re.test(str) perform well everywhere, and are even the fastest in Opera.
  • Hash and branching show almost identically poor results almost everywhere. Comparison is similar, often with the worst performance (this may be due to the implementation; the check for ' ' should come first).
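For anyone who wants to repeat this kind of comparison locally (in a browser console or in Node.js), a rough timing harness might look like the sketch below. The names timeIt, regexTest, trimTest and samples are made up for illustration and are not taken from the jsPerf page; absolute numbers will vary with engine, version and input mix, so only relative differences on the same machine mean anything.

```javascript
// Hypothetical helper: runs `fn` over every string in `inputs`, `rounds` times,
// and returns the elapsed milliseconds plus how many inputs were judged blank.
function timeIt(fn, inputs, rounds) {
    // Use the high-resolution timer when the environment provides one.
    var now = (typeof performance !== 'undefined' && performance.now)
        ? function () { return performance.now(); }
        : function () { return Date.now(); };
    var blankCount = 0; // consume results so the calls cannot be dropped as dead code
    var start = now();
    for (var r = 0; r < rounds; r++) {
        for (var i = 0; i < inputs.length; i++) {
            if (fn(inputs[i])) blankCount++;
        }
    }
    return { ms: now() - start, blankCount: blankCount };
}

// Two of the candidates discussed in the question.
function regexTest(str) { return !/[^\s]/.test(str); }
function trimTest(str) { return str.trim().length === 0; }

// Illustrative inputs: a mix of blank and non-blank strings.
var samples = ['', '   ', '\t\n', 'hello', '  x  ', '\u2003\u3000'];

console.log('regex', timeIt(regexTest, samples, 100000));
console.log('trim ', timeIt(trimTest, samples, 100000));
```

A realistic run should use strings sampled from the actual data being parsed, since the share of blank strings and of non-ASCII characters changes which approach wins.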

To sum up, for my case I'll opt for the following regexp version:

var re = /[^\s]/;
return !re.test(str);

Reasons:

  • the branchless version is cool in Chrome and Firefox but isn't quite portable
  • switch is too slow in Safari
  • regexps seem to perform well everywhere, and they're also very compact in code
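One caveat with \s: per the ECMAScript specification it covers the language's WhiteSpace and LineTerminator code points, which is not exactly the list given in the question. U+0085 is not matched by \s, U+FEFF is matched even though it is not in the list, and whether U+180E is matched depends on the Unicode version the engine follows. If the exact list matters, an explicit character class is one option. The sketch below assumes that is the requirement; the names NON_BLANK and isBlankExplicit are illustrative.

```javascript
// Matches any single character that is NOT in the question's whitespace list.
// \u0009-\u000D covers TAB, LF, VT, FF, CR; \u2000-\u200A covers the en/em-style spaces.
var NON_BLANK = /[^\u0009-\u000D\u0020\u0085\u00A0\u1680\u180E\u2000-\u200A\u2028\u2029\u202F\u205F\u3000]/;

// Illustrative name: true when `str` is empty or contains only listed whitespace.
function isBlankExplicit(str) {
    return !NON_BLANK.test(str);
}

console.log(isBlankExplicit('\u0085 \u3000')); // true: both characters are in the list
console.log(isBlankExplicit('\uFEFF'));        // false: U+FEFF is not in the list
console.log(!/[^\s]/.test('\u0085'));          // false: \s does not cover U+0085
console.log(!/[^\s]/.test('\uFEFF'));          // true: \s does cover U+FEFF
```

The regexp is created once outside the function so that it is not rebuilt on every call; it has no g flag, so test() keeps no state between calls.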

The hard-coded solution seems the best, but I think switch should be faster. It depends on how the JavaScript interpreter handles these (most compilers do this very efficiently), so it may be browser-specific (i.e., fast in some, slow in others). Also, I'm not sure how fast JavaScript is with UTF strings, so you might try converting each character to its integer code before comparing the values.

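// Wrapped in a function so the snippet runs on its own; isBlank is an illustrative name.
function isBlank(str) {
    for (var index = 0, length = str.length; index < length; index++)
    {
        var c = str.charCodeAt(index);
        switch (c) {
            case 0x0009: case 0x000A: case 0x000B: case 0x000C: case 0x000D: case 0x0020:
            case 0x0085: case 0x00A0: case 0x1680: case 0x180E: case 0x2000: case 0x2001:
            case 0x2002: case 0x2003: case 0x2004: case 0x2005: case 0x2006: case 0x2007:
            case 0x2008: case 0x2009: case 0x200A: case 0x2028: case 0x2029: case 0x202F:
            case 0x205F: case 0x3000: continue; // whitespace: move on to the next character
        }
        return false; // any other character means the string is not blank
    }
    return true;
}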

Another thing to consider is changing the for loop to:

for (var index in str)
{
    ...
}

Edit

Your jsPerf test got some revisions; the current one is available here. My code is significantly faster in Chrome 26 and 27, and in IE10, but it's also the slowest one in Firefox 18.

I ran the same test (I don't know how to make jsPerf save those) on Firefox 20.0 on 64-bit Linux, and it turned out to be one of the two fastest (tied with trimTest, both at about 11.8M ops/sec). I also tested Firefox 20.0.1 on WinXP, but under VirtualBox (still on 64-bit Linux, which might make a significant difference here), which gave 10M ops/sec for switchTest, with trimTest coming second at 7.3M ops/sec.

So, I'm guessing that the performance depends on the browser version and maybe even on the underlying OS/hardware (I suppose the above FF18 test was on Windows). In any case, to make a truly optimal version, you'd have to write many versions, test each on every browser, OS and architecture you can get hold of, and then include in your page the version best suited to the visitor's browser, OS and architecture. I'm not sure what kind of code is worth that trouble, though.

Since branching is much more expensive than most other operations, you want to keep branches to a minimum. Thus, your sequence of if/else statements may not be very performant. A method that instead uses mostly math would be a lot faster. For example:

One way of performing an equality check without using any branching is to use bitwise operations. One example is, to check that a == b:

(a ^ b) == 0

(The parentheses are needed because == binds more tightly than ^ in JavaScript.)

Since the xor of two identical bits (i.e., 1 ^ 1 or 0 ^ 0) is 0, xor-ing two equal values produces 0. This is useful because it allows us to treat 0 as a "true" value and do more math. Imagine that we have a bunch of boolean variables represented in this way: nonzero numbers are false, and zero means true. If we want to ask, "is any of these true?" we simply multiply them all together. If any of them were true (equal to zero), the entire result would be zero.

So, for example, the code would look something like this:

function(str) {
    for (var i = 0; i < str.length; i++) {
        // Use the numeric code: applying ^ to one-character strings coerces them to numbers first.
        var c = str.charCodeAt(i);
        if ((c ^ 0x0009) * (c ^ 0x000A) * (c ^ 0x000B) ... == 0)
            continue;
        return false;
    }
    return true;
}
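Written out in full for the question's 26 characters, the product trick could be sketched as follows. This assumes the code points are read with charCodeAt, so that ^ operates on numbers; isBlankBranchless is an illustrative name. Each factor is a non-negative integer below 65536, so the product stays finite as a double and is zero exactly when one factor is zero.

```javascript
// Illustrative name: true when every UTF-16 code unit of `str` is in the question's list.
function isBlankBranchless(str) {
    for (var i = 0, n = str.length; i < n; i++) {
        var c = str.charCodeAt(i);
        // (c ^ k) is 0 only when c === k, so the product is 0
        // exactly when c equals one of the listed code points.
        var product =
            (c ^ 0x0009) * (c ^ 0x000A) * (c ^ 0x000B) * (c ^ 0x000C) *
            (c ^ 0x000D) * (c ^ 0x0020) * (c ^ 0x0085) * (c ^ 0x00A0) *
            (c ^ 0x1680) * (c ^ 0x180E) * (c ^ 0x2000) * (c ^ 0x2001) *
            (c ^ 0x2002) * (c ^ 0x2003) * (c ^ 0x2004) * (c ^ 0x2005) *
            (c ^ 0x2006) * (c ^ 0x2007) * (c ^ 0x2008) * (c ^ 0x2009) *
            (c ^ 0x200A) * (c ^ 0x2028) * (c ^ 0x2029) * (c ^ 0x202F) *
            (c ^ 0x205F) * (c ^ 0x3000);
        if (product !== 0) return false; // the only data-dependent branch per character
    }
    return true;
}

console.log(isBlankBranchless(' \t\u3000\u0085')); // true
console.log(isBlankBranchless(' a '));              // false
```

Note that this always performs all 26 multiplications per character, even for a plain space, which may explain why it does not win in every engine.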

The primary reason that this would be more performant than simply doing something like:

if ((c == '\u0009') || (c == '\u000A') || (c == '\u000B') ...)

is that JavaScript has short-circuit boolean operators, meaning that every time the || operator is used, it not only performs the or operation, but also checks whether it can already prove that the statement must be true, which is a branching operation, and that is expensive. The math approach, on the other hand, involves no branching except for the if statement itself, and should thus be much faster.

This creates and uses a 'hash' lookup on the characters of the string; if it detects a non-whitespace character, it returns false:

var wsList=['\u0009','\u000A','\u000B','\u000C','\u000D',' ','\u0085','\u00A0','\u1680','\u180E','\u2000','\u2001','\u2002','\u2003','\u2004','\u2005','\u2006','\u2007','\u2008','\u2009','\u200A','\u2028','\u2029','\u202F','\u205F','\u3000'];
var ws=Object.create(null);
wsList.forEach(function(char){ws[char]=true});
function isWhitespace(txt){
    for(var i=0, l=txt.length; i<l; ++i){
        if(!ws[txt[i]])return false;
    }
    return true;
}

var test1=" \u1680 \u000B \u2002 \u2004";
isWhitespace(test1);
/*
true
*/
var test2=" _ . a ";
isWhitespace(test2);
/*
false
*/

Not sure about its performance (yet). After a quick test on jsperf, it turns out to be quite slow compared to a RegExp using /^\s*$/ .


edit:

It appears that the solution you should go with likely depends on the nature of the data you are working with: is the data mostly whitespace or mostly non-whitespace? Is it mostly ASCII-range text? You might be able to speed up the average case by using range checks (via if) for common non-whitespace character ranges, using switch on the most common whitespace, then using a hash lookup for everything else. This will likely improve the average performance of the tests if most of the data being tested consists of the most common characters (0x00 to 0x7F).

Maybe something like this (a hybrid of if/switch/hash) could work:

/*same setup as above with variable ws being a hash lookup*/
function isWhitespaceHybrid(txt){
    for(var i=0, l=txt.length; i<l; ++i){
        var cc=txt.charCodeAt(i)
        //above space, below DEL
        if(cc>0x20 && cc<0x7F)return false;
        //switch only the most common whitespace
        switch(cc){
            case 0x20:
            case 0x9:
            case 0xA:
            case 0xD:
            continue;
        }
        //everything else use a somewhat slow hash lookup (execute for non-ascii range text)
        if(!ws[txt[i]])return false;
    }
    return true;
}
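With several hand-written variants in play, it may be worth checking each one against the reference list before timing it. The sketch below repeats the setup and the hybrid function from above so that it runs on its own, and adds a hypothetical checker, findMismatches, that tries every single-code-unit string from U+0000 to U+FFFF.

```javascript
// Reference list and hash lookup, as in the answer above.
var wsList = ['\u0009','\u000A','\u000B','\u000C','\u000D',' ','\u0085','\u00A0',
    '\u1680','\u180E','\u2000','\u2001','\u2002','\u2003','\u2004','\u2005','\u2006',
    '\u2007','\u2008','\u2009','\u200A','\u2028','\u2029','\u202F','\u205F','\u3000'];
var ws = Object.create(null);
wsList.forEach(function (ch) { ws[ch] = true; });

// The hybrid if/switch/hash version from above.
function isWhitespaceHybrid(txt) {
    for (var i = 0, l = txt.length; i < l; ++i) {
        var cc = txt.charCodeAt(i);
        if (cc > 0x20 && cc < 0x7F) return false; // printable ASCII
        switch (cc) {
            case 0x20: case 0x9: case 0xA: case 0xD:
                continue; // most common whitespace
        }
        if (!ws[txt[i]]) return false; // everything else: hash lookup
    }
    return true;
}

// Hypothetical checker: returns the code units on which `fn` disagrees with the list.
function findMismatches(fn) {
    var bad = [];
    for (var cc = 0; cc <= 0xFFFF; cc++) {
        var ch = String.fromCharCode(cc);
        if (fn(ch) !== (ws[ch] === true)) bad.push(cc);
    }
    return bad;
}

console.log(findMismatches(isWhitespaceHybrid).length); // 0: agrees with the list everywhere
```

Any other candidate (regexp, switch, branchless) can be passed to findMismatches in the same way; a non-empty result lists the code units where it deviates from the question's definition of whitespace.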
