
JavaScript: Match string with all required characters

I'm looking for a regular expression pattern to match a string that contains every character in a list of required characters.

For example, if my requirements are "abc":

  • Match: "abacus", "back", "cab".
  • Don't match: "abs", "car", "banana".

So far, I've come up with these (non-regex) methods:

    // Version A: every required character must occur somewhere in checkString
    function testA(requiredChars, checkString) {
      return requiredChars.split('').every(char => checkString.indexOf(char) !== -1)
    }

    // Version B: same check as an explicit loop that returns on the first miss
    function testB(requiredChars, checkString) {
      for (const char of requiredChars.split('')) {
        if (checkString.indexOf(char) === -1) return false
      }
      return true
    }

    const tests = ['abacus', 'back', 'cab', 'abs', 'car', 'banana']
    tests.forEach(word => {
      console.log(word, testA('abc', word), testB('abc', word))
    })
    // abacus true true
    // back true true
    // cab true true
    // abs false false
    // car false false
    // banana false false

I like that the first one is shorter, but unfortunately the second one is faster. Can this be done faster with regex, or should I just quit while I'm ahead?

A Set is the ideal structure for quickly testing membership:

    const containsChars = (chars, s) => {
      // Build the lookup once per call, then test each required character against it
      const lookup = new Set(s);
      return [...chars].every(e => lookup.has(e));
    };

    const tests = ['abacus', 'back', 'cab', 'abs', 'car', 'banana'];
    tests.forEach(word => console.log(word, containsChars('abc', word)));

This will almost certainly be more efficient than a regex, which is not well suited to this sort of task. Your existing solutions run in quadratic time, O(checkString.length * requiredChars.length), because each indexOf call scans checkString again for every character in requiredChars. Constructing a set is a one-time expense per call, and each lookup afterwards is constant time on average, making the overall algorithm linear: O(checkString.length + requiredChars.length).

However, if your input is always tiny, the overhead of allocating memory for the Set object on every call will outweigh its benefits. If that's the case, stick to your existing solutions. If you're always comparing against the same requiredChars, you might try building the Set once and passing it in as a parameter.
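As a minimal sketch of the "build it once" idea, assuming requiredChars stays fixed across many calls: the factory below (makeContainsAll is a hypothetical name, not from the original answers) precomputes the set of required characters a single time and returns a checker that scans each string once, stopping as soon as every required character has been seen. Unlike containsChars, the long-lived set here is built from the required characters rather than from the string being checked.

```javascript
// Hypothetical helper: precompute the required characters once, reuse the checker many times.
const makeContainsAll = requiredChars => {
  const required = new Set(requiredChars); // built once, not per call

  return checkString => {
    if (required.size === 0) return true; // nothing is required
    const seen = new Set(); // required characters found so far in this string
    for (const ch of checkString) {
      if (required.has(ch)) {
        seen.add(ch);
        if (seen.size === required.size) return true; // early exit
      }
    }
    return false;
  };
};

const containsAbc = makeContainsAll('abc');
const words = ['abacus', 'back', 'cab', 'abs', 'car', 'banana'];
words.forEach(word => console.log(word, containsAbc(word)));
```

This variant still allocates a small set per call, so whether it beats the simpler versions depends on the input sizes; it is worth measuring before adopting it.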

But if this isn't hot code called often in a loop, avoid premature optimization and choose the solution that you find most readable (although they're all pretty much the same in this case). It's generally counterproductive to optimize functions until you've established through profiling that they're a bottleneck.

It can be done with regex, but not faster: you'd need a lookahead for each character, which is ugly:

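    // Drop repeated characters so each one gets a single lookahead
    const dedupe = str => [...new Set(str)].join('');

    const test = (requiredChars, checkString) => {
      // 'abc' => ^(?=.*a)(?=.*b)(?=.*c)
      // Note: characters are inserted unescaped, so regex metacharacters would need escaping
      const requiredPatterns = [...dedupe(requiredChars)].map(char => `(?=.*${char})`);
      const pattern = new RegExp(`^${requiredPatterns.join('')}`);
      return pattern.test(checkString);
    };

    const tests = ['abacus', 'back', 'cab', 'abs', 'car', 'banana'];
    tests.forEach(word => {
      console.log(word, test('abc', word));
    });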

It's not very good. Use your current strategy, except with a Set lookup instead of repeated scans: Set.has is O(1) on average, whereas indexOf and includes (on a string or an array) are O(n) (as shown in the other answer).
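If the lookahead pattern is used anyway, two details are worth guarding against; neither is covered in the original answers, so treat this as a sketch under stated assumptions. Required characters such as "." or "+" have special meaning in a regex and must be escaped, and "." does not match newlines by default, so the version below uses [\s\S] instead. The names escapeRegExp and regexForAll are hypothetical.

```javascript
// Escape characters that have special meaning in a regular expression.
const escapeRegExp = ch => ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Hypothetical helper: 'a.c' => /^(?=[\s\S]*a)(?=[\s\S]*\.)(?=[\s\S]*c)/
const regexForAll = requiredChars =>
  new RegExp(
    '^' +
      [...new Set(requiredChars)] // one lookahead per distinct character
        .map(ch => `(?=[\\s\\S]*${escapeRegExp(ch)})`)
        .join('')
  );

console.log(regexForAll('a.c').test('abc'));   // false: no literal "." in "abc"
console.log(regexForAll('a.c').test('a.c'));   // true
console.log(regexForAll('abc').test('b\nca')); // true: matching spans the newline
```

Without the escaping step, a requirement like "a.c" would treat "." as "any character" and wrongly accept "abc".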

"Can this be done faster with regex?"

That depends on whether your requiredChars are more or less constant or change from call to call. Building a regex is slow, but once built, it beats the Set solution by an order of magnitude in the benchmark below.

Here, re1 naively builds the regex on every call, while re2 caches the regex for each set of chars it has already seen:

    const containsChars_set = (chars, s) => {
      const lookup = Set.prototype.has.bind(new Set(s));
      return [...chars].every(c => lookup(c));
    };

    // abc => ^(?=.*a)(?=.*b)(?=.*c)
    const regexFromChars = chars =>
      new RegExp('^' + [...chars].map(c => `(?=.*${c})`).join(''));

    // create a regexp each time
    const containsChars_re1 = (chars, s) => {
      const re = regexFromChars(chars);
      return re.test(s);
    };

    // cache regexes for each set of chars
    // (a Map avoids collisions with inherited object keys such as 'constructor')
    const cache = new Map();
    const containsChars_re2 = (chars, s) => {
      let re = cache.get(chars);
      if (!re) cache.set(chars, (re = regexFromChars(chars)));
      return re.test(s);
    };

    //......

    let tests = 'abacus,back,cab,abs,car,banana,'.repeat(5000).split(',');

    // with node.js, try this:
    // tests = require('fs').readFileSync('/usr/share/dict/words', 'utf8').split('\n')

    const chars = ['abc', 'abcdefghijklmnopqrstuvwxyz'];

    for (const c of chars) {
      console.log('chars', c);

      console.time('set');
      tests.forEach(t => containsChars_set(c, t));
      console.timeEnd('set');

      console.time('re1');
      tests.forEach(t => containsChars_re1(c, t));
      console.timeEnd('re1');

      console.time('re2');
      tests.forEach(t => containsChars_re2(c, t));
      console.timeEnd('re2');
    }
