
Performant and “correct” way to search for a string within an array in JavaScript

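I'm looking for a very fast way to search an array.

What I actually need: to check a stream of emails, supplied as a CSV string, against a blacklist.

My solutions, for each email:

  1. Using blacklist.indexOf(email) >= 0 on the blacklist string is very slow. I tried it with: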

    "email1@gmail.com;email2@gmail.com ..."

  2. Splitting the blacklist into an array and using array.indexOf(email) >= 0 is faster:

    ["email1@gmail.com","email2@gmail.com" ...]

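  3. Creating an object in which every property is an email from the blacklist, set to true, and then checking myObject[email]. This seems to be much faster, but it looks very much like a “kludge”: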

    {"email1@gmail.com":true,"email2@gmail.com":true ...}

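How can I make the search fast without resorting to a “kludge”?

P.S. The problem is not the size of the blacklist, which has almost 1k emails. However, we have to check about 400k emails every time.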

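I would say it is best to use a pre-populated Map. You can split the CSV string and iterate over it. I wrote two performance tests and ran them in Chrome, measuring with https://developer.mozilla.org/en-US/docs/Web/API/Performance/measure.

I created two Maps: an email Map with 400k entries and a blacklist Map with 1k entries. Downside: the initialization takes a long time.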

// noprotect
console.clear();

const EMAIL_COUNT = 400000;
const BLACKLIST_EMAIL_COUNT = 1000;
let mailMatches = 0;

// Maps
const emails = new Map();
const blacklist = new Map();

// 1k blacklisted mails
for (let bl = 0; bl < BLACKLIST_EMAIL_COUNT; bl++) {
    if (bl % 2 === 0) {
        blacklist.set('email' + bl, 'email' + bl);
    } else {
        blacklist.set('email@' + bl, 'email@' + bl);
    }
}

// 400k mails
for (let j = 0; j < EMAIL_COUNT; j++) {
    emails.set('email' + j, 'email' + j);
}

performance.mark('perfMailList-start');

// 1ms (Map#has, iterating the blacklist)
blacklist.forEach(blacklistItem => {
    if (emails.has(blacklistItem)) {
        mailMatches++;
    }
});

// 32ms (Map#has, iterating the emails)
/*emails.forEach(email => {
    if(blacklist.has(email)) {
        mailMatches++;
    }
})*/

performance.mark('perfMailList-end');

performance.measure('perfMailList', 'perfMailList-start', 'perfMailList-end');

const measures = performance.getEntriesByName('perfMailList');
const measure = measures[0];

console.log(`${measure.duration}ms and ${mailMatches} found blacklisted mails`);

// Clean up the stored markers.
performance.clearMarks();
performance.clearMeasures();

And some more loops (forEach, forward and reverse for loops), using either includes or indexOf:

// noprotect
console.clear();

const EMAIL_COUNT = 400000;
const BLACKLIST_EMAIL_COUNT = 1000;
let mailMatches = 0;

// arrays
const emails = [];
const blacklist = [];

// 1k blacklisted mails
for (let bl = 0; bl < BLACKLIST_EMAIL_COUNT; bl++) {
    // console.log(i)
    if (bl % 2 === 0) {
        blacklist.push('email' + bl);
    } else {
        blacklist.push('email@' + bl);
    }
}

// 400k mails
for (let j = 0; j < EMAIL_COUNT; j++) {
    emails.push('email' + j);
}

performance.mark('perfMailList-start');

// 524ms (indexOf, emails)
/*emails.forEach(mail => {
if(blacklist.indexOf(mail) >= 0){
        mailMatches++;
}
})*/

// 583ms (indexOf, blacklist)
/*blacklist.forEach(blacklistItem => {
if(emails.indexOf(blacklistItem) >= 0){
        mailMatches++;
}
})*/

// --------------------------

// 521ms (includes, emails)
/*emails.forEach(mail => {
if(blacklist.includes(mail)){
        mailMatches++;
}
})*/

// 600ms (includes, blacklist)
/*blacklist.forEach(blacklistItem => {
if(emails.includes(blacklistItem)){
        mailMatches++;
}
})*/

// --------------------------

// 638ms (includes, blacklist, reverse)
/*for(var i = BLACKLIST_EMAIL_COUNT; i--;) {
    if(emails.includes(blacklist[i])){
        mailMatches++;
    }
}*/

// 632ms (indexOf, blacklist, reverse)
/*for(var i = BLACKLIST_EMAIL_COUNT; i--;) {
    if(emails.indexOf(blacklist[i]) >= 0){
        mailMatches++;
    }
}*/

// --------------------------

// 530ms (includes, emails, reverse)
/*for(var i = EMAIL_COUNT; i--;) {
    if(blacklist.includes(emails[i])){
        mailMatches++;
    }
}*/

// 530ms (indexOf, emails, reverse)
/*for(var i = EMAIL_COUNT; i--;) {
    if(blacklist.indexOf(emails[i]) >= 0){
        mailMatches++;
    }
}*/

// --------------------------

// 525ms (includes, emails)
/*for(let i = 0; i < EMAIL_COUNT; i++) {
    if(blacklist.includes(emails[i])) {
        mailMatches++;
    }
}*/

// 540ms (indexOf, emails)
/*for(let i = 0; i < EMAIL_COUNT; i++) {
    if(blacklist.indexOf(emails[i]) >= 0) {
        mailMatches++;
    }
}*/

// --------------------------

// 668ms (includes, blacklist)
/*for(let i = 0; i < BLACKLIST_EMAIL_COUNT; i++) {
    if(emails.includes(blacklist[i])) {
        mailMatches++;
    }
}*/

// 687ms (indexOf, blacklist)
/*for(let k = 0; k < BLACKLIST_EMAIL_COUNT; k++) {
    if(emails.indexOf(blacklist[k]) >= 0) {
        mailMatches++;
    }
}*/

// --------------------------

// 1367ms (equals)
/*for(let i = 0; i < EMAIL_COUNT; i++) {
    for(let k = 0; k < BLACKLIST_EMAIL_COUNT; k++) {
        if(emails[i] === blacklist[k]) {
        mailMatches++;
        }
    }
}*/

performance.mark('perfMailList-end');

performance.measure('perfMailList', 'perfMailList-start', 'perfMailList-end');

const measures = performance.getEntriesByName('perfMailList');
const measure = measures[0];

console.log(`${measure.duration}ms and ${mailMatches} found blacklisted mails`);

// Clean up the stored markers.
performance.clearMarks();
performance.clearMeasures();

Test machine: MacBook Pro (15-inch, 2016)

Processor: 2.9 GHz Intel Core i7

Memory: 16 GB 2133 MHz LPDDR3
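Applied to the original problem (a CSV string of emails checked against a blacklist), the lookup idea could be sketched roughly as follows. This is only an illustration under assumptions: the `;` separator is taken from the question's example, the function name `findBlacklisted` and the sample addresses are made up, trimming and lowercasing are assumed to be acceptable normalization, and a `Set` stands in for the `Map` because only membership matters here.

```javascript
// Hypothetical helper: returns the emails from `emailsCsv` that also appear
// in `blacklistCsv`. Assumes ';' as the separator, as in the question's example.
function findBlacklisted(emailsCsv, blacklistCsv, separator = ';') {
    // Assumption: addresses are compared case-insensitively, ignoring
    // surrounding whitespace.
    const normalize = s => s.trim().toLowerCase();

    // Build the lookup structure once, from the small (about 1k) blacklist.
    const blacklist = new Set(
        blacklistCsv.split(separator).map(normalize).filter(Boolean)
    );

    // Walk the large list once; each Set#has check is a hash lookup
    // rather than a scan of the whole blacklist.
    return emailsCsv
        .split(separator)
        .map(normalize)
        .filter(email => email !== '' && blacklist.has(email));
}

const hits = findBlacklisted(
    'alice@example.com;bob@example.com; Carol@Example.com',
    'carol@example.com;bob@example.com'
);
console.log(hits); // [ 'bob@example.com', 'carol@example.com' ]
```

Only the blacklist is turned into a lookup structure in this sketch; the 400k emails are simply iterated once, so the costly 400k-entry Map from the benchmark is not needed for the real task.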

Use Array#includes and let the engine implementers worry about optimization:

blacklist.includes(email)

Alternatively, use a Set or a Map:

https://jsperf.com/array-includes-and-find-methods-vs-set-has
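For comparison, here is a minimal sketch of the two suggestions side by side, plus the plain-object variant from the question. The variable names and the sample input are invented for illustration. Both `includes` and `Set#has` return the same matches; the difference is that `includes` scans the array on every call while `Set#has` is a hash lookup. The last lines show one reason the plain-object approach can feel like a kludge: keys inherited from `Object.prototype` look like blacklist hits.

```javascript
// Sample data, loosely based on the question's example.
const blacklistArray = ['email1@gmail.com', 'email2@gmail.com'];
const blacklistSet = new Set(blacklistArray);

// Hypothetical incoming values, including one that collides with an
// inherited object property name.
const incoming = ['email2@gmail.com', 'someone@example.com', 'constructor'];

// Array#includes: linear scan of the blacklist for every email.
const viaIncludes = incoming.filter(email => blacklistArray.includes(email));

// Set#has: same result, constant-time lookup on average.
const viaSet = incoming.filter(email => blacklistSet.has(email));

console.log(viaIncludes); // [ 'email2@gmail.com' ]
console.log(viaSet);      // [ 'email2@gmail.com' ]

// The plain-object lookup from the question reports a false positive,
// because 'constructor' is inherited from Object.prototype.
const blacklistObject = { 'email1@gmail.com': true, 'email2@gmail.com': true };
const objectSaysBlacklisted = Boolean(blacklistObject['constructor']);
console.log(objectSaysBlacklisted);           // true
console.log(blacklistSet.has('constructor')); // false
```

If a plain object is still preferred, creating it with `Object.create(null)` avoids the inherited keys.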

