Performant and “correct” way to search for a string within an array in JavaScript
I'm looking for a very fast way to search an array.
What I actually need: check a stream of emails, given as a CSV string, against a blacklist.
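What I have tried so far, applied to each email:
1. Keep the blacklist as one string and use blacklist.indexOf(email) >= 0. Very slow.
"email1@gmail.com;email2@gmail.com ..."
2. Split the blacklist into an array and use array.indexOf(email) >= 0. Faster.
["email1@gmail.com","email2@gmail.com" ...]
3. Create an object where each property is an email from the blacklist, set to true, and then look up myObject[email]. This seems much faster, but it looks very “hacky”.
{"email1@gmail.com":true,"email2@gmail.com":true ...}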
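The object-as-lookup approach is a hash lookup rather than a hack, with one catch: an object literal inherits from Object.prototype, so with `{}` the expression `myObject["toString"]` returns a function, which is truthy. A minimal sketch, assuming the blacklist arrives as the semicolon-separated string shown above; the helper names `buildLookup` and `isBlacklisted` are hypothetical:

```javascript
// Build a prototype-free lookup object from a ";"-separated blacklist string.
// Object.create(null) has no inherited keys such as "toString" or "constructor".
function buildLookup(blacklistCsv) {
  const lookup = Object.create(null);
  for (const raw of blacklistCsv.split(";")) {
    const email = raw.trim();
    if (email !== "") lookup[email] = true;
  }
  return lookup;
}

// With a null-prototype object, `in` only sees the keys we added.
function isBlacklisted(lookup, email) {
  return email in lookup;
}

const lookup = buildLookup("email1@gmail.com;email2@gmail.com");
console.log(isBlacklisted(lookup, "email2@gmail.com")); // true
console.log(isBlacklisted(lookup, "email3@gmail.com")); // false
console.log(isBlacklisted(lookup, "toString"));         // false
```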
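How can I make the search fast without it being “hacky”?
P.S. The problem is not the size of the blacklist, which has close to 1k emails. It is that we have to check 400k emails every time.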
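The P.S. is why the lookup structure matters more than the blacklist size. A linear search (indexOf or includes) scans the whole blacklist for every email that is not on it, while a hash lookup (object, Set, Map) costs roughly constant time per email. A back-of-the-envelope count, assuming n = 400,000 emails, m = 1,000 blacklist entries, and that almost no email is blacklisted:

```latex
\begin{aligned}
\text{linear search (indexOf/includes):}\quad & n \cdot m = 400\,000 \times 1\,000 = 4 \times 10^{8} \text{ comparisons} \\
\text{hash lookup (object/Set/Map):}\quad & m + n = 1\,000 + 400\,000 \approx 4 \times 10^{5} \text{ insertions and lookups}
\end{aligned}
```

These are operation counts, not timings; hashing a string and comparing two strings have different constant costs.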
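I'd say it's best to use a pre-populated Map. You can split the CSV string and iterate over it. I wrote two performance tests and ran them in Chrome, with the help of https://developer.mozilla.org/en-US/docs/Web/API/Performance/measure.
I created two Maps: an email Map with 400k entries and a blacklist Map with 1k entries. Drawback: initializing them takes a long time.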
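The split-and-iterate suggestion can be sketched as follows. This is a minimal sketch, assuming the incoming emails are one ";"-separated string (as in the question's example) and the blacklist is already an array; it uses a Set instead of a Map because only membership is needed. The function name `findBlacklisted` is hypothetical.

```javascript
// Return the blacklisted addresses found in a separator-delimited email string.
// The blacklist becomes a Set, so each check is a hash lookup, not a scan.
function findBlacklisted(emailCsv, blacklist, separator = ";") {
  const blacklistSet = new Set(blacklist);
  const found = [];
  for (const raw of emailCsv.split(separator)) {
    const email = raw.trim(); // tolerate spaces after the separator
    if (blacklistSet.has(email)) {
      found.push(email);
    }
  }
  return found;
}

const incoming = "a@example.com;b@example.com; c@example.com";
const blacklist = ["b@example.com", "c@example.com"];
console.log(findBlacklisted(incoming, blacklist).join(", ")); // b@example.com, c@example.com
```

If the same blacklist is checked against many batches, build the Set once outside the function and pass it in, so the 1k insertions are not repeated per batch.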
// noprotect
console.clear();
const EMAIL_COUNT = 400000;
const BLACKLIST_EMAIL_COUNT = 1000;
let mailMatches = 0;
// Maps
const emails = new Map();
const blacklist = new Map();
// 1k blacklisted mails
for (let bl = 0; bl < BLACKLIST_EMAIL_COUNT; bl++) {
if (bl % 2 === 0) {
blacklist.set('email' + bl, 'email' + bl);
} else {
blacklist.set('email@' + bl, 'email@' + bl);
}
}
// 400k mails
for (let j = 0; j < EMAIL_COUNT; j++) {
emails.set('email' + j, 'email' + j);
}
performance.mark('perfMailList-start');
// 1ms (Map#has: iterate the 1k blacklist, probe the 400k emails)
blacklist.forEach(blacklistItem => {
if (emails.has(blacklistItem)) {
mailMatches++;
}
});
// 32ms (Map#has: iterate the 400k emails, probe the 1k blacklist)
/*emails.forEach(email => {
if(blacklist.has(email)) {
mailMatches++;
}
})*/
performance.mark('perfMailList-end');
performance.measure('perfMailList', 'perfMailList-start', 'perfMailList-end');
const measures = performance.getEntriesByName('perfMailList');
const measure = measures[0];
console.log(`${measure.duration}ms and ${mailMatches} found blacklisted mails`);
// Clean up the stored markers.
performance.clearMarks();
performance.clearMeasures();
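The 1ms vs 32ms gap comes from iterating the smaller collection and probing the larger one with has(), rather than the other way round. When both sides are Sets or Maps of unique keys, the match count is identical either way, so a helper can just pick the smaller side. A minimal sketch; `countCommon` is a hypothetical name, and it only counts shared keys, so it does not tell you which incoming emails to reject if the incoming list contains duplicates.

```javascript
// Count keys present in both collections (Set or Map), iterating the smaller one.
// Set and Map both provide .size, .keys() and .has(key).
function countCommon(a, b) {
  const [small, large] = a.size <= b.size ? [a, b] : [b, a];
  let count = 0;
  for (const key of small.keys()) {
    if (large.has(key)) count++;
  }
  return count;
}

// Same data shape as the benchmark: 400k emails and 1k blacklist entries,
// where only the even-numbered entries ("email0", "email2", ...) can match.
const emails = new Set();
for (let j = 0; j < 400000; j++) emails.add("email" + j);
const blacklist = new Set();
for (let bl = 0; bl < 1000; bl++) {
  blacklist.add(bl % 2 === 0 ? "email" + bl : "email@" + bl);
}
console.log(countCommon(emails, blacklist)); // 500
```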
I also tried several array-based loops (forEach, reverse for loops, and forward for loops), alternating between includes and indexOf.
// noprotect
console.clear();
const EMAIL_COUNT = 400000;
const BLACKLIST_EMAIL_COUNT = 1000;
let mailMatches = 0;
// arrays
const emails = [];
const blacklist = [];
// 1k blacklisted mails
for (let bl = 0; bl < BLACKLIST_EMAIL_COUNT; bl++) {
if (bl % 2 === 0) {
blacklist.push('email' + bl);
} else {
blacklist.push('email@' + bl);
}
}
// 400k mails
for (let j = 0; j < EMAIL_COUNT; j++) {
emails.push('email' + j);
}
// Uncomment one variant at a time; the timing in each comment is the author's measurement.
performance.mark('perfMailList-start');
// 524ms (indexOf, emails)
/*emails.forEach(mail => {
if(blacklist.indexOf(mail) >= 0){
mailMatches++;
}
})*/
// 583ms (indexOf, blacklist)
/*blacklist.forEach(blacklistItem => {
if(emails.indexOf(blacklistItem) >= 0){
mailMatches++;
}
})*/
// --------------------------
// 521ms (includes, emails)
/*emails.forEach(mail => {
if(blacklist.includes(mail)){
mailMatches++;
}
})*/
// 600ms (includes, blacklist)
/*blacklist.forEach(blacklistItem => {
if(emails.includes(blacklistItem)){
mailMatches++;
}
})*/
// --------------------------
// 638ms (includes, blacklist, reverse)
/*for(var i = BLACKLIST_EMAIL_COUNT; i--;) {
if(emails.includes(blacklist[i])){
mailMatches++;
}
}*/
// 632ms (indexOf, blacklist, reverse)
/*for(var i = BLACKLIST_EMAIL_COUNT; i--;) {
if(emails.indexOf(blacklist[i]) >= 0){
mailMatches++;
}
}*/
// --------------------------
// 530ms (includes, emails, reverse)
/*for(var i = EMAIL_COUNT; i--;) {
if(blacklist.includes(emails[i])){
mailMatches++;
}
}*/
// 530ms (indexOf, emails, reverse)
/*for(var i = EMAIL_COUNT; i--;) {
if(blacklist.indexOf(emails[i]) >= 0){
mailMatches++;
}
}*/
// --------------------------
// 525ms (includes, emails)
/*for(let i = 0; i < EMAIL_COUNT; i++) {
if(blacklist.includes(emails[i])) {
mailMatches++;
}
}*/
// 540ms (indexOf, emails)
/*for(let i = 0; i < EMAIL_COUNT; i++) {
if(blacklist.indexOf(emails[i]) >= 0) {
mailMatches++;
}
}*/
// --------------------------
// 668ms (includes, blacklist)
/*for(let i = 0; i < BLACKLIST_EMAIL_COUNT; i++) {
if(emails.includes(blacklist[i])) {
mailMatches++;
}
}*/
// 687ms (indexOf, blacklist)
/*for(let k = 0; k < BLACKLIST_EMAIL_COUNT; k++) {
if(emails.indexOf(blacklist[k]) >= 0) {
mailMatches++;
}
}*/
// --------------------------
// 1367ms (equals)
/*for(let i = 0; i < EMAIL_COUNT; i++) {
for(let k = 0; k < BLACKLIST_EMAIL_COUNT; k++) {
if(emails[i] === blacklist[k]) {
mailMatches++;
}
}
}*/
performance.mark('perfMailList-end');
performance.measure('perfMailList', 'perfMailList-start', 'perfMailList-end');
const measures = performance.getEntriesByName('perfMailList');
const measure = measures[0];
console.log(`${measure.duration}ms and ${mailMatches} found blacklisted mails`);
// Clean up the stored markers.
performance.clearMarks();
performance.clearMeasures();
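Every array variant above still scans one whole array per lookup (the 1k blacklist per email, or the 400k emails per blacklist entry), which is why they all take hundreds of milliseconds in the author's measurements. A quick sanity check, not a benchmark, confirms that replacing the array with a Set built once from it changes the cost of each lookup but not the result; the variable names mirror the benchmark above.

```javascript
// Rebuild the array data used in the benchmark above.
const EMAIL_COUNT = 400000;
const BLACKLIST_EMAIL_COUNT = 1000;
const emails = [];
for (let j = 0; j < EMAIL_COUNT; j++) emails.push("email" + j);
const blacklist = [];
for (let bl = 0; bl < BLACKLIST_EMAIL_COUNT; bl++) {
  blacklist.push(bl % 2 === 0 ? "email" + bl : "email@" + bl);
}

// Linear search: includes scans up to 1k strings for each email.
let arrayMatches = 0;
for (const mail of emails) {
  if (blacklist.includes(mail)) arrayMatches++;
}

// Hash lookup: build the Set once, then each has() is a single probe.
const blacklistSet = new Set(blacklist);
let setMatches = 0;
for (const mail of emails) {
  if (blacklistSet.has(mail)) setMatches++;
}

console.log(arrayMatches, setMatches); // 500 500
```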
Test machine: MacBook Pro (15-inch, 2016)
Processor: 2.9 GHz Intel Core i7
Memory: 16 GB 2133 MHz LPDDR3
Use Array#includes and let the engine implementers worry about optimization:
blacklist.includes(email)
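Applied to a whole batch, the includes suggestion becomes a one-line filter. A minimal sketch; `blacklistedEmails` is a hypothetical name. includes is still a linear scan of the blacklist for each email, so the cost stays proportional to emails × blacklist entries, the same as indexOf; what it gains is readability.

```javascript
// Keep only the incoming emails that appear in the blacklist array.
// Array#includes scans the blacklist linearly for every email.
function blacklistedEmails(emails, blacklist) {
  return emails.filter(email => blacklist.includes(email));
}

const emails = ["a@example.com", "b@example.com", "c@example.com"];
const blacklist = ["c@example.com", "x@example.com"];
console.log(blacklistedEmails(emails, blacklist).join(", ")); // c@example.com
```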
Alternatively, use a Set or a Map:
https://jsperf.com/array-includes-and-find-methods-vs-set-has
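A Set answers only “is this email blacklisted?”; a Map is worth it when each entry needs data attached. A minimal sketch; the reason strings and the `checkEmails` helper are hypothetical and not part of the question.

```javascript
// Set: membership only.
const blacklistSet = new Set(["spam@example.com", "abuse@example.com"]);

// Map: membership plus a value per entry (hypothetical reasons).
const blacklistMap = new Map([
  ["spam@example.com", "reported as spam"],
  ["abuse@example.com", "abuse complaint"],
]);

// Return [email, reason] pairs for every blacklisted address.
function checkEmails(emails, map) {
  const hits = [];
  for (const email of emails) {
    if (map.has(email)) hits.push([email, map.get(email)]);
  }
  return hits;
}

console.log(blacklistSet.has("spam@example.com")); // true
console.log(checkEmails(["ok@example.com", "abuse@example.com"], blacklistMap));
```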