What is the most efficient way to split an array of millions of items based on a condition?

Question

Here is the situation: I have an array, london, containing more than 10 million items.

london = ['dwig7xmW','gIzbnHNI' ...]

I also have an array, userTraveled, which contains millions of items as well.

userTraveled = ['ntuJV09a' ...]

What is the most efficient way to split userTraveled into inLondon and notInLondon?

My attempt:

inLondon = []
notInLondon = []

userTraveled.forEach((p) => london.includes(p) ? inLondon.push(p) : notInLondon.push(p))

Answer 1

london.includes(p) does a linear search over the array. Doing that for every element of userTraveled is very inefficient. Use a Set instead:

const usersInLondon = [], usersNotInLondon = [];
const lookup = new Set(london);

for (const p of userTraveled) {
  (lookup.has(p) ? usersInLondon : usersNotInLondon).push(p);
}
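
The same idea can be wrapped in a reusable helper. The sketch below is a minimal illustration rather than part of the original answer; the name partitionByMembership and the sample IDs are assumptions made for the example. The Set is built in one pass over the reference array, and Set lookups are typically hash-based, so the whole split is expected to be roughly linear in the combined length of both arrays.

```javascript
// Hypothetical helper (not from the original answer): splits `items` into
// those present in `reference` and those absent, using a Set for lookups.
function partitionByMembership(items, reference) {
  const lookup = new Set(reference); // built once, one pass over `reference`
  const present = [];
  const absent = [];
  for (const item of items) {
    (lookup.has(item) ? present : absent).push(item);
  }
  return { present, absent };
}

// Tiny sample data, assumed for illustration only.
const sampleLondon = ['dwig7xmW', 'gIzbnHNI'];
const sampleTraveled = ['ntuJV09a', 'dwig7xmW'];
const result = partitionByMembership(sampleTraveled, sampleLondon);
console.log(result.present); // [ 'dwig7xmW' ]
console.log(result.absent);  // [ 'ntuJV09a' ]
```

Both output arrays keep the original order of userTraveled, and duplicates in userTraveled are kept as they are.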

Answer 2

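I can offer an O(n*log(n)) solution instead of your O(n^2) one: sort the lookup array first, then use binary search instead of includes to look up each item.

Hope it helps =)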

const london = ['dwig7xmW','gIzbnHNI']
const userTraveled = ['ntuJV09a', 'dwig7xmW']

let inLondon = []
let notInLondon = []

// Note: sort() sorts london in place and returns that same array.
const sortedlondon = london.sort();
userTraveled.forEach((p) => (binarySearch(sortedlondon, p) !== -1 ? inLondon.push(p) : notInLondon.push(p)))

//https://www.htmlgoodies.com/javascript/how-to-search-a-javascript-string-array-using-a-binary-search/
function binarySearch(items, value){
    var startIndex  = 0,
        stopIndex   = items.length - 1,
        middle      = Math.floor((stopIndex + startIndex)/2);

    while(items[middle] != value && startIndex < stopIndex){

        //adjust search area
        if (value < items[middle]){
            stopIndex = middle - 1;
        } else if (value > items[middle]){
            startIndex = middle + 1;
        }

        //recalculate middle
        middle = Math.floor((stopIndex + startIndex)/2);
    }

    //make sure it's the right value
    return (items[middle] != value) ? -1 : middle;
}
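
One way to gain confidence in a hand-written binary search is to compare its result with a Set-based split on the same input. The sketch below uses its own compact indexOfSorted (a hypothetical name, written for this example rather than taken from the linked article) and sample data assumed for illustration. It sorts a copy, since sort() works in place, and it relies on the default string sort order (UTF-16 code units) matching the < comparison used in the search.

```javascript
// Hypothetical compact binary search over a sorted array of strings.
// Returns the index of `value`, or -1 if it is not present.
function indexOfSorted(sorted, value) {
  let lo = 0;
  let hi = sorted.length - 1;
  while (lo <= hi) {
    const mid = lo + Math.floor((hi - lo) / 2);
    if (sorted[mid] === value) return mid;
    if (sorted[mid] < value) lo = mid + 1;
    else hi = mid - 1;
  }
  return -1;
}

// Sample data, assumed for illustration only.
const refIds = ['gIzbnHNI', 'dwig7xmW', 'Zk29dPq1'];
const visited = ['ntuJV09a', 'dwig7xmW', 'Zk29dPq1', 'aaaaaaaa'];

const sortedRef = [...refIds].sort(); // sort a copy; sort() works in place
const viaSearch = visited.filter((id) => indexOfSorted(sortedRef, id) !== -1);

const refSet = new Set(refIds);
const viaSet = visited.filter((id) => refSet.has(id));

// Both approaches should select exactly the same elements, in the same order.
console.log(JSON.stringify(viaSearch) === JSON.stringify(viaSet)); // true
```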

Answer 3

I hope you are not using this data in the wrong way.

const passwords = ['a', 'b']
const rawPasswords = ['c', 'b'];
const setPasswords = new Set(passwords)

const uniquePassword = [];
const usedPassword = [];

rawPasswords.forEach(rp => {
  if (setPasswords.has(rp)) {
    usedPassword.push(rp)
  } else {
    uniquePassword.push(rp)
  }
})

console.log(uniquePassword, usedPassword)

Answer 4

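See this answer for performance tests: Get all unique values in a JavaScript array (remove duplicates). In your case, the best solution is to use an Object, because you need to know which values are duplicates, not just remove them.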

function uniqueArray(ar) {
  var j = {};
  var k = [];
  var unique;
  ar.forEach(function (v) {
    if (j.hasOwnProperty(v)) {
      k.push(v);
    } else {
      j[v] = v;
    }
  });
  unique = Object.keys(j).map(function (v) {
    return j[v];
  });
  return [unique, k];
}

var arr = [1, 1, 2, 3, 4, 5, 4, 3];
console.log(uniqueArray(arr));

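First, it iterates over the input array and checks whether each value already exists as a key on the object. If it does not, the value is added. If it does, the value is pushed to a second array. Because object property lookups are hash-based, the JavaScript engine can do these checks quickly.

Second, it converts the object back into an array via its keys and finally returns both. I have not gone into more detail here because the referenced answer already explains it.

The result is an array containing two arrays: the first holds the unique values, the second holds the duplicated values.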
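
One caveat is worth illustrating: object keys are always strings (or symbols), so with a plain object the number 1 and the string '1' count as the same key, and integer-like keys are enumerated in ascending numeric order rather than insertion order. The sketch below is an alternative that uses a Set, not the answerer's code; the name splitUniqueAndDuplicates is hypothetical. It returns the same [unique, duplicates] shape.

```javascript
// Hypothetical Set-based variant of uniqueArray (an alternative sketch).
// Returns [unique, duplicates]: first occurrences, then every repeat.
function splitUniqueAndDuplicates(values) {
  const seen = new Set();
  const duplicates = [];
  for (const v of values) {
    if (seen.has(v)) {
      duplicates.push(v); // already seen: record the repeat
    } else {
      seen.add(v); // first occurrence
    }
  }
  return [[...seen], duplicates];
}

const sample = [1, 1, 2, 3, 4, 5, 4, 3];
console.log(splitUniqueAndDuplicates(sample));
// [ [ 1, 2, 3, 4, 5 ], [ 1, 4, 3 ] ]

// Unlike object keys, a Set keeps 1 and '1' distinct.
console.log(splitUniqueAndDuplicates([1, '1']));
// [ [ 1, '1' ], [] ]
```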

4 answers

Answer 1: 2 (accepted), 2022-08-11 08:59:46
Answer 2: 0, 2022-08-11 08:35:02
Answer 3: 0, 2022-08-11 09:04:24
Answer 4: -1, 2022-08-11 08:59:01
