What's the most efficient way to split an array of millions of items based on a condition?

Question

It goes something like this: I have a london array containing more than 10 million items.

london = ['dwig7xmW','gIzbnHNI' ...]

I also have a userTraveled array, which contains millions of items as well.

userTraveled = ['ntuJV09a' ...]

What's the most efficient way to split userTraveled into inLondon and notInLondon?

My attempt.

inLondon = []
notInLondon = []

userTraveled.forEach((p) => london.includes(p) ? inLondon.push(p) : notInLondon.push(p))

Answer 1

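london.includes(p) does a linear search over the whole array. Doing that for every element of userTraveled is horribly inefficient: with n entries in london and m entries in userTraveled, it costs on the order of n*m comparisons. Use a Set instead, whose has() lookup typically takes constant time on average: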

const usersInLondon = [], usersNotInLondon = [];
const lookup = new Set(london);

for (const p of userTraveled) {
  (lookup.has(p) ? usersInLondon : usersNotInLondon).push(p);
}
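
As a rough sketch, the same idea can be wrapped in a reusable function. The name partitionByMembership and the returned property names found and missing are made up for this illustration and do not come from the answer above; the sample values are the short ones used elsewhere on this page.

```javascript
// Hypothetical helper (illustrative name, not from the answer):
// splits `items` into those present in `reference` and those absent.
function partitionByMembership(items, reference) {
  // Build the Set once; every later has() call avoids a linear scan.
  const lookup = new Set(reference);
  const found = [];
  const missing = [];
  for (const item of items) {
    (lookup.has(item) ? found : missing).push(item);
  }
  return { found, missing };
}

const sample = partitionByMembership(
  ['ntuJV09a', 'dwig7xmW'],
  ['dwig7xmW', 'gIzbnHNI']
);
console.log(sample.found);   // [ 'dwig7xmW' ]
console.log(sample.missing); // [ 'ntuJV09a' ]
```

Building the Set costs one pass over london and extra memory for the lookup table, which is usually a reasonable trade for avoiding a full scan per element of userTraveled.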

Answer 2

I can offer an O(n*log(n)) solution instead of your O(n^2) one: first sort the london array, then use binary search on it instead of includes to look up each item.

Hope it helps =)

const london = ['dwig7xmW','gIzbnHNI']
const userTraveled = ['ntuJV09a', 'dwig7xmW']

let inLondon = []
let notInLondon = []

const sortedlondon=london.sort();
userTraveled.forEach((p) => (binarySearch(sortedlondon,p)!=-1 ? inLondon.push(p) : notInLondon.push(p)))

//https://www.htmlgoodies.com/javascript/how-to-search-a-javascript-string-array-using-a-binary-search/
function binarySearch(items, value){
    var startIndex  = 0,
        stopIndex   = items.length - 1,
        middle      = Math.floor((stopIndex + startIndex)/2);

    while(items[middle] != value && startIndex < stopIndex){

        //adjust search area
        if (value < items[middle]){
            stopIndex = middle - 1;
        } else if (value > items[middle]){
            startIndex = middle + 1;
        }

        //recalculate middle
        middle = Math.floor((stopIndex + startIndex)/2);
    }

    //make sure it's the right value
    return (items[middle] != value) ? -1 : middle;
}
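
One detail worth keeping in mind with the code above: Array.prototype.sort sorts in place and returns the same array, so sortedlondon and london end up being one and the same reordered array. If the original order of london matters, a minimal sketch is to sort a copy instead. The variable names londonOriginal and sortedCopy are made up for this example.

```javascript
const londonOriginal = ['gIzbnHNI', 'dwig7xmW'];

// slice() makes a shallow copy, so sort() reorders only the copy.
const sortedCopy = londonOriginal.slice().sort();

console.log(londonOriginal); // [ 'gIzbnHNI', 'dwig7xmW' ]
console.log(sortedCopy);     // [ 'dwig7xmW', 'gIzbnHNI' ]
```

The copy doubles the memory used for the array, which may matter at 10 million entries.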

Answer 3

I hope you are not using this data in the wrong way.

const passwords = ['a', 'b']
const rawPasswords = ['c', 'b'];
const setPasswords = new Set(passwords)

const uniquePassword = [];
const usedPassword = [];

rawPasswords.forEach(rp => {
  if (setPasswords.has(rp)) {
    usedPassword.push(rp)
  } else {
    uniquePassword.push(rp)
  }
})

console.log(uniquePassword, usedPassword)

Answer 4

According to the performance tests in this answer, "Get all unique values in a JavaScript array (remove duplicates)", the best solution in your case would be to use an Object, since you need to know about the duplicates and not just remove them.

function uniqueArray(ar) {
  var j = {};
  var k = [];
  var unique;

  ar.forEach(function(v) {
    if (j.hasOwnProperty(v)) {
      k.push(v);
    } else {
      j[v] = v;
    }
  });

  unique = Object.keys(j).map(function(v) {
    return j[v];
  });

  return [unique, k];
}

var arr = [1, 1, 2, 3, 4, 5, 4, 3];
console.log(uniqueArray(arr));

First, it loops through the input array and checks whether each value already exists as a key on the object. If it doesn't, the value is added as a key. If it does, the value is pushed to a second array. Since object property lookups are hash-based, the JavaScript engine can perform them quickly.

Second, it goes through the object's keys to turn them back into an array, and finally returns both arrays. I didn't add this explanation at first because the referenced answer already covers it.

The result is an array containing two arrays: first the array of unique values, then the array of duplicates.
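
For comparison, here is a minimal sketch of the same split using a Set rather than an Object. This is a swapped-in technique, not the code from the answer above, and the function name splitUniqueAndDuplicates is made up for this example. One difference that is safe to state: object keys are strings, so the Object version treats 1 and '1' as the same key, while a Set keeps them apart.

```javascript
// Hypothetical Set-based variant of uniqueArray (illustrative name).
function splitUniqueAndDuplicates(values) {
  const seen = new Set();
  const duplicates = [];
  for (const v of values) {
    if (seen.has(v)) {
      duplicates.push(v); // second or later occurrence
    } else {
      seen.add(v);        // first occurrence
    }
  }
  // A Set iterates in insertion order, so uniques keep first-seen order.
  return [Array.from(seen), duplicates];
}

console.log(splitUniqueAndDuplicates([1, 1, 2, 3, 4, 5, 4, 3]));
// [ [ 1, 2, 3, 4, 5 ], [ 1, 4, 3 ] ]
console.log(splitUniqueAndDuplicates([1, '1']));
// [ [ 1, '1' ], [] ]
```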

4 answers

solution1
2 ACCEPTED 2022-08-11 08:59:46

solution2
0 2022-08-11 08:35:02

solution3
0 2022-08-11 09:04:24

solution4
-1 2022-08-11 08:59:01
