简体   繁体   中英

Get the intersection of n arrays

Using ES6's Set , given two arrays we can get the intersection like so:

let a = new Set([1,2,3])
let b = new Set([1,2,4])
let intersect = new Set([...a].filter(i => b.has(i)));

How can we get the intersection of n arrays?

Update:

I'm trying to wrap my head around this for the following use case. I have a two dimensional array with at least one element.

parts.forEach(part => {
  intersection = new Set()
})

How would you get the intersection of each element (array) in parts ?

Assuming you have some function function intersect(set1, set2) {...} that can intersect two sets, you can get the intersection of an array of sets using reduce :

function intersect(a, b) {
    return new Set(a.filter(i => b.has(i)));
}

var sets = [new Set([1,2,3]), ...];
var intersection = sets.reduce(intersect);

You can create an intersect helper function using a combination of Array methods like .filter() , .map() , and .every() .

This answer is inspired by the comment above from Xufox , who mentioned using Array#every in a filter predicate.

 function intersect (first = [], ...rest) { rest = rest.map(array => new Set(array)) return first.filter(e => rest.every(set => set.has(e))) } let parts = [ [1, 2, 3], [1, 2, 4], [1, 5, 2] ] console.log( intersect(...parts) ) 

ES6 still has a while

This is the type of function that can easily cause long lags due to excessive amounts of processing. This is more true with the unquestioning and even preferential use of ES6 and array methods like reduce, filter etc, over simple old fashioned loops like while and for.

When calculating the intersection of many sets the amount of work done per iteration should go down if an item has been found not to be part of the intersection. Because forEach can not break you are forced to still iterate all elements. Adding some code to avoid doing the search if the current item has been found to not belong can improve the performance, but it is a real kludge.

The is also the tendency to just create whole new datasets just to remove a single item from an array, set, or map. This is a very bad habit that i see more and more of as people adopt the ES5 way.

Get the intersection of n sets.

So to the problem at hand. Find the intersection of many sets.

Solution B

A typical ES6 solution

 function intersectB(firstSet, ...sets) { // function to intercept two sets var intersect = (a,b) => { return new Set([...a].filter(item => b.has(item))) }; // iterate all sets comparing the first set to each. sets.forEach(sItem => firstSet = intersect(firstSet, sItem)); // return the result. return firstSet; } var sets = [new Set([1,2,3,4]), new Set([1,2,4,6,8]), new Set([1,3,4,6,8])]; var inter = intersectB(...sets); console.log([...inter]); 

Works well and for the simple test case execution time is under a millisecond. But in my book it is a memory hogging knot of inefficiency, creating arrays, and sets at every line almost and iterating whole sets when the outcome is already known.

Let's give it some more work. 100 sets, with up to 10000 items over 10 tests each with differing amount of matching items. Most of the intercepts will return empty sets.

Warning will cause page to hang up to one whole second... :(

 // Create a set of numbers from 0 and < count // With a odds for any number occurring to be odds // return as a new set; function createLargeSet(count,odds){ var numbers = new Set(); while(count-- > 0){ if(Math.random() < odds){ numbers.add(count); } } return numbers; } // create a array of large sets function bigArrayOfSets(setCount,setMaxSize,odds){ var bigSets = []; for(var i = 0; i < setCount; i ++){ bigSets.push(createLargeSet(setMaxSize,odds)); } return bigSets; } function intersectB(firstSet, ...sets) { var intersect = (a,b) => { return new Set([...a].filter(item => b.has(item))) }; sets.forEach(sItem => firstSet = intersect(firstSet, sItem)); return firstSet; } var testSets = []; for(var i = 0.1; i <= 1; i += 0.1){ testSets.push(bigArrayOfSets(100,10000,i)); } var now = performance.now(); testSets.forEach(testDat => intersectB(...testDat)); var time = performance.now() - now; console.log("Execution time : " + time); 

Solution A

A better way, not as fancy but much more efficient.

function intersectA(firstSet,...sets) {
    var count = sets.length;
    var result = new Set(firstSet); // Only create one copy of the set
    firstSet.forEach(item => {
        var i = count;
        var allHave = true;
        while(i--){
            allHave = sets[i].has(item)
            if(!allHave) { break }  // loop only until item fails test
        }
        if(!allHave){
            result.delete(item);  // remove item from set rather than
                                  // create a whole new set
        }
    })
    return result;
}

Compare

So now let's compare both, if you are feeling lucky try and guess the performance difference, it's a good way to gage your understanding of Javascript execution.

 // Create a set of numbers from 0 and < count // With a odds for any number occurring to be odds // return as a new set; function createLargeSet(count,odds){ var numbers = new Set(); while(count-- > 0){ if(Math.random() < odds){ numbers.add(count); } } return numbers; } // create a array of large sets function bigArrayOfSets(setCount,setMaxSize,odds){ var bigSets = []; for(var i = 0; i < setCount; i ++){ bigSets.push(createLargeSet(setMaxSize,odds)); } return bigSets; } function intersectA(firstSet,...sets) { var count = sets.length; var result = new Set(firstSet); // Only create one copy of the set firstSet.forEach(item => { var i = count; var allHave = true; while(i--){ allHave = sets[i].has(item) if(!allHave) { break } // loop only until item fails test } if(!allHave){ result.delete(item); // remove item from set rather than // create a whole new set } }) return result; } function intersectB(firstSet, ...sets) { var intersect = (a,b) => { return new Set([...a].filter(item => b.has(item))) }; sets.forEach(sItem => firstSet = intersect(firstSet, sItem)); return firstSet; } var testSets = []; for(var i = 0.1; i <= 1; i += 0.1){ testSets.push(bigArrayOfSets(100,10000,i)); } var now = performance.now(); testSets.forEach(testDat => intersectB(...testDat)); var time = performance.now() - now; console.log("Execution time 'intersectB' : " + time); var now = performance.now(); testSets.forEach(testDat => intersectA(...testDat)); var time = performance.now() - now; console.log("Execution time 'intersectA' : " + time); 

As you can see using a simple while loop may not be a cool as using filter but the performance benefit is huge, and something to keep in mind next time you are writing that perfect 3 line ES6 array manipulation function. Dont forget about for and while .

OK i guess the most efficient way of performing the Array intersection is by utilizing a Map or Hash object. Here I test 1000 arrays each with ~1000 random integer items among 1..175 for an intersection. The result is obtained in less than 100msec.

 function setIntersection(a){ var m = new Map(), r = new Set(), l = a.length; a.forEach(sa => new Set(sa).forEach(n => m.has(n) ? m.set(n,m.get(n)+1) : m.set(n,1))); m.forEach((v,k) => v === l && r.add(k)); return r; } var testSets = Array(1000).fill().map(_ => Array(1000).fill().map(_ => ~~(Math.random()*175+1))); console.time("int"); result = setIntersection(testSets); console.timeEnd("int"); console.log(JSON.stringify([...result])); 

The most efficient algorithm for intersecting n arrays is the one implemented in fast_array_intersect . It runs in O(n) , where n is the total number of elements in all the arrays.

The base principle is simple: iterate over all the arrays, storing the number of times you see each element in a map. Then filter the smallest array, to return only the elements that have been seen in all the arrays. (source code) .

You can use the library with a simple :

import intersect from 'fast_array_intersect'

intersect([[1,2,3], [1,2,6]]) // --> [1,2]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM