
How can I perform an inner join with two object arrays in JavaScript?

I have two object arrays:

var a = [
  {id: 4, name: 'Greg'},
  {id: 1, name: 'David'},
  {id: 2, name: 'John'},
  {id: 3, name: 'Matt'},
]

var b = [
  {id: 5, name: 'Mathew', position: '1'},
  {id: 6, name: 'Gracia', position: '2'},
  {id: 2, name: 'John', position: '2'},
  {id: 3, name: 'Matt', position: '2'},
]

I want to do an inner join on these two arrays a and b, and create a third array like this (if the position property is not present, it becomes null):

var result = [
  {id: 4, name: 'Greg', position: null},
  {id: 1, name: 'David', position: null},
  {id: 5, name: 'Mathew', position: '1'},
  {id: 6, name: 'Gracia', position: '2'},
  {id: 2, name: 'John', position: '2'},
  {id: 3, name: 'Matt', position: '2'},
]

My approach:

function innerJoinAB(a, b) {
    a.forEach(function(obj, index) {
        // Search through objects in first loop
        b.forEach(function(obj2, i2) {
            // Find objects in 2nd loop
            // if obj1 is present in obj2 then push to result.
        });
    });
}

But the time complexity is O(N^2). How can I do it in O(N)? My friend told me that we can use reducers and Object.assign.

I'm not able to figure this out. Please help.

I don't know how reduce would help here, but you could use a Map to accomplish the same task in O(n):

const a = [
  {id: 4, name: 'Greg'},
  {id: 1, name: 'David'},
  {id: 2, name: 'John'},
  {id: 3, name: 'Matt'}
];
const b = [
  {id: 5, name: 'Mathew', position: '1'},
  {id: 6, name: 'Gracia', position: '2'},
  {id: 2, name: 'John', position: '2'},
  {id: 3, name: 'Matt', position: '2'}
];

var m = new Map();

// Insert all entries keyed by ID into the Map, filling in placeholder
// 'position' since the Array 'a' lacks 'position' entirely:
a.forEach(function(x) {
    x.position = null;
    m.set(x.id, x);
});

// For values in 'b', insert them if missing; otherwise, update existing values:
b.forEach(function(x) {
    var existing = m.get(x.id);
    if (existing === undefined)
        m.set(x.id, x);
    else
        Object.assign(existing, x);
});

// Extract resulting combined objects from the Map as an Array
var result = Array.from(m.values());
console.log(JSON.stringify(result));

Because Map accesses and updates are O(1) on average (hash collisions and rehashing can make individual operations slower), this makes the whole join O(n+m), where n and m are the lengths of a and b respectively; the naive solution you gave would be O(n*m), using the same meaning for n and m.
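For what it's worth, the friend's hint about reduce and Object.assign can be combined with the same Map idea. Here's a sketch (same sample data, same O(n+m) behaviour as the snippet above):

```javascript
const a = [
  {id: 4, name: 'Greg'},
  {id: 1, name: 'David'},
  {id: 2, name: 'John'},
  {id: 3, name: 'Matt'},
];
const b = [
  {id: 5, name: 'Mathew', position: '1'},
  {id: 6, name: 'Gracia', position: '2'},
  {id: 2, name: 'John', position: '2'},
  {id: 3, name: 'Matt', position: '2'},
];

// Seed a Map from 'a' (defaulting position to null), then fold 'b' in,
// merging records that share an id. Map.prototype.set returns the Map,
// which is what makes it usable as a reduce accumulator.
const seeded = a.reduce(
  (m, x) => m.set(x.id, Object.assign({position: null}, x)),
  new Map()
);
const merged = b.reduce(
  (m, x) => m.set(x.id, Object.assign(m.get(x.id) || {}, x)),
  seeded
);
const result = Array.from(merged.values());
console.log(JSON.stringify(result));
```

Each element of both arrays is still visited exactly once, so reduce buys readability here rather than any complexity improvement.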

Here is one way to solve it.

const a = [
  {id: 4, name: 'Greg'},
  {id: 1, name: 'David'},
  {id: 2, name: 'John'},
  {id: 3, name: 'Matt'},
];

const b = [
  {id: 5, name: 'Mathew', position: '1'},
  {id: 6, name: 'Gracia', position: '2'},
  {id: 2, name: 'John', position: '2'},
  {id: 3, name: 'Matt', position: '2'},
];

// Keep the records of 'a' whose id does not occur in 'b'...
const r = a.filter(({ id: idv }) => b.every(({ id: idc }) => idv !== idc));
// ...then append them to 'b', defaulting 'position' to null where missing.
const newArr = b.concat(r).map((v) => v.position ? v : { ...v, position: null });
console.log(JSON.stringify(newArr));

To reduce the time complexity, using more memory is unavoidable.

var a = [
  {id: 4, name: 'Greg'},
  {id: 1, name: 'David'},
  {id: 2, name: 'John'},
  {id: 3, name: 'Matt'},
];

var b = [
  {id: 5, name: 'Mathew', position: '1'},
  {id: 6, name: 'Gracia', position: '2'},
  {id: 2, name: 'John', position: '2'},
  {id: 3, name: 'Matt', position: '2'},
];

var s = new Set();
var result = [];

// Copy every record of 'b' into the result and remember its id.
b.forEach(function(e) {
    result.push(Object.assign({}, e));
    s.add(e.id);
});

// Append the records of 'a' whose id was not seen in 'b',
// with position defaulted to null.
a.forEach(function(e) {
    if (!s.has(e.id)) {
        var temp = Object.assign({}, e);
        temp.position = null;
        result.push(temp);
    }
});

console.log(result);

Update

As @Blindman67 mentioned: "You do not reduce the problems complexity by moving a search into the native code." I consulted the ECMAScript® 2016 Language Specification for the internal procedures of Set.prototype.has() and Map.prototype.get(); unfortunately, both are specified as iterating through all the elements they contain. (Note that these algorithms describe observable behaviour only: the same specification requires Set and Map objects to be "implemented using either hash tables or other mechanisms that, on average, provide access times that are sublinear on the number of elements in the collection".)

Set.prototype.has ( value )#

The following steps are taken:

    Let S be the this value.
    If Type(S) is not Object, throw a TypeError exception.
    If S does not have a [[SetData]] internal slot, throw a TypeError exception.
    Let entries be the List that is the value of S's [[SetData]] internal slot.
    Repeat for each e that is an element of entries,
        If e is not empty and SameValueZero(e, value) is true, return true.
    Return false. 

http://www.ecma-international.org/ecma-262/7.0/#sec-set.prototype.has

Map.prototype.get ( key )#

The following steps are taken:

    Let M be the this value.
    If Type(M) is not Object, throw a TypeError exception.
    If M does not have a [[MapData]] internal slot, throw a TypeError exception.
    Let entries be the List that is the value of M's [[MapData]] internal slot.
    Repeat for each Record {[[Key]], [[Value]]} p that is an element of entries,
        If p.[[Key]] is not empty and SameValueZero(p.[[Key]], key) is true, return p.[[Value]].
    Return undefined. 

http://www.ecma-international.org/ecma-262/7.0/#sec-map.prototype.get

Perhaps we can use an Object, which can access its properties directly by name, like a hash table or associative array. For example:

var a = [
  {id: 4, name: 'Greg'},
  {id: 1, name: 'David'},
  {id: 2, name: 'John'},
  {id: 3, name: 'Matt'},
];

var b = [
  {id: 5, name: 'Mathew', position: '1'},
  {id: 6, name: 'Gracia', position: '2'},
  {id: 2, name: 'John', position: '2'},
  {id: 3, name: 'Matt', position: '2'},
];

var s = {};
var result = [];

// Copy every record of 'b' into the result, marking its id as seen.
b.forEach(function(e) {
    result.push(Object.assign({}, e));
    s[e.id] = true;
});

// Append the records of 'a' whose id was not seen in 'b',
// with position defaulted to null.
a.forEach(function(e) {
    if (!s[e.id]) {
        var temp = Object.assign({}, e);
        temp.position = null;
        result.push(temp);
    }
});

console.log(result);

You do not reduce the problem's complexity by moving a search into the native code. The search must still be done.

Also, needing to set an undefined property to null is one of the many reasons I dislike using null.

So without the null, the solution would look like:

var a = [
  {id: 4, name: 'Greg',position: '7'},
  {id: 1, name: 'David'},
  {id: 2, name: 'John'},
  {id: 3, name: 'Matt'},
]

var b = [
  {id: 5, name: 'Mathew', position: '1'},
  {id: 6, name: 'Gracia', position: '2'},
  {id: 2, name: 'John', position: '2'},
  {id: 3, name: 'Matt', position: '2'},
]


function join (indexName, ...arrays) {
    const map = new Map();
    arrays.forEach((array) => {
        array.forEach((item) => {
            map.set(
                item[indexName],
                Object.assign(item, map.get(item[indexName]))
            );
        })
    })
    return [...map.values()];
}

And is called with

const joinedArray = join("id", a, b);
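Run against the two arrays above, the call returns the six merged records in first-seen order; note that without a defaults step, David simply has no position property at all. A self-contained sketch of the call:

```javascript
var a = [
  {id: 4, name: 'Greg', position: '7'},
  {id: 1, name: 'David'},
  {id: 2, name: 'John'},
  {id: 3, name: 'Matt'},
];
var b = [
  {id: 5, name: 'Mathew', position: '1'},
  {id: 6, name: 'Gracia', position: '2'},
  {id: 2, name: 'John', position: '2'},
  {id: 3, name: 'Matt', position: '2'},
];

function join (indexName, ...arrays) {
    const map = new Map();
    arrays.forEach((array) => {
        array.forEach((item) => {
            // Assigning the stored record over 'item' means existing
            // properties win: the first array seen takes precedence.
            map.set(
                item[indexName],
                Object.assign(item, map.get(item[indexName]))
            );
        });
    });
    return [...map.values()];
}

const joinedArray = join("id", a, b);
console.log(JSON.stringify(joinedArray));
// Greg keeps position '7'; David has no position property at all.
```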

To join with a default is a little more complex but should prove handy as it can join any number of arrays and automatically set missing properties to a provided default.

Testing for the defaults is done after the join to save a little time.

function join (indexName, defaults, ...arrays) {
    const map = new Map();
    arrays.forEach((array) => {
        array.forEach((item) => {
            map.set(
                item[indexName], 
                Object.assign( 
                    item, 
                    map.get(item[indexName])
                )
            );
        })
    })
    return [...map.values()].map(item => Object.assign({}, defaults, item));

}

To use

const joinedArray = join("id", {position : null}, a, b);

You could add...

    arrays.shift().forEach((item) => {  // first array is a special case.
        map.set(item[indexName], item);
    });

...at the start of the function to save a little time, but I feel it's more elegant without the extra code.
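Put together, a sketch of the defaults version with that first-array shortcut applied (it assumes at least one array is passed; behaviour should match the original, with earlier arrays taking precedence and missing properties falling back to the defaults):

```javascript
var a = [
  {id: 4, name: 'Greg', position: '7'},
  {id: 1, name: 'David'},
  {id: 2, name: 'John'},
  {id: 3, name: 'Matt'},
];
var b = [
  {id: 5, name: 'Mathew', position: '1'},
  {id: 6, name: 'Gracia', position: '2'},
  {id: 2, name: 'John', position: '2'},
  {id: 3, name: 'Matt', position: '2'},
];

function join (indexName, defaults, ...arrays) {
    const map = new Map();
    arrays.shift().forEach((item) => {  // first array is a special case:
        map.set(item[indexName], item); // nothing can be in the Map yet
    });
    arrays.forEach((array) => {
        array.forEach((item) => {
            map.set(
                item[indexName],
                Object.assign(item, map.get(item[indexName]))
            );
        });
    });
    // Apply the defaults after the join.
    return [...map.values()].map(item => Object.assign({}, defaults, item));
}

const joinedArray = join("id", {position: null}, a, b);
console.log(JSON.stringify(joinedArray));
```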

If you drop the null criteria (many in the community say that using null is bad), then there's a very simple solution:

let a = [1, 2, 3];
let b = [2, 3, 4];

a.filter(x => b.includes(x)) 

// [2, 3]
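Applied to the question's object arrays, the same idea runs in O(n+m) if you collect b's records into a Map keyed by id first. A sketch that keeps only the records present in both arrays (a true inner join):

```javascript
const a = [
  {id: 4, name: 'Greg'},
  {id: 1, name: 'David'},
  {id: 2, name: 'John'},
  {id: 3, name: 'Matt'},
];
const b = [
  {id: 5, name: 'Mathew', position: '1'},
  {id: 6, name: 'Gracia', position: '2'},
  {id: 2, name: 'John', position: '2'},
  {id: 3, name: 'Matt', position: '2'},
];

// O(n+m): index b's records by id, then keep only a's records that
// also appear in b, merging each matching pair.
const byId = new Map(b.map(x => [x.id, x]));
const inner = a
  .filter(x => byId.has(x.id))
  .map(x => ({...x, ...byId.get(x.id)}));
console.log(JSON.stringify(inner));
// [{"id":2,"name":"John","position":"2"},{"id":3,"name":"Matt","position":"2"}]
```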

Here is an attempt at a more generic version of a join which accepts N objects and merges them based on a primary id key.

If performance is critical, you are better off using a specific version like the one provided by ShadowRanger which doesn't need to dynamically build a list of all property keys.

This implementation assumes that any missing properties should be set to null, and that every object in a given input array has the same properties (though properties can differ between arrays).

var a = [
  {id: 4, name: 'Greg'},
  {id: 1, name: 'David'},
  {id: 2, name: 'John'},
  {id: 3, name: 'Matt'},
];

var b = [
  {id: 5, name: 'Mathew', position: '1'},
  {id: 600, name: 'Gracia', position: '2'},
  {id: 2, name: 'John', position: '2'},
  {id: 3, name: 'Matt', position: '2'},
];

console.log(genericJoin(a, b));

function genericJoin(...input) {
    // Get all possible keys
    let template = new Set();
    input.forEach(arr => {
        if (arr.length) {
            Object.keys(arr[0]).forEach(key => {
                template.add(key);
            });
        }
    });

    // Merge arrays
    input = input.reduce((a, b) => a.concat(b));

    // Merge items with duplicate ids
    let result = new Map();
    input.forEach(item => {
        result.set(item.id, Object.assign((result.get(item.id) || {}), item));
    });

    // Convert the map back to an array of objects,
    // setting any missing properties to null
    return Array.from(result.values(), item => {
        template.forEach(key => {
            item[key] = item[key] || null;
        });
        return item;
    });
}

Here's a generic O(n*m) solution, where n is the number of records and m is the number of keys. This will only work for valid object keys. You can convert any value to base64 and use that if you need to.

const join = ( keys, ...lists ) =>
    lists.reduce(
        ( res, list ) => {
            list.forEach( ( record ) => {
                let hasNode = keys.reduce(
                    ( idx, key ) => idx && idx[ record[ key ] ],
                    res[ 0 ].tree
                )
                if( hasNode ) {
                    const i = hasNode.i
                    Object.assign( res[ i ].value, record )
                    res[ i ].found++
                } else {
                    let node = keys.reduce( ( idx, key ) => {
                        if( idx[ record[ key ] ] )
                            return idx[ record[ key ] ]
                        else
                            idx[ record[ key ] ] = {}
                        return idx[ record[ key ] ]
                    }, res[ 0 ].tree )
                    node.i = res[ 0 ].i++
                    res[ node.i ] = {
                        found: 1,
                        value: record
                    }
                }
            } )
            return res
        },
        [ { i: 1, tree: {} } ]
    )
    .slice( 1 )
    .filter( node => node.found === lists.length )
    .map( n => n.value )

join( [ 'id', 'name' ], a, b )

This is essentially the same as Blindman67's answer, except that it adds an index object to identify records to join. The records are stored in an array and the index stores the position of the record for the given key set and the number of lists it's been found in.

Each time the same key set is encountered, the node is found in the tree, the element at its index is updated, and its found count is incremented.

Finally, the idx object is removed from the array with the slice, and any elements that weren't found in every list are filtered out. That filter is what makes it an inner join; remove it and you have a full outer join.

Each element is then mapped to its value, and you have the merged array.
