
How to extract all possible matching arrays of objects from one array of objects?

I have an array of objects, e.g.:

var arr = [
    {"a": "x"},
    {"b": "0"},
    {"c": "k"},
    {"a": "nm"},
    {"b": "765"},
    {"ab": "i"},
    {"bc": "x"},
    {"ab": "4"},
    {"abc": "L"}
];

Let's say I am only interested in objects whose keys correspond to var input = ["ab", "bc"]. This means that I want to extract all possible subarrays with result[i].length == 2, as follows:

var result = [
    [{"ab": "i"}, {"bc": "x"}],
    [{"ab": "4"}, {"bc": "x"}] // or [{"bc": "x"}, {"ab": "4"}]
];

That is, the order of objects in the subarrays does not matter at all: I only care that each subarray contains two objects, {"ab": ...} and {"bc": ...}.

If I were interested in var input = ["a","a","ab"], the result should look like this:

var result = [
    [{"a": "x"}, {"a": "nm"}, {"ab": "i"}],
    [{"a": "x"}, {"a": "nm"}, {"ab": "4"}]
];

I cannot find a way to achieve the desired result (assuming that input.length may be much greater than 2 or 3; even 15–20 may not be enough) without a factorial number of computations, which is not feasible. Is there a way to solve such a problem with reasonable performance?
Important note: yes, obviously, for relatively large values of input.length there may theoretically be a huge number of possible combinations, but in practice result.length will always be reasonably small (maybe 100–200; I doubt it could even reach 1000). For safety, though, I would like to set some limit (say, 1000), so that as soon as result.length reaches this limit, the function stops and returns the current result.

Sort arr and input alphabetically by key, which is O(n log n); if you are able to keep them sorted as you build the arrays, you avoid that cost.

Let me explain my idea with an example:

var arr = [
    {"a": "x"},
    {"ab": "i"},
    {"ab": "4"},
    {"abc": "L"},
    {"bc": "x"}
];
var input = ["ab", "bc"];

Search for input[0] in arr (linearly, or with binary search to speed it up). Mark the index.

Search for input[1] in arr, but consider only the subarray of arr from the index marked in the previous step to the end.

If you find all the elements of input, push them to the results (you can keep a temporary object for that).

Now we have to search again for input[0], since two or more entries may share that key. You have stored the index mentioned before, so you start searching again from that index, and since arr is sorted, you only have to check the very next element, and so on.
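The steps above can be sketched roughly as follows. This is only one possible reading of the idea, not the answerer's code: keyOf, lowerBound and findRuns are hypothetical names, each object is assumed to have exactly one key, and the sketch only locates, for each requested key, the contiguous run of entries with that key in the sorted array. Combining the runs into result rows is left out.

```javascript
// Hypothetical helper: the single key of a one-key object such as {"ab": "i"}.
function keyOf(entry) {
    return Object.keys(entry)[0];
}

// Binary search: first index >= from whose key is not less than `key`.
function lowerBound(sorted, key, from) {
    var lo = from, hi = sorted.length;
    while (lo < hi) {
        var mid = (lo + hi) >>> 1;
        if (keyOf(sorted[mid]) < key) { lo = mid + 1; } else { hi = mid; }
    }
    return lo;
}

// For each distinct key in input, collect the run of entries sharing that key.
function findRuns(arr, input) {
    var sorted = arr.slice().sort(function (x, y) {
        var kx = keyOf(x), ky = keyOf(y);
        return kx < ky ? -1 : (kx > ky ? 1 : 0);
    });
    var keys = input.slice().sort();
    var runs = Object.create(null);
    var start = 0; // the "marked index": later keys can only appear after it
    keys.forEach(function (key) {
        if (key in runs) { return; } // repeated key, run already known
        start = lowerBound(sorted, key, start);
        var end = start;
        // Equal keys are adjacent in a sorted array, so just check the next element.
        while (end < sorted.length && keyOf(sorted[end]) === key) { end++; }
        runs[key] = sorted.slice(start, end);
        start = end;
    });
    return runs;
}

var demoRuns = findRuns(
    [{"bc": "x"}, {"ab": "i"}, {"a": "x"}, {"abc": "L"}, {"ab": "4"}],
    ["ab", "bc"]
);
console.log(demoRuns.ab.length, demoRuns.bc.length); // 2 1
```

A key that does not occur in arr ends up with an empty run, which signals that no result row can exist for that input.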


Time complexity:

Sort your arrays (assuming you couldn't keep them sorted while building them): O(n log n), where n is the number of elements in arr.

Binary search in arr for input[0]: O(log n).

The next search step (for input[1]) covers much less than the full length of arr, so a very pessimistic bound is O(n). In practice it won't be O(n), of course, and if you want you can do a binary search for input[1] too, which costs O(log m), where m is the size of the subarray of arr from the stored index to the end.

At this point, we move on to finding the next occurrence of input[0], if any. Because we have stored the index, we know exactly where to start searching, and we only have to check the next element; that's a constant cost, thus O(1).

And then we do the same for input[1] as above, which is cheap again.

Now, it all depends on the length of input, which is k (and it seems that k < n), and on how many occurrences of each key you have, right?

But assuming a normal, average situation, the whole procedure has a time complexity of:

O(n log n)

However, notice that you have to pay a bit of extra memory to store the indices, and how much depends on the number of occurrences each key has. With a brute-force algorithm, which would be slower, you wouldn't need to pay anything extra for memory.

Perhaps not the most optimal way. I'd probably use some library for the final solution, but here are a number of steps that would do the trick for the happy path. I'll add some comments shortly.

Generate a map for a single key in the source array (i.e. the indexes at which it is seen, as we may have multiple entries):

function getKeyMap( src, key ){
    var idx_arr = [];
    src.forEach(function(pair,idx){ if(Object.keys(pair)[0] === key){ idx_arr.push(idx); } });
    return idx_arr;
}

And this mapping has to be done for all the keys you want to filter by:

function getKeysMap( src, keys ){
    var keys_map = [];
    keys.forEach(function(aKey){
        var aMap = getKeyMap(src,aKey);
        if( aMap.length ){
            keys_map.push(aMap);
        }

    });
    // if the keys map length is less than the keys length, you should throw an exception or something
    return keys_map;
}

Then you want to build all the possible combinations. I use recursion here, perhaps not in the most optimal way:

function buildCombos( keys_map, carry, result ){
    if( keys_map.length === 0){
        result.push(carry);
        return;
    }
    var iter = keys_map.pop();
    iter.forEach(function(key){
        var cloneMap = keys_map.slice(0);
        var clone = carry.slice(0);
        clone.push(key);
        buildCombos(cloneMap, clone, result);
    });
}

Then I need to filter the result to exclude duplicate entries and entries with repeated indices:

function uniqueFilter(value, index, self) {
    return self.indexOf(value) === index;
}

function filterResult( map ){
    var filter = {};
    map.forEach(function(item){
        var unique = item.filter( uniqueFilter );
        if(unique.length === item.length){
            // sort numerically and join with a separator so that
            // multi-digit indices (10, 11, ...) stay unambiguous
            unique.sort(function(a, b){ return a - b; });
            filter[unique.join(',')] = true;
        }
    });
    return filter;
}

And then I simply decode the resulting filtered map into the original data

function decodeMap( map,src ){
    var result = [];
    Object.keys(map).forEach(function(item){
        // must use the same separator as filterResult
        var keys = item.split(',');
        var obj = [];
        keys.forEach(function( j ){
            obj.push( src[j] );
        });
        result.push(obj);
    });
    return result;
}

The wrapper

function doItAll(arr, keys){
    // Get the map of the keys in terms of indexes
    var maps = getKeysMap( arr, keys);
    // build combinations out of the key map
    var combos = [];
    buildCombos(maps,[],combos);
    // filter results to get rid of identical sequences and repeated indexes in a sequence
    var map = filterResult(combos);
    // decode map into the source array
    return decodeMap( map, arr );
}

Usage:

var res = doItAll(arr, ["a","a","ab"])
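The filtering step above exists because a repeated key such as "a", "a" makes the recursion produce the same pair of indices in both orders, plus pairs that reuse one index. A different technique, not part of this answer, avoids generating those rows at all: count how often each key is requested, pick that many entries per key as a combination, and multiply across keys. The sketch below is a minimal version of that idea under the assumption that each object has exactly one key; chooseK and matchGroups are hypothetical names, and the limit parameter is the safety cap mentioned in the question.

```javascript
// All ways to pick k items from items, keeping their original order.
function chooseK(items, k) {
    if (k === 0) { return [[]]; }
    if (items.length < k) { return []; }
    var head = items[0], rest = items.slice(1);
    var withHead = chooseK(rest, k - 1).map(function (c) { return [head].concat(c); });
    return withHead.concat(chooseK(rest, k));
}

function matchGroups(arr, input, limit) {
    // How many times each key is requested, in first-seen order.
    var wanted = [], counts = Object.create(null);
    input.forEach(function (key) {
        if (!(key in counts)) { counts[key] = 0; wanted.push(key); }
        counts[key]++;
    });
    // Entries grouped by their single key (only requested keys are kept).
    var byKey = Object.create(null);
    arr.forEach(function (entry) {
        var key = Object.keys(entry)[0];
        if (key in counts) { (byKey[key] = byKey[key] || []).push(entry); }
    });
    // Depth-first product of the per-key combinations, stopping at limit rows.
    var result = [];
    (function extend(i, row) {
        if (result.length >= limit) { return; }
        if (i === wanted.length) { result.push(row); return; }
        var key = wanted[i];
        chooseK(byKey[key] || [], counts[key]).forEach(function (combo) {
            extend(i + 1, row.concat(combo));
        });
    })(0, []);
    return result;
}

var demo = matchGroups(
    [{"a": "x"}, {"b": "0"}, {"a": "nm"}, {"ab": "i"}, {"bc": "x"}, {"ab": "4"}],
    ["a", "a", "ab"],
    1000
);
console.log(JSON.stringify(demo));
// [[{"a":"x"},{"a":"nm"},{"ab":"i"}],[{"a":"x"},{"a":"nm"},{"ab":"4"}]]
```

Note that chooseK builds every combination for one key eagerly, so a key with many entries and a high requested count can still be expensive; the limit only caps the number of finished rows.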

Looking at the problem, it looks a lot like a Cartesian product. In fact, if the data model is modified a bit beforehand, the expected result is, in almost all cases, a Cartesian product. However, there's one case (the second example you provided) that needs special treatment. Here's what I did:

  1. Tweak the model data a bit (this is done only once) in order to have something suitable for applying a Cartesian product.
  2. Treat the "special case" of having more than one parameter requesting the same string.

All the important logic is within cartessianProdModified. The important bits in the code are commented. Hope it helps you with your problem or at least gives you some ideas.

Here's the code:

var arr = [
    {"a": "x"},
    {"b": "0"},
    {"c": "k"},
    {"a": "nm"},
    {"b": "765"},
    {"ab": "i"},
    {"bc": "x"},
    {"ab": "4"},
    {"abc": "L"},
    {"dummy": "asdf"}
];

// Utility function to be used in the cartessian product
function flatten(arr) {
    return arr.reduce(function (memo, el) {
        return memo.concat(el);
    }, []);
}

// Utility function to be used in the cartessian product
function unique(arr) {
    return Object.keys(arr.reduce(function (memo, el) {
        return (memo[el] = 1) && memo;
    }, {}));
}

// It'll prepare the output in the expected way
function getObjArr(key, val, processedObj) {
    // Always return obj itself, so that a falsy value (e.g. "") is not returned in its place
    var set = function (key, val, obj) {
        obj[key] = val;
        return obj;
    };
    // The Cartesian product is over, so we can put the 'special case' in object form to get the expected output.
    return val !== 'repeated' ? [set(key, val, {})] : processedObj[key].reduce(function (memo, val) {
        return memo.concat(set(key, val, {}));
    }, []);
}

// This is the main function. It'll make the cartessian product.
var cartessianProdModified = (function (arr) {
    // Tweak the data model in order to have a set (key: array of values)
    var processedObj = arr.reduce(function (memo, obj) {
        var firstKey = Object.keys(obj)[0];
        return (memo[firstKey] = (memo[firstKey] || []).concat(obj[firstKey])) && memo;
    }, {});

    // Return a function that will perform the cartessian product of the args.
    return function (args) {
        // Spot repeated args.
        var countArgs = args.reduce(function (memo, el) {
                return (memo[el] = (memo[el] || 0) + 1) && memo;
            }, {}),
            // Remove repeated args so that the cartessian product works properly and more efficiently.
            uniqArgs = unique(args);

        return uniqArgs
                .reduce(function (memo, el) {
                    return flatten(memo.map(function (x) {
                        // Special case: the arg is repeated: we have to treat as a unique value in order to do the cartessian product properly
                        return (countArgs[el] > 1 ? ['repeated'] : processedObj[el]).map(function (y) {
                            return x.concat(getObjArr(el, y, processedObj));
                        });
                    }));
                }, [[]]);
    };
})(arr);

console.log(cartessianProdModified(['a', 'a', 'ab']));
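For readers who want to see the unmodified building block on its own, here is a minimal sketch of a plain Cartesian product written with reduce. It is not taken from the answer: cartesian is a hypothetical name, and it has no handling of the repeated-argument special case.

```javascript
// Plain Cartesian product of a list of lists: one item from each list per row.
function cartesian(lists) {
    return lists.reduce(function (rows, list) {
        var next = [];
        rows.forEach(function (row) {
            list.forEach(function (item) {
                // Wrap item in an array so concat never flattens an array-valued item.
                next.push(row.concat([item]));
            });
        });
        return next;
    }, [[]]);
}

console.log(JSON.stringify(cartesian([["i", "4"], ["x"]])));
// [["i","x"],["4","x"]]
```

If any of the lists is empty, the product is empty, which matches the intuition that a key with no matching objects yields no results.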

If you are able to use ES6 features, you can use generators to avoid having to create large intermediate arrays. It would seem that you want a set-of-sets of sorts, with rows containing only unique values. As others have also mentioned, you can approach this by starting with a cartesian product of objects matching your input keys:

'use strict';

function* product(...seqs) {
    const indices = seqs.map(() => 0),
          lengths = seqs.map(seq => seq.length);

    // A product that includes an empty sequence is empty
    if (lengths.indexOf(0) != -1) {
        return;
    }

    while (true) {
        yield indices.map((i, iseq) => seqs[iseq][i]);
        // Update indices right-to-left
        let i;
        for (i = indices.length - 1; i >= 0; i--) {
            indices[i]++;
            if (indices[i] == lengths[i]) {
                // roll-over
                indices[i] = 0;
            } else {
                break;
            }
        }
        // If i is negative, then all indices have rolled-over
        if (i < 0) {
            break;
        }
    }
}

The generator only holds the indices in between iterations and generates new rows on demand. To actually join the objects that match your input keys, you first need to build, for example, a lookup:

function join(keys, values) {
    const lookup = [...new Set(keys)].reduce((o, k) => {
        o[k] = [];
        return o;
    }, {});

    // Iterate over array indices instead of the objects themselves.
    // This makes producing unique rows later on a *lot* easier.
    for (let i of values.keys()) {
       const k = Object.keys(values[i])[0];
       if (lookup.hasOwnProperty(k)) {
           lookup[k].push(i);
       } 
    }

    return product(...keys.map(k => lookup[k]));
}

You then need to filter out rows containing duplicate values:

function isUniq(it, seen) {
    const notHadIt = !seen.has(it);
    if (notHadIt) {
        seen.add(it);
    }
    return notHadIt;
}

function* removeDups(iterable) {
    const seen = new Set();
    skip: for (let it of iterable) {
        seen.clear();
        for (let x of it) {
            if (!isUniq(x, seen)) {
                continue skip;
            }
        }
        yield it;
    }
}

And also globally unique rows (the set-of-sets aspect):

function* distinct(iterable) {
    const seen = new Set();
    for (let it of iterable) {
        // Bit of a hack here, produce a known order for each row so
        // that we can produce a "set of sets" as output. Rows are
        // arrays of integers.
        const k = it.sort().join();
        if (isUniq(k, seen)) {
            yield it;
        }
    }
}

To tie it all up:

function* query(input, arr) {
    for (let it of distinct(removeDups(join(input, arr)))) {
        // Objects from rows of indices
        yield it.map(i => arr[i]);
    }
}

function getResults(input, arr) {
    return Array.from(query(input, arr));
}

In action:

const arr = [
    {"a": "x"},
    {"b": "0"},
    {"c": "k"},
    {"a": "nm"},
    {"b": "765"},
    {"ab": "i"},
    {"bc": "x"},
    {"ab": "4"},
    {"abc": "L"}
];

console.log(getResults(["a", "a", "ab"], arr));
/*
[ [ { a: 'x' }, { a: 'nm' }, { ab: 'i' } ],
  [ { a: 'x' }, { a: 'nm' }, { ab: '4' } ] ]
*/
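Because the pipeline is lazy, the safety limit from the question can be applied by simply not pulling more rows than needed. The sketch below shows one way to do that; take is a hypothetical helper, and naturals is only a stand-in for a lazy source such as query(input, arr).

```javascript
'use strict';

// Yield at most `limit` items from any iterable, then stop pulling from it.
function* take(iterable, limit) {
    if (limit <= 0) {
        return;
    }
    let count = 0;
    for (const item of iterable) {
        yield item;
        count++;
        if (count >= limit) {
            return; // leaving for...of also closes the source iterator
        }
    }
}

// Stand-in for a lazy source: an endless counter.
function* naturals() {
    let n = 0;
    while (true) {
        yield n++;
    }
}

const firstFive = Array.from(take(naturals(), 5));
console.log(firstFive); // [ 0, 1, 2, 3, 4 ]
```

With the functions above, something like Array.from(take(query(input, arr), 1000)) would stop after 1000 rows. Rows rejected by removeDups and distinct are still generated and discarded along the way, so the cap bounds the output size rather than the total work.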


You can do it manually with loops, but you can also use the built-in functions Array.prototype.filter() to filter the array and Array.prototype.indexOf to check if an element is inside another array:

var filtered = arr.filter(function(pair){
    return input.indexOf(Object.keys(pair)[0]) != -1;
});

This gives you an array with just the objects that match your criteria.

In mathematical terms, what you want from the result array is called "combinations", so I won't describe it here. A way to generate all combinations of an array (set) is given here: https://stackoverflow.com/a/18250883/3132718

So here is how to use this function:

// function assumes each element is array, so we need to wrap each one in an array 
for(var i in filtered) {
    filtered[i] = [filtered[i]];
}
var result = getCombinations(filtered, input.length /* how many elements in each sub-array (subset) */);

Object.keys(pair)[0] is a way to get the first key of an object without iterating ( https://stackoverflow.com/a/28670472 )
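One thing to watch for: if the combination generator returns every subset of the requested size, then for input = ["ab", "bc"] it will also return a pair like [{"ab": "i"}, {"ab": "4"}], whose keys do not match input. A minimal sketch of a check that could be applied to each generated combination is shown below; matchesInput is a hypothetical helper, and it assumes each object has exactly one key.

```javascript
// True if the keys of the objects in row are exactly the keys in input,
// counting repeats and ignoring order.
function matchesInput(row, input) {
    var rowKeys = row.map(function (pair) {
        return Object.keys(pair)[0];
    }).sort();
    var wanted = input.slice().sort();
    return rowKeys.length === wanted.length &&
        rowKeys.every(function (key, i) { return key === wanted[i]; });
}

console.log(matchesInput([{"bc": "x"}, {"ab": "i"}], ["ab", "bc"])); // true
console.log(matchesInput([{"ab": "i"}, {"ab": "4"}], ["ab", "bc"])); // false
```

Filtering after the fact like this still generates all the unwanted subsets first, so it is only practical when the filtered array is small.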
