I have an array of objects, e.g.
var arr = [
    {"a": "x"},
    {"b": "0"},
    {"c": "k"},
    {"a": "nm"},
    {"b": "765"},
    {"ab": "i"},
    {"bc": "x"},
    {"ab": "4"},
    {"abc": "L"}
];
Let's say I am only interested in objects whose keys correspond to var input = ["ab", "bc"]. It means that I want to extract all possible subarrays with result[i].length == 2 in the following way:
var result = [
    [{"ab": "i"}, {"bc": "x"}],
    [{"ab": "4"}, {"bc": "x"}] // or [{"bc": "x"}, {"ab": "4"}]
];
— that is, the order of objects in subarrays is absolutely not important: I am only interested in the fact that each subarray contains two objects, {"ab": ...} and {"bc": ...}.
If I was interested in var input = ["a","a","ab"], the result should be like this:
var result = [
    [{"a": "x"}, {"a": "nm"}, {"ab": "i"}],
    [{"a": "x"}, {"a": "nm"}, {"ab": "4"}]
];
I cannot find a way to achieve the desired result (assuming that input.length may be much greater than 2 or 3; even 15–20 may not be enough) without a factorial-level amount of computation, which is not physically feasible. Is there a way to get reasonable performance when solving such a problem?
Important note: yes, obviously, for relatively large values of input.length there may theoretically be a huge number of possible combinations, but in practice, result.length will always be reasonably small (maybe 100–200; I doubt it could even reach 1000). To be safe, though, I would like to set some limit (say, 1000), so that as soon as result.length reaches this limit, the function simply returns the current result and stops.
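To make the limit concrete, all I mean is a guard like this inside whatever collector ends up being used (LIMIT and pushWithLimit are made-up names, just for illustration):

```javascript
// Sketch of the cap I have in mind: stop collecting once `result`
// reaches LIMIT entries. `pushWithLimit` is a made-up helper name.
var LIMIT = 1000;

function pushWithLimit(result, combo) {
    if (result.length >= LIMIT) {
        return false; // signal the search to stop
    }
    result.push(combo);
    return true;
}
```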
Sort arr and input alphabetically, which is O(n log n); if you are able to keep them sorted as you build the arrays, so much the better.
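In code, that preprocessing might look like this (the comparator byFirstKey is my own sketch; the answer only names the step):

```javascript
// Sort `arr` by each object's first (and only) key, and sort `input` too.
// `byFirstKey` is an assumed helper, not part of the original answer.
function byFirstKey(x, y) {
    var kx = Object.keys(x)[0], ky = Object.keys(y)[0];
    return kx < ky ? -1 : kx > ky ? 1 : 0;
}

var arr = [{"bc": "x"}, {"a": "x"}, {"ab": "i"}, {"ab": "4"}, {"abc": "L"}];
var input = ["bc", "ab"];
arr.sort(byFirstKey); // keys now in order: a, ab, ab, abc, bc
input.sort();         // ["ab", "bc"]
```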
Let me explain my idea with an example:
var arr = [
    {"a": "x"},
    {"ab": "i"},
    {"ab": "4"},
    {"abc": "L"},
    {"bc": "x"}
];
var input = ["ab", "bc"];
Search for input[0] in arr (linearly, or even with binary search to speed it up). Mark the index.
Search for input[1] in arr, but consider only the subarray of arr from the index marked in the previous step to the end of it.
If you find all the elements of input, then push that to the results (you can keep a temporary object for that).
Now, we have to search again for input[0], since two or more entries may share that key. You will have stored the index I mentioned before, so you can resume searching from that index, and since arr is sorted, you only have to check the very next element, and so on.
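A sketch of those two steps (lowerBound and entriesWithKey are my own helper names; the answer only describes the idea):

```javascript
// `sortedArr` is assumed sorted by each object's first key.
function firstKeyOf(obj) {
    return Object.keys(obj)[0];
}

// Standard binary search: first index whose key is >= `key`.
function lowerBound(sortedArr, key) {
    var lo = 0, hi = sortedArr.length;
    while (lo < hi) {
        var mid = (lo + hi) >> 1;
        if (firstKeyOf(sortedArr[mid]) < key) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}

// Collect every entry sharing `key`: binary search once (O(log n)),
// then walk forward while the very next element still matches.
function entriesWithKey(sortedArr, key) {
    var out = [];
    for (var i = lowerBound(sortedArr, key);
         i < sortedArr.length && firstKeyOf(sortedArr[i]) === key;
         i++) {
        out.push(sortedArr[i]);
    }
    return out;
}
```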
Time complexity:
Sort your arrays (assuming you couldn't have them sorted when you built them): O(nlogn), where n is the number of elements arr has.
Binary search in arr for input[0]: O(logn)
Now the next step of the search (for input[1]) covers much less than the length of arr, so a very pessimistic bound would be O(n). In practice it won't be O(n), of course, and if you want you can do a binary search for input[1] too, which would cost O(logm), where m is the size of arr[index_stored:].
At this point, we move on to finding the next occurrence of input[0], if any of course, but because we have stored the index we know exactly where to start searching, and we have to check the next element only; that's a constant cost, thus O(1).
And then we do the same for input[1] as above, which is cheap again.
Now, it all depends on the length of input, call it k (it seems that k < n), and on how many occurrences of each key you have, right?
But assuming a normal, average situation, the whole procedure has a time complexity of:
O(nlogn)
However, notice that you have to pay a bit of extra memory to store the indices, which depends on the number of occurrences a key has. With a brute-force algorithm, which would be slower, you wouldn't need to pay anything extra for memory.
Perhaps not the most optimal way. I'd probably use some library for the final solution, but here are a number of steps that will do the trick for a happy path.
Generate a map for a single key in the source array (i.e., at which indexes it is seen, as we may have multiple entries)
function getKeyMap( src, key ){
    var idx_arr = [];
    src.forEach(function(pair, idx){
        if( Object.keys(pair)[0] === key ){ idx_arr.push(idx); }
    });
    return idx_arr;
}
And this mapping has to be done for all the keys you want to be part of filtering
function getKeysMap( src, keys ){
    var keys_map = [];
    keys.forEach(function(aKey){
        var aMap = getKeyMap(src, aKey);
        if( aMap.length ){
            keys_map.push(aMap);
        }
    });
    // if keys_map.length is less than keys.length, you should throw an exception or something
    return keys_map;
}
Then you want to build all the possible combinations. I use recursion here, in perhaps not the most optimal way
function buildCombos( keys_map, carry, result ){
    if( keys_map.length === 0 ){
        result.push(carry);
        return;
    }
    // take the index list for one key and branch on each of its occurrences
    var iter = keys_map.pop();
    iter.forEach(function(key){
        var cloneMap = keys_map.slice(0);
        var clone = carry.slice(0);
        clone.push(key);
        buildCombos(cloneMap, clone, result);
    });
}
Then I need to filter the result to exclude duplicate entries and entries with repeated indices
function uniqueFilter(value, index, self) {
    return self.indexOf(value) === index;
}
function filterResult( map ){
    var filter = {};
    map.forEach(function(item){
        var unique = item.filter( uniqueFilter );
        if( unique.length === item.length ){
            // join with ',' so that multi-digit indexes survive the later split
            filter[unique.sort().join(',')] = true;
        }
    });
    return filter;
}
And then I simply decode the resulting filtered map into the original data
function decodeMap( map, src ){
    var result = [];
    Object.keys(map).forEach(function(item){
        var keys = item.split(',');
        var obj = [];
        keys.forEach(function( j ){
            obj.push( src[j] );
        });
        result.push(obj);
    });
    return result;
}
The wrapper
function doItAll(arr, keys){
    // Get a map of the keys in terms of indexes
    var maps = getKeysMap( arr, keys );
    // build combinations out of the key map
    var combos = [];
    buildCombos(maps, [], combos);
    // filter results to get rid of duplicate sequences and repeated indexes in a sequence
    var map = filterResult(combos);
    // decode the map into the source array
    return decodeMap( map, arr );
}
Usage:
var res = doItAll(arr, ["a","a","ab"])
Looking at the problem, it looks a lot like a Cartesian product. In fact, if the data model is modified a bit before operating, the expected result is, in almost all cases, a Cartesian product. However, there's a case (the second example you provided) that needs special treatment. Here's what I did:
All the important logic is within cartessianProdModified. The important bits in the code are commented. Hope it helps you with your problem or at least gives you some ideas.
Here's a fiddle and here's the code:
var arr = [
    {"a": "x"},
    {"b": "0"},
    {"c": "k"},
    {"a": "nm"},
    {"b": "765"},
    {"ab": "i"},
    {"bc": "x"},
    {"ab": "4"},
    {"abc": "L"},
    {"dummy": "asdf"}
];
// Utility function to be used in the Cartesian product
function flatten(arr) {
    return arr.reduce(function (memo, el) {
        return memo.concat(el);
    }, []);
}
// Utility function to be used in the Cartesian product
function unique(arr) {
    return Object.keys(arr.reduce(function (memo, el) {
        return (memo[el] = 1) && memo;
    }, {}));
}
// It'll prepare the output in the expected way
function getObjArr(key, val, processedObj) {
    var set = function (key, val, obj) {
        return (obj[key] = val) && obj;
    };
    // The Cartesian product is over, so we can put the 'special case' in object form to get the expected output.
    return val !== 'repeated' ? [set(key, val, {})] : processedObj[key].reduce(function (memo, val) {
        return memo.concat(set(key, val, {}));
    }, []);
}
// This is the main function. It'll make the Cartesian product.
var cartessianProdModified = (function (arr) {
    // Tweak the data model in order to have a set (key: array of values)
    var processedObj = arr.reduce(function (memo, obj) {
        var firstKey = Object.keys(obj)[0];
        return (memo[firstKey] = (memo[firstKey] || []).concat(obj[firstKey])) && memo;
    }, {});
    // Return a function that will perform the Cartesian product of the args.
    return function (args) {
        // Spot repeated args.
        var countArgs = args.reduce(function (memo, el) {
                return (memo[el] = (memo[el] || 0) + 1) && memo;
            }, {}),
            // Remove repeated args so that the Cartesian product works properly and more efficiently.
            uniqArgs = unique(args);
        return uniqArgs.reduce(function (memo, el) {
            return flatten(memo.map(function (x) {
                // Special case: the arg is repeated; treat it as a unique value in order to do the Cartesian product properly
                return (countArgs[el] > 1 ? ['repeated'] : processedObj[el]).map(function (y) {
                    return x.concat(getObjArr(el, y, processedObj));
                });
            }));
        }, [[]]);
    };
})(arr);
console.log(cartessianProdModified(['a', 'a', 'ab']));
If you are able to use ES6 features, you can use generators to avoid having to create large intermediate arrays. It would seem that you want a set-of-sets of sorts, with rows containing only unique values. As others have also mentioned, you can approach this by starting with a Cartesian product of the objects matching your input keys:
'use strict';
function* product(...seqs) {
    const indices = seqs.map(() => 0),
          lengths = seqs.map(seq => seq.length);
    // A product of 0 is empty
    if (lengths.indexOf(0) != -1) {
        return;
    }
    while (true) {
        yield indices.map((i, iseq) => seqs[iseq][i]);
        // Update indices right-to-left
        let i;
        for (i = indices.length - 1; i >= 0; i--) {
            indices[i]++;
            if (indices[i] == lengths[i]) {
                // roll-over
                indices[i] = 0;
            } else {
                break;
            }
        }
        // If i is negative, then all indices have rolled-over
        if (i < 0) {
            break;
        }
    }
}
The generator only holds the indices in between iterations and generates new rows on demand. To actually join the objects that match your input keys, you first have to create a lookup, for example:
function join(keys, values) {
    const lookup = [...new Set(keys)].reduce((o, k) => {
        o[k] = [];
        return o;
    }, {});
    // Iterate over array indices instead of the objects themselves.
    // This makes producing unique rows later on a *lot* easier.
    for (let i of values.keys()) {
        const k = Object.keys(values[i])[0];
        if (lookup.hasOwnProperty(k)) {
            lookup[k].push(i);
        }
    }
    return product(...keys.map(k => lookup[k]));
}
You then need to filter out rows containing duplicate values:
function isUniq(it, seen) {
    const notHadIt = !seen.has(it);
    if (notHadIt) {
        seen.add(it);
    }
    return notHadIt;
}
function* removeDups(iterable) {
    const seen = new Set();
    skip: for (let it of iterable) {
        seen.clear();
        for (let x of it) {
            if (!isUniq(x, seen)) {
                continue skip;
            }
        }
        yield it;
    }
}
And also globally unique rows (the set-of-sets aspect):
function* distinct(iterable) {
    const seen = new Set();
    for (let it of iterable) {
        // Bit of a hack here: produce a known order for each row so
        // that we can produce a "set of sets" as output. Rows are
        // arrays of integers.
        const k = it.sort().join();
        if (isUniq(k, seen)) {
            yield it;
        }
    }
}
To tie it all up:
function* query(input, arr) {
    for (let it of distinct(removeDups(join(input, arr)))) {
        // Objects from rows of indices
        yield it.map(i => arr[i]);
    }
}
function getResults(input, arr) {
    return Array.from(query(input, arr));
}
In action:
const arr = [
    {"a": "x"},
    {"b": "0"},
    {"c": "k"},
    {"a": "nm"},
    {"b": "765"},
    {"ab": "i"},
    {"bc": "x"},
    {"ab": "4"},
    {"abc": "L"}
];
console.log(getResults(["a", "a", "ab"], arr));
/*
[ [ { a: 'x' }, { a: 'nm' }, { ab: 'i' } ],
  [ { a: 'x' }, { a: 'nm' }, { ab: '4' } ] ]
*/
And the obligatory jsFiddle.
You can do it manually with loops, but you can also use the built-in functions Array.prototype.filter() to filter the array and Array.prototype.indexOf to check if an element is inside another array:
var filtered = arr.filter(function(pair){
    return input.indexOf(Object.keys(pair)[0]) != -1;
});
This gives you an array with just the objects that match your criteria.
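For example, running that same filter against the question's sample data with input = ["ab", "bc"] keeps exactly three objects:

```javascript
var arr = [
    {"a": "x"}, {"b": "0"}, {"c": "k"}, {"a": "nm"}, {"b": "765"},
    {"ab": "i"}, {"bc": "x"}, {"ab": "4"}, {"abc": "L"}
];
var input = ["ab", "bc"];
// Keep only the pairs whose (single) key appears in `input`.
var filtered = arr.filter(function(pair){
    return input.indexOf(Object.keys(pair)[0]) != -1;
});
// filtered is [{"ab": "i"}, {"bc": "x"}, {"ab": "4"}]
```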
Now, what the result array holds is, in math language, called "combinations", which is exactly what you want, so I won't describe it here. A way to generate all combinations of an array (set) is given here: https://stackoverflow.com/a/18250883/3132718
So here is how to use this function:
// the function assumes each element is an array, so we need to wrap each one in an array
for(var i in filtered) {
    filtered[i] = [filtered[i]];
}
var result = getCombinations(filtered, input.length /* how many elements in each sub-array (subset) */);
Object.keys(pair)[0] is a way to get the first key of an object without iterating (https://stackoverflow.com/a/28670472)
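In case the link goes stale, a getCombinations along those lines could be sketched like this (my own minimal k-combinations version, not the linked code):

```javascript
// Minimal k-combinations sketch: each element of `arr` is itself an
// array (as in the wrapping loop above); pick element i, then recurse
// on the elements after it and prepend the pick to every tail.
function getCombinations(arr, k) {
    if (k === 0) return [[]];
    var result = [];
    arr.forEach(function (el, i) {
        getCombinations(arr.slice(i + 1), k - 1).forEach(function (tail) {
            result.push(el.concat(tail));
        });
    });
    return result;
}
```

Note that this generates every k-subset of filtered regardless of keys, so it can still blow up combinatorially; it is only meant to show the shape of the function the usage above relies on.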