简体   繁体   中英

JavaScript: Removing duplicates in an array of arrays

Currently using JavaScript and I need to go through an array of arrays to determine if there are any duplicate arrays, and then deleting those duplicated arrays. Runtime is of the essence in this case, so I was wondering what the most EFFICIENT way of doing this is.

Is using a hash table desirable in this case? The scope of this would be to hash each sequence and then use the hash to determine whether that sequence occurs again. Hence, each sequence is an array within the master array, and any duplicates would be other arrays within the same array. Furthermore, it is extremely important that all individual arrays remain ordered themselves (ie the elements in the individual arrays must always keep their position). Also, all elements in the individual array are string values.

Example: Assume that there is an array A whose elements are in turn the following arrays:

A[0] = ["one", "two", "three", "four"]
A[1] = ["two", "one", "three", "four"]
A[2] = ["one", "two", "three", "four"]

In the above example, A[0] and A[2] are duplicates and so the function should return A[0] and A[1], such that there is only one instance of the same array.

Keep an object where the keys are the joined elements of each array. If the key is not found add the array to the output array and add the key to the object.

var hash = {};
var out = [];
for (var i = 0, l = A.length; i < l; i++) {
  var key = A[i].join('|');
  if (!hash[key]) {
    out.push(A[i]);
    hash[key] = 'found';
  }
}

DEMO

Ok let us first have a look at the complexity of the naive solution: If there are n arrays, each with at most k entries, you need O(n^2 * k) comparisons, because for each of these n arrays, you have to compare it to n-1 others with k comparisons each. The space complexity is O(n*k)

So if you are willing to trade space for better performance, you can do the following: (Short disclaimer: I assume all your arrays have an equal number of k elements which is indicated but not approved by your question.)

Going one by one through the arrays, you pick the first element which we assume is a . Use a hash map to verify whether you saw this element as a first element before. If not, create a tree structure with a as its root, store it under a in your hash map and make it your current node. Now, for each subsequent entry in the current array, you check whether your current node has a child of that kind. So if the second entry is b , you add b to be a child of a.

Your tree now looks like that: (left to right: root to children)

a - b

Having c as the third entry works exactly the same:

a - b - c

Now we skip forward to have a look on an array [a, c, d] . You first encounter the tree for element a . For the second element, you check whether c is already a child of a. If not, add it:

  - b - c
a
  - c

same goes for the next entry:

  - b - c
a
  - c - d

Let us now see what happens when we check an array that we saw before: [a, b, c]

First we check a , see that there is already a tree and get it from the hash map. Next, we notice that a has a child named b , so we descend to b . Now, for the last entry, we see that it is already there too, telling us that we encountered a duplicate which we can drop.

Sorry for the improvised drawing, I hope I can get the idea across. It is just about going through each array only once, storing it in a non-redundant way. So the time complexity would be O(n*k) . The used space increases but is bounded by O(n*k) since the worst case is no array shared any prefix, which results in the same space complexity.

Hope I didn't overlook something.

ONELINER

A.filter((r={},a=>!(r[a]=++r[a]|0)))

I assume that your strings not contains , character. If contains then change twice r[a] to r[a.join('|')] (where | is arbitrary separator) or use r[a.map(x=>x.length+','+x)] to allow all characters in your strings. Here is working example .

Explanation

In r={} we set once temporary object. In filter function a=>... and is only for declare once empty temporary object in argument r={} . In function a=>... in a we have current A element . The JS make implicit cast a to string in r[a] . Then in !(r[a]=++r[a]|0) we increase counter of occurrence element a and return true (as filter function value) if element a appear first time.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM