
JavaScript: Removing duplicates in an array of arrays

I am using JavaScript and need to go through an array of arrays, determine whether it contains any duplicate arrays, and delete those duplicates. Runtime is of the essence in this case, so I was wondering what the most efficient way of doing this is.

Is using a hash table desirable in this case? The idea would be to hash each sequence and then use the hash to determine whether that sequence occurs again. Each sequence is an array within the master array, and any duplicates would be other arrays within the same master array. Furthermore, it is extremely important that each individual array keeps its own order (i.e. the elements in an individual array must always keep their position). Also, all elements in the individual arrays are string values.

Example: Assume that there is an array A whose elements are the following arrays:

A[0] = ["one", "two", "three", "four"]
A[1] = ["two", "one", "three", "four"]
A[2] = ["one", "two", "three", "four"]

In the above example, A[0] and A[2] are duplicates, so the function should return A[0] and A[1], such that there is only one instance of each distinct array.

Keep an object whose keys are the joined elements of each array. If the key is not found, add the array to the output array and add the key to the object.

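// Object.create(null) has no inherited keys, so a key such as
// "constructor" or "toString" is not mistaken for an entry already seen.
var hash = Object.create(null);
var out = [];
for (var i = 0, l = A.length; i < l; i++) {
  // Assumes no string contains the '|' separator.
  var key = A[i].join('|');
  if (!hash[key]) {
    out.push(A[i]);
    hash[key] = true;
  }
}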
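As a variation on the loop above, here is a minimal sketch that wraps the same idea in a function and builds the key with JSON.stringify instead of join('|'), so strings that happen to contain the separator cannot collide. The function name dedupeArrays is hypothetical and not part of the original answer.

```javascript
// Hypothetical helper; a sketch of the keyed-lookup approach, not the answer's exact code.
function dedupeArrays(arrays) {
  const seen = new Set();
  const out = [];
  for (const arr of arrays) {
    // JSON.stringify quotes and escapes every string, so two different
    // arrays of strings never produce the same key.
    const key = JSON.stringify(arr);
    if (!seen.has(key)) {
      seen.add(key);
      out.push(arr); // the kept array is the original object, order untouched
    }
  }
  return out;
}

const A = [
  ["one", "two", "three", "four"],
  ["two", "one", "three", "four"],
  ["one", "two", "three", "four"],
];
const unique = dedupeArrays(A);
console.log(unique.length); // 2
```

Building each key costs time proportional to the array's length, so the whole pass stays linear in the total number of elements.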


OK, let us first look at the complexity of the naive solution: if there are n arrays, each with at most k entries, you need O(n^2 * k) comparisons, because each of the n arrays has to be compared to the n-1 others, at k element comparisons each. The space complexity is O(n*k).
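For reference, the naive approach described here might look like the following sketch. The names arraysEqual and dedupeNaive are hypothetical and only illustrate the pairwise comparison.

```javascript
// Element-by-element comparison: up to k comparisons per pair.
function arraysEqual(a, b) {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

// Hypothetical naive dedupe: every candidate is checked against
// every array kept so far, which gives the O(n^2 * k) behaviour.
function dedupeNaive(arrays) {
  const out = [];
  for (const candidate of arrays) {
    if (!out.some(kept => arraysEqual(kept, candidate))) {
      out.push(candidate);
    }
  }
  return out;
}

const result = dedupeNaive([
  ["one", "two", "three", "four"],
  ["two", "one", "three", "four"],
  ["one", "two", "three", "four"],
]);
console.log(result.length); // 2
```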

So if you are willing to trade space for better performance, you can do the following. (Short disclaimer: I assume all your arrays have the same number of elements, k, which your question suggests but does not confirm.)

Going through the arrays one by one, you take the first element, which we assume is a. Use a hash map to check whether you have seen this element as a first element before. If not, create a tree structure with a as its root, store it under a in your hash map, and make it your current node. Now, for each subsequent entry in the current array, you check whether your current node has a child with that value. If not, you add one; either way, that child becomes the current node. So if the second entry is b, you add b as a child of a.

Your tree now looks like this (left to right: root to children):

a - b

Having c as the third entry works exactly the same:

a - b - c

Now we skip forward and look at an array [a, c, d]. You first find the existing tree for element a. For the second element, you check whether c is already a child of a. If not, add it:

  - b - c
a
  - c

The same goes for the next entry:

  - b - c
a
  - c - d

Let us now see what happens when we check an array that we have seen before: [a, b, c]

First we check a, see that there is already a tree, and get it from the hash map. Next, we notice that a has a child named b, so we descend to b. Now, for the last entry, we see that it is already there too, which tells us that we have encountered a duplicate that we can drop.

Sorry for the improvised drawing; I hope I can get the idea across. It is just about going through each array only once and storing it in a non-redundant way. So the time complexity would be O(n*k), assuming constant-time hash lookups. The space used increases but is bounded by O(n*k): the worst case is that no two arrays share a prefix, which gives the same space complexity as before.
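The tree described above could be sketched as follows, using nested Map objects for the nodes. The function name dedupeWithTrie and the END marker are assumptions added for illustration: the marker lets the sketch also handle arrays of different lengths, which the description above does not require.

```javascript
// Hypothetical sketch of the prefix-tree approach.
function dedupeWithTrie(arrays) {
  const root = new Map();    // first element -> subtree
  const END = Symbol('end'); // marks "an array ends at this node"
  const out = [];
  for (const arr of arrays) {
    let node = root;
    for (const item of arr) {
      // Descend to the child for this element, creating it if missing.
      let child = node.get(item);
      if (child === undefined) {
        child = new Map();
        node.set(item, child);
      }
      node = child;
    }
    // If no earlier array ended here, this array is new.
    if (!node.has(END)) {
      node.set(END, true);
      out.push(arr);
    }
  }
  return out;
}

const kept = dedupeWithTrie([
  ["a", "b", "c"],
  ["a", "c", "d"],
  ["a", "b", "c"],
]);
console.log(kept.length); // 2
```

Because the elements are strings, the Symbol used as the end marker can never collide with a real element.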

I hope I didn't overlook anything.

ONELINER

A.filter((r={},a=>!(r[a]=++r[a]|0)))

I assume that your strings do not contain the , character. If they do, replace both occurrences of r[a] with r[a.join('|')] (where | is an arbitrary separator), or use r[a.map(x=>x.length+','+x)] to allow all characters in your strings.
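To illustrate why the separator matters, here is a small sketch with two different arrays that produce the same implicit string key, and the length-prefixed key that keeps them apart. The helper name lengthKey is hypothetical.

```javascript
const x = ["a,b", "c"];
const y = ["a", "b,c"];

// Implicit conversion joins with ',', so both arrays become "a,b,c".
console.log(String(x) === String(y)); // true

// Prefixing each string with its length makes the key unambiguous.
const lengthKey = arr => String(arr.map(s => s.length + ',' + s));
console.log(lengthKey(x)); // "3,a,b,1,c"
console.log(lengthKey(y)); // "1,a,3,b,c"
```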

Explanation

The argument (r={}, a=>...) uses the comma operator: r={} runs once and creates an empty temporary object, and the arrow function a=>... is what filter actually receives. Inside that function, a is the current element of A. JavaScript implicitly converts a to a string in r[a]. Then, in !(r[a]=++r[a]|0), we increment the occurrence counter for a and return true (as the filter function's value) only if a appears for the first time.
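For readers who prefer a spelled-out form, the one-liner is roughly equivalent to the sketch below. The function name dedupeByCount is hypothetical, and a local Map stands in for the object r. Note that the one-liner assigns to an undeclared r, which throws a ReferenceError in strict mode and in ES modules; the sketch avoids that.

```javascript
// Hypothetical expanded version of the one-liner.
function dedupeByCount(arrays) {
  const counts = new Map(); // replaces the implicit global r
  return arrays.filter(arr => {
    const key = String(arr);              // same conversion as r[a]
    const seenBefore = counts.get(key) || 0;
    counts.set(key, seenBefore + 1);      // count this occurrence
    return seenBefore === 0;              // keep only the first one
  });
}

const firsts = dedupeByCount([
  ["one", "two", "three", "four"],
  ["two", "one", "three", "four"],
  ["one", "two", "three", "four"],
]);
console.log(firsts.length); // 2
```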
