JavaScript - Efficient way to compare very large arrays of objects

I have two very large data sets that, due to the limitations of my environment, I need to compare on the client side.

Each of the corresponding arrays of objects has over 450k entries. I have been testing different ways to compare them (for loops, .find, .indexOf, .reduce, $.grep), and all of them run very slowly (around 700 calculations per minute).

The check consists of finding out whether each object in one array is already included in the other, for example:

    var Arr1 = [{ ID: 2, Name: "Bar" }, { ID: 1, Name: "Foo" }];
    var Arr2 = [{ ID: 2, Name: "Fu" }, { ID: 2, Name: "Bar" }];

If any object in Arr2 is included in Arr1 by any property (in this case Arr2[1].Name == Arr1[0].Name), the check should return true.

In that case I would push it to a new array of objects, which we can call Found: Found.push(Arr1[0])

Of course, I need to perform this check for all 400k+ objects in my array, so it gets pretty slow.

I know there are several "buts" in my request, such as available RAM and processor speed, but assuming a perfect environment, what would be the fastest way?

I think the most important thing is making sure your complexity doesn't grow to O(n * m), with n being the length of Arr1 and m the length of Arr2.

Looping over the second array and using indexOf or find on the first one gives you a worst case of m * n operations (when none of the items in Arr2 appear in Arr1).
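
To make that cost visible, here is a minimal sketch of the nested approach, instrumented with a comparison counter. The function name naiveCommon and the counter are hypothetical, and it assumes (as the hash function in this answer does) that two objects are equal when their Name and Results match. When no item of arr2 occurs in arr1, every find() scans all of arr1, so the counter ends at exactly m * n.

```javascript
// Naive O(n * m) approach: for every item of arr2, scan arr1 with find().
// `comparisons` is a hypothetical counter added only to expose the cost.
function naiveCommon(arr1, arr2) {
  let comparisons = 0;
  const found = [];
  for (const item of arr2) {
    const match = arr1.find((candidate) => {
      comparisons += 1;
      // Assumption: equality means same Name and same Results.
      return candidate.Name === item.Name && candidate.Results === item.Results;
    });
    if (match !== undefined) {
      found.push(match);
    }
  }
  return { found, comparisons };
}

// Worst case: no item of arr2 appears in arr1.
const arr1 = [
  { Name: "john", Results: "Check" },
  { Name: "jane", Results: "Check" },
  { Name: "aisha", Results: "Check" },
];
const arr2 = [
  { Name: "robert", Results: "Check" },
  { Name: "ellen", Results: "Check" },
  { Name: "tin", Results: "Check" },
  { Name: "ana", Results: "Check" },
];

const { found, comparisons } = naiveCommon(arr1, arr2);
console.log(found.length, comparisons); // 0 12  (n = 3, m = 4)
```

With 450k items on each side, the same loop would perform on the order of 450,000 * 450,000 comparisons.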

Therefore, you should first create an index of Arr2, so that the lookups you do while going over Arr1 are cheap.

The hard part is determining how to index your array to support fast access. One way is to create a hash function:

    // Include the properties that determine equality in this hash function
    const hash = ({ Name, Results }) => `${Name}|${Results}`;

    console.log(
      hash({ Name: "john.doe", Results: "Check", Timestamp: "-", Period: "Q2" })
    );
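
One caveat with joining the fields by "|": if a field value can itself contain "|", two different objects can produce the same key (Name "a|b" with Results "c", and Name "a" with Results "b|c", both give "a|b|c"). A minimal alternative sketch, assuming the fields are strings, serializes them as a JSON array so the boundaries between fields stay unambiguous. The names pipeHash and jsonHash are illustrative only.

```javascript
// Pipe-joined key: ambiguous if a field may contain "|".
const pipeHash = ({ Name, Results }) => `${Name}|${Results}`;

// JSON-array key: string values are quoted and escaped, so distinct
// (Name, Results) pairs of strings yield distinct keys.
const jsonHash = ({ Name, Results }) => JSON.stringify([Name, Results]);

const a = { Name: "a|b", Results: "c" };
const b = { Name: "a", Results: "b|c" };

console.log(pipeHash(a) === pipeHash(b)); // true  (false positive)
console.log(jsonHash(a) === jsonHash(b)); // false
console.log(jsonHash(a)); // ["a|b","c"]
```

JSON.stringify is slower than a template string, so this trade-off only pays off when the data can actually contain the delimiter.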

Using this method, you can create an index of { string: Object } by going over all items in Arr2 once.

    const hash = ({ Name, Results }) => `${Name}|${Results}`;

    const arr2 = [
      { Name: "john", Results: "Check", Timestamp: "-", Period: "Q2" },
      { Name: "jane", Results: "Check", Timestamp: "-", Period: "Q2" },
      { Name: "aisha", Results: "Check", Timestamp: "-", Period: "Q2" }
    ];

    // Build a { "Name|Results": object } index in one pass over arr2
    console.log(
      Object.fromEntries(arr2.map(x => [hash(x), x]))
    );

Note: depending on the JavaScript engine, it might be better to rewrite this using a for or while loop. Creating the entries array first also consumes some memory. Here, I'm just trying to explain the general approach.
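
As a sketch of the loop-based variant the note mentions, the index can be built directly into a Map without the intermediate entries array. Using a Map instead of a plain object is a substitution, not part of the original answer; one property of Map is that it has no inherited keys, so a lookup can never accidentally hit something like Object.prototype.constructor. The helper name buildIndex is hypothetical. Like Object.fromEntries, it keeps only the last arr2 item for each key.

```javascript
const hash = ({ Name, Results }) => `${Name}|${Results}`;

// Build a key -> object index in a single pass, without creating
// an intermediate [key, value] array first.
function buildIndex(items) {
  const index = new Map();
  for (let i = 0; i < items.length; i += 1) {
    const item = items[i];
    // Later items with the same key overwrite earlier ones,
    // matching the behavior of Object.fromEntries.
    index.set(hash(item), item);
  }
  return index;
}

const arr2 = [
  { Name: "john", Results: "Check", Timestamp: "-", Period: "Q2" },
  { Name: "jane", Results: "Check", Timestamp: "-", Period: "Q2" },
  { Name: "aisha", Results: "Check", Timestamp: "-", Period: "Q2" },
];

const index = buildIndex(arr2);
console.log(index.size); // 3
console.log(index.get("jane|Check").Name); // jane
```

Lookups then use index.get(hash(item)) or index.has(hash(item)) instead of property access.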


Using this index, finding the match in Arr2 for an element of Arr1 takes roughly constant time per lookup.

    const hash = ({ Name, Results }) => `${Name}|${Results}`;

    const arr2 = [
      { Name: "john", Results: "Check", Timestamp: "-", Period: "Q2" },
      { Name: "jane", Results: "Check", Timestamp: "-", Period: "Q2" },
      { Name: "aisha", Results: "Check", Timestamp: "-", Period: "Q2" }
    ];

    const arr1 = [
      { Name: "john", Results: "Check", Timestamp: "-", Period: "Q2" },
      { Name: "jane", Results: "Check", Timestamp: "-", Period: "Q2" },
      { Name: "aisha", Results: "Check", Timestamp: "-", Period: "Q2" },
      { Name: "robert", Results: "Check", Timestamp: "-", Period: "Q2" },
      { Name: "ellen", Results: "Check", Timestamp: "-", Period: "Q2" },
      { Name: "tin", Results: "Check", Timestamp: "-", Period: "Q2" }
    ];

    // Index arr2 once: O(m)
    const index = Object.fromEntries(arr2.map(x => [hash(x), x]));

    // Check every item of arr1 against the index: O(n)
    const results = arr1.filter(p => index.hasOwnProperty(hash(p)));

    console.log(`In both arrays: ${results.map(p => p.Name).join(", ")}`);
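
In the snippet above, the objects stored as index values are never read; only the keys are checked. A minimal variant built on that observation keeps just the keys in a Set. The function name commonByKey and its keyFn parameter are hypothetical, introduced here for illustration.

```javascript
// Return the items of arr1 whose key also occurs in arr2.
// keyFn decides which properties count for equality.
function commonByKey(arr1, arr2, keyFn) {
  // One pass over arr2 to collect its keys: O(m).
  const keys = new Set();
  for (const item of arr2) {
    keys.add(keyFn(item));
  }
  // One pass over arr1 with average constant-time lookups: O(n).
  return arr1.filter((item) => keys.has(keyFn(item)));
}

const hash = ({ Name, Results }) => `${Name}|${Results}`;

const arr2 = [
  { Name: "john", Results: "Check" },
  { Name: "jane", Results: "Check" },
  { Name: "aisha", Results: "Check" },
];
const arr1 = [
  { Name: "john", Results: "Check" },
  { Name: "robert", Results: "Check" },
  { Name: "aisha", Results: "Check" },
];

const inBoth = commonByKey(arr1, arr2, hash);
console.log(inBoth.map((p) => p.Name).join(", ")); // john, aisha
```

Passing the key function as a parameter keeps the equality rule in one place, so switching to a different set of properties only means passing a different keyFn.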

I'm no computer science graduate, but I think this brings you close to O(n + m) complexity (one pass over Arr2 to build the index, plus one pass over Arr1 to look items up), which should be doable for 2 x 450k items.
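
As a rough back-of-the-envelope check with n = m = 450,000, counting one step per comparison for the nested loops and one step per key insertion or lookup for the index:

```latex
\begin{aligned}
n \cdot m &= 450{,}000 \times 450{,}000 = 2.025 \times 10^{11} \\
n + m &= 450{,}000 + 450{,}000 = 9 \times 10^{5} \\
\frac{n \cdot m}{n + m} &= \frac{2.025 \times 10^{11}}{9 \times 10^{5}} = 225{,}000
\end{aligned}
```

So the index-based approach needs about 225,000 times fewer steps. Each hash-and-lookup step costs more than a single comparison, so the real speed-up will be smaller than that ratio.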


PS: If Object.fromEntries, map and filter slow things down, you can rewrite it as:

    const arr2 = [
      { Name: "john", Results: "Check", Timestamp: "-", Period: "Q2" },
      { Name: "jane", Results: "Check", Timestamp: "-", Period: "Q2" },
      { Name: "aisha", Results: "Check", Timestamp: "-", Period: "Q2" }
    ];

    const arr1 = [
      { Name: "john", Results: "Check", Timestamp: "-", Period: "Q2" },
      { Name: "jane", Results: "Check", Timestamp: "-", Period: "Q2" },
      { Name: "aisha", Results: "Check", Timestamp: "-", Period: "Q2" },
      { Name: "robert", Results: "Check", Timestamp: "-", Period: "Q2" },
      { Name: "ellen", Results: "Check", Timestamp: "-", Period: "Q2" },
      { Name: "tin", Results: "Check", Timestamp: "-", Period: "Q2" }
    ];

    // Build the index with a plain loop instead of Object.fromEntries + map
    const index = {};
    for (let i = 0; i < arr2.length; i += 1) {
      const item = arr2[i];
      index[`${item.Name}|${item.Results}`] = item;
    }

    // Look up every item of arr1 with a plain loop instead of filter
    const results = [];
    for (let i = 0; i < arr1.length; i += 1) {
      const item = arr1[i];
      const match = index[`${item.Name}|${item.Results}`];
      if (match) {
        results.push(match);
      }
    }

    console.log(`In both arrays: ${results.map(p => p.Name).join(", ")}`);
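
The code above matches on a fixed combination of fields. If, as the question describes, a match on any single property should count, one way to keep the lookups cheap is to build one Set of values per property of Arr2. This is a sketch under that reading of the question; buildPropertyIndex and anyPropertyMatches are hypothetical names. Values are compared with Set semantics (SameValueZero), so 2 and "2" do not match, and object-valued properties match only by reference.

```javascript
// Collect, for each property name, the set of values it takes in arr2.
function buildPropertyIndex(arr2) {
  const index = new Map(); // property name -> Set of values
  for (const item of arr2) {
    for (const [prop, value] of Object.entries(item)) {
      if (!index.has(prop)) {
        index.set(prop, new Set());
      }
      index.get(prop).add(value);
    }
  }
  return index;
}

// Keep the items of arr1 that share at least one property value with arr2
// (same property name, same value). On average the cost grows with the total
// number of properties in both arrays, not with arr1.length * arr2.length.
function anyPropertyMatches(arr1, arr2) {
  const index = buildPropertyIndex(arr2);
  const found = [];
  for (const item of arr1) {
    const matches = Object.entries(item).some(([prop, value]) => {
      const values = index.get(prop);
      return values !== undefined && values.has(value);
    });
    if (matches) {
      found.push(item);
    }
  }
  return found;
}

// The example from the question, with the Name values quoted.
const Arr1 = [{ ID: 2, Name: "Bar" }, { ID: 1, Name: "Foo" }];
const Arr2 = [{ ID: 2, Name: "Fu" }, { ID: 2, Name: "Bar" }];

const Found = anyPropertyMatches(Arr1, Arr2);
console.log(Found.length, Found[0] === Arr1[0]); // 1 true
```

Arr1[0] is found through its ID (and its Name), while Arr1[1] shares neither ID 1 nor Name "Foo" with Arr2.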

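Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.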

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM