
Improving stepping through an array twice (Nested loop on same array)

I have a large set of data that I want to cycle through in order to determine various statistics on the data set from a point in time 'D1' to a point in time in the future 'D2'. Basically, I want to add to a database each time the difference between the values is larger than 10. For example:

Datum[] data = x;
for( Datum d1 : data ){
    Datum[] tail = y; //From d1 up to 10 elements ahead
    for( Datum d2 : tail ){
        //Calculate difference
        if( (d2.val - d1.val) > 10 ){
            //Insert into database
        }
    }
}

My question is, is there a better algorithm/method for doing this? Since 9 elements from tail are reused in the next iteration of the outer loop, can I benefit somehow from that? My goal was to get this down to much less than O(n^2), but I can't wrap my head around it. Typically, finding a D1, D2 pair that satisfies the criteria means that the next D1 element will have a greater chance of matching as well. Can I use that to my advantage?

I'm trying to get this to be as efficient as possible because the data set is just so large.

An index-based for loop might perform much better than an iterator, since you can index the original array directly and avoid copying to a new array. You'd have much better memory locality, less chance of false sharing, etc.

What you have is a classic sweep-line algorithm, which is O(k*n), with k the "overlap", i.e. the portion that the inner loop runs over. In your case k is at most 10 no matter what n is:

Datum[] data = x;
for (int i = 0; i < data.length; i++) {
    Datum d1 = data[i];
    // Look at up to 10 elements ahead of d1, staying within bounds
    for (int j = i + 1; j < Math.min(i + 11, data.length); j++) {
        Datum d2 = data[j];
        // Calculate difference
        if ((d2.val - d1.val) > 10) {
            // Insert into database

            break; // optional: stop the inner loop after the first match for this d1
        }
    }
}

In your shoes, the first thing I would do is profile a typical dataset and find out where the time is going. This should give some hints as to where to focus your optimization efforts.

Assuming the calculation is as simple as the subtraction/comparison, and the arrays are quickly accessed, then your suggestion of looking to optimize saving to the database should be the next priority. For example, writing out a text file and using a bulk insert can give very fast performance compared to individual insert statements. If you stick to using separate inserts, and are using JDBC, then batch updates will be a great help, since they avoid the latency in communicating with the database.
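As a rough illustration of the batching idea (a sketch, not from the original answer; the class and method names are hypothetical), the matching pairs can be buffered and written out in groups rather than one row at a time:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchInserter {
    private final List<int[]> buffer = new ArrayList<>();
    private final int batchSize;
    private final Consumer<List<int[]>> flusher; // stand-in for the real database write

    BatchInserter(int batchSize, Consumer<List<int[]>> flusher) {
        this.batchSize = batchSize;
        this.flusher = flusher;
    }

    // With real JDBC, this would call ps.setInt(...) and ps.addBatch()
    void add(int v1, int v2) {
        buffer.add(new int[]{v1, v2});
        if (buffer.size() >= batchSize) flush();
    }

    // With real JDBC, this would call ps.executeBatch(), ideally inside a transaction
    void flush() {
        if (!buffer.isEmpty()) {
            flusher.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }
}
```

With JDBC, the flusher would wrap a PreparedStatement whose rows are accumulated via addBatch() and sent in one round trip via executeBatch().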

If that is still not fast enough, consider partitioning the array into N partitions, and have each partition processed by a separate thread. This will be particularly effective if processing is CPU bound.
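A sketch of the partitioning idea (illustrative names, not from the answer): a parallel stream splits the index range across the common fork/join pool, and each thread scans its own slice of d1 indices. Database writes would still need to be batched per partition rather than performed inside the hot loop.

```java
import java.util.stream.IntStream;

public class ParallelScan {
    // Counts qualifying (d1, d2) pairs with the outer loop split across threads.
    static long countMatches(int[] vals, int k, int threshold) {
        return IntStream.range(0, vals.length).parallel()
                .mapToLong(i -> {
                    long c = 0;
                    int end = Math.min(i + k, vals.length - 1);
                    for (int j = i + 1; j <= end; j++) {
                        if (vals[j] - vals[i] > threshold) c++;
                    }
                    return c;
                })
                .sum();
    }
}
```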

Finally, look for code-level optimizations, such as avoiding iterators by using an index. If the number of items written to the database is small compared to the number of elements iterated, then the iterator creation may be a bottleneck.

If the number of elements is larger than 10, and critically, more than can fit in the cpu cache, it will be more efficient to break up scanning into smaller blocks. For example, rather than scanning 1000 elements from data2, break it up into (say) 10 scans of 100, with each of the 10 scans using a different value of d1. This is similar to how matrix multiplication is implemented in a block fashion and makes better use of the cpu caches.
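One way to sketch that blocking scheme (illustrative code, not from the answer): tile the d2 index range, and for each tile iterate over every d1 whose k-element lookahead window overlaps it, so the tile is reused while it is still hot in cache.

```java
public class TiledScan {
    // Each tile of d2 indices [j0, j1) is scanned once for every d1 whose
    // window (i, i+k] overlaps it; each pair is visited exactly once.
    static long countTiled(int[] vals, int k, int threshold, int tile) {
        long count = 0;
        int n = vals.length;
        for (int j0 = 1; j0 < n; j0 += tile) {
            int j1 = Math.min(j0 + tile, n);
            // d1 indices whose lookahead window overlaps [j0, j1)
            for (int i = Math.max(0, j0 - k); i < j1 - 1; i++) {
                int jStart = Math.max(j0, i + 1);
                int jEnd = Math.min(j1 - 1, i + k);
                for (int j = jStart; j <= jEnd; j++) {
                    if (vals[j] - vals[i] > threshold) count++;
                }
            }
        }
        return count;
    }
}
```

The result is independent of the tile size; the tile is chosen so a tile of d2 values plus the overlapping d1 values fit in cache.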

Although you are using two loops, which typically is an O(N^2) algorithm, the second loop has a fixed size (10 elements), so this reduces to a simple constant factor: you are doing roughly a factor of 10 more work.

There is an asymptotically faster way to solve this problem, but I have serious doubts as to whether it would run faster in practice, because your window size (10) is so small. If you want to make this window, whose size I'll call k, larger, then you might want to consider an approach like the following.

When you're using this algorithm, you need to maintain a window of k elements that supports two operations:

  1. Insert a new element, evicting the oldest.
  2. Return all elements greater than some value.

One way to do this would be to store all of your elements in a data structure combining a balanced binary search tree and a queue. The queue contains all k elements stored in the order in which they appear in the original sequence, and is used so that we can remember which element to evict when we need to add a new element. The balanced BST stores a copy of each of those elements in sorted order. This means that you can implement the above operations like this:

  1. Insert a new element, evicting the oldest: Add the new element to the queue and BST. Then, dequeue from the queue to get the oldest element and remove it from the BST. Runtime: O(log k), since the BST has k elements in it.
  2. Return all elements greater than some value: Using the BST, find the smallest element at least as large as that value in O(log k) time. Then, scan across the BST and list all elements at least as large as that element. This takes O(z) time, where z is the total number of matches found.
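A minimal Java sketch of that window, assuming a TreeMap from value to multiplicity stands in for the balanced BST (so duplicate values are handled) and an ArrayDeque serves as the queue; findPairs and the helper names are illustrative, not from the answer:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WindowPairs {
    // For each d1 = vals[i], the window holds vals[i+1 .. i+k]; we report
    // every window value strictly greater than vals[i] + threshold.
    static List<int[]> findPairs(int[] vals, int k, int threshold) {
        TreeMap<Integer, Integer> bst = new TreeMap<>(); // value -> multiplicity
        Deque<Integer> queue = new ArrayDeque<>();       // values in arrival order
        List<int[]> pairs = new ArrayList<>();           // (d1, d2) value pairs
        int n = vals.length;
        for (int j = 1; j <= Math.min(k, n - 1); j++) add(bst, queue, vals[j]);
        for (int i = 0; i < n; i++) {
            // Operation 2: all window values greater than vals[i] + threshold
            for (Map.Entry<Integer, Integer> e : bst.tailMap(vals[i] + threshold, false).entrySet()) {
                for (int c = 0; c < e.getValue(); c++) pairs.add(new int[]{vals[i], e.getKey()});
            }
            // Operation 1: slide the window one step to the right
            if (i + 1 < n && !queue.isEmpty()) evictOldest(bst, queue);
            if (i + k + 1 < n) add(bst, queue, vals[i + k + 1]);
        }
        return pairs;
    }

    static void add(TreeMap<Integer, Integer> bst, Deque<Integer> queue, int v) {
        queue.addLast(v);
        bst.merge(v, 1, Integer::sum);
    }

    static void evictOldest(TreeMap<Integer, Integer> bst, Deque<Integer> queue) {
        int v = queue.pollFirst();
        if (bst.merge(v, -1, Integer::sum) == 0) bst.remove(v);
    }
}
```

Here tailMap(v, false) is the strict "greater than" query; the "at least as large" variant described above would be tailMap(v, true).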

Collectively, if you have n elements and a total of z pairs that you need to insert into the database, this algorithm will take O(n log k + z) time. To see this, note that we do a total of n copies of operation (1), which takes O(log k) time each. We also do n copies of operation (2), which takes O(n log k) time to find successors and then O(z) total time across all iterations to list all the matching pairs.

The asymptotic runtime of this algorithm is good compared to the O(nk) algorithm you originally posted. Assuming that the number of matches isn't "really huge" (say, on the order of nk), this will be much faster as you increase n and k.

If the values you're storing are integers in a small range (say, 0 - 10,000), you can speed this up even further by replacing the balanced BST with a data structure optimized for integers, like a van Emde Boas tree, which reduces the runtime to O(n log log k + z). Again, this is only faster asymptotically, and if you keep k constant at 10 it is almost certainly not worth it.

Hope this helps!
