
What is the difference between the time complexity of these two ways of using loops in VBA?

I have a theoretical question and would appreciate your advice.

Say we have these two pieces of code. The first one:

For Each cell In rng1
    collectionOfValues.Add (cell.Value)
Next

For Each cell In rng2
   collectionOfAddresses.Add (cell.Address)
Next

For i = 1 To collectionOfAddresses.Count
   Range(collectionOfAddresses.Item(i)) = collectionOfValues.Item(i)
Next i

Here we add the values from one range to one collection and the addresses from another range to a second collection, and then fill the cells at those addresses with the values.

Here is the second piece of code, which does the same thing:

For i = 1 To rng1.Rows.Count
  For j = 1 To rng1.Columns.Count
       rng2.Cells(i, j) = rng1.Cells(i, j)
  Next j
Next i

So the question is: what is the execution time in both cases? It's clear that the second case is O(n^2) (to keep things simple, assume the range is square).

What about the first one? Is For Each considered a nested loop?

And if so, does that mean the running time of the first code is O(n^2) + O(n^2) + O(n^2) = 3*O(n^2), which is pretty much the same as the second code's?

In general, do these two pieces of code differ in any way other than the first one using additional memory for the collections?

Thanks a lot in advance.

Actually, your first example is O(n^4)!

That might sound surprising, but it is because indexing into a VBA Collection has linear, not constant, complexity. The VBA Collection essentially has the performance characteristics of a linked list: getting element N by index takes time proportional to N, so iterating the whole thing by index takes time proportional to N^2. (I switched cases on you to distinguish N, the number of elements in the data structure, from your n, the number of cells along the side of a square block of cells. So here N = n^2.)

That is one reason why VBA has the For...Each notation for iterating Collections. When you use For...Each, VBA uses an iterator behind the scenes, so walking through the entire Collection is O(N), not O(N^2).
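The difference can be checked with a rough timing experiment. This is a minimal sketch, not a rigorous benchmark: the names SumByIndex, SumByForEach and CompareCollectionLoops are hypothetical, the size 20000 is an arbitrary assumption, and Timer only has coarse resolution. It builds one Collection and sums it twice, once by index and once with For Each.

```vba
Option Explicit

' Sums a Collection by index. Each Item(i) call looks the element up again.
Function SumByIndex(ByVal items As Collection) As Double
    Dim i As Long
    Dim total As Double
    For i = 1 To items.Count
        total = total + items.Item(i)
    Next i
    SumByIndex = total
End Function

' Sums a Collection with For Each, which walks it once via its enumerator.
Function SumByForEach(ByVal items As Collection) As Double
    Dim v As Variant
    Dim total As Double
    For Each v In items
        total = total + v
    Next v
    SumByForEach = total
End Function

' Prints the elapsed seconds for both approaches to the Immediate window.
Sub CompareCollectionLoops()
    Const ITEM_COUNT As Long = 20000   ' assumed size, adjust to taste
    Dim items As New Collection
    Dim i As Long
    Dim started As Double
    Dim total As Double

    For i = 1 To ITEM_COUNT
        items.Add i
    Next i

    started = Timer
    total = SumByIndex(items)
    Debug.Print "By index: "; Timer - started; " s, sum = "; total

    started = Timer
    total = SumByForEach(items)
    Debug.Print "For Each: "; Timer - started; " s, sum = "; total
End Sub
```

If indexing really is linear, doubling ITEM_COUNT should roughly quadruple the "By index" time while the "For Each" time only doubles.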

So, switching back to your n: your first two loops use For...Each over a Range with n^2 cells, so each is O(n^2). Your third loop uses For...Next over a Collection with n^2 elements, so it is O(n^4).
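The arithmetic behind that figure can be written out. This is a sketch under the assumption stated earlier, that fetching item i of a Collection by index costs time proportional to i; the constant c is a hypothetical per-step cost introduced only for illustration.

```latex
\[
T_{\text{loop 3}} = \sum_{i=1}^{N} c\,i = c\,\frac{N(N+1)}{2} = O(N^2),
\qquad N = n^2 \;\Rightarrow\; T_{\text{loop 3}} = O(n^4)
\]
\[
T_{\text{total}} = O(n^2) + O(n^2) + O(n^4) = O(n^4)
\]
```

The two For...Each loops contribute only lower-order terms, so the indexed loop dominates the total.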

I actually don't know for sure about your last loop, because I don't know exactly how the Cells property of a Range works; there could be some extra hidden complexity there. But I think Cells will have the performance characteristics of an array, so O(1) for random access by index, and that would make the last loop O(n^2).

This is a good example of what Joel Spolsky called "Shlemiel the painter's algorithm":

There must be a Shlemiel the Painter's Algorithm in there somewhere. Whenever something seems like it should have linear performance but it seems to have n-squared performance, look for hidden Shlemiels. They are often hidden by your libraries.

(See this article from way before Stack Overflow was founded: http://www.joelonsoftware.com/articles/fog0000000319.html)

More about VBA performance can be found on Doug Jenkins's website:

http://newtonexcelbach.wordpress.com/2010/03/07/the-speed-of-loops/

http://newtonexcelbach.wordpress.com/2010/01/15/good-practice-best-practice-or-just-practice/

(I will also second what cyberkiwi said about not looping through Ranges just to copy cell contents, if this were a "real" program and not just a learning exercise.)

You are right that the first is 3 x O(n^2), but remember that O-notation does not care about constants, so in terms of complexity it is still an O(n^2) algorithm.

The first one is not considered a nested loop, even though it works on the same number of cells as the loop in the second. It is just a straight iteration over an N-item range in Excel. What makes it N^2 is the fact that you are defining N as the length of a side, i.e. the number of rows/columns (the range being square).

Just an Excel VBA note: you shouldn't be looping through cells or storing addresses anyway. Neither of the approaches is optimal, but I think they serve to illustrate your question about O-notation.

rng1.Copy
rng2.Cells(1).PasteSpecial xlValues
Application.CutCopyMode = False
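When only the values are needed, a direct assignment is another common way to do the same thing without touching the clipboard. This is a minimal sketch: the procedure name CopyValues and its parameter names are hypothetical, and it assumes the source is a single contiguous range whose values should land in a same-shaped block starting at the top-left cell of the target.

```vba
Option Explicit

' Copies values (not formats or formulas) from sourceRange into the block
' of the same size that starts at the top-left cell of targetRange.
' Assumes sourceRange is one contiguous area.
Sub CopyValues(ByVal sourceRange As Range, ByVal targetRange As Range)
    targetRange.Cells(1, 1) _
        .Resize(sourceRange.Rows.Count, sourceRange.Columns.Count) _
        .Value = sourceRange.Value
End Sub
```

With the ranges from the question, this would be called as `CopyValues rng1, rng2`.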

Remember not to confuse the complexity of YOUR code with the complexity of background Excel functions. Overall, the amount of work done is N^2 in both cases. However, in your first example YOUR code is actually only 3N (N for each of the three loops). The fact that a single statement in Excel can fill in multiple values does not change the complexity of your written code. A For Each loop is the same as a For loop: N complexity by itself. You only get N^2 when you nest loops.

To answer your question about which is better: generally it is preferable to use built-in functions where you can. The assumption should be that Excel will run more efficiently internally than anything you could write yourself. However (knowing MS), make sure you always check that assumption if performance is a priority.
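The point about sequential versus nested loops can be made concrete by counting loop-body executions. This is a small sketch with hypothetical function names; it only counts iterations of the written loops and says nothing about the cost of whatever each iteration calls.

```vba
Option Explicit

' Three loops run one after another: the body runs 3 * n times in total.
Function SequentialSteps(ByVal n As Long) As Long
    Dim i As Long
    Dim steps As Long
    For i = 1 To n
        steps = steps + 1
    Next i
    For i = 1 To n
        steps = steps + 1
    Next i
    For i = 1 To n
        steps = steps + 1
    Next i
    SequentialSteps = steps
End Function

' One loop nested inside another: the body runs n * n times.
Function NestedSteps(ByVal n As Long) As Long
    Dim i As Long
    Dim j As Long
    Dim steps As Long
    For i = 1 To n
        For j = 1 To n
            steps = steps + 1
        Next j
    Next i
    NestedSteps = steps
End Function
```

For n = 10 the first function returns 30 and the second returns 100. Keep n small, since a Long overflows once n * n exceeds about 2.1 billion.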
