How to optimize this algorithm

I need help making this bit of code faster:

UnitBase* Formation::operator[](ushort offset)
{
 UnitBase* unit = 0;
 if (offset < itsNumFightingUnits)
 {
  ushort j = 0;
  for (ushort i = 0; i < itsNumUnits; ++i)
  {
   if (unitSetup[i] == UNIT_ON_FRONT)
   {
    if (j == offset)
     unit = unitFormation[i];
    ++j;
   }
  }
 }
 else
  throw NotFound();
 return unit;
}

So, to give some background, I have this class Formation, which contains an array of pointers to UnitBase objects, called unitFormation. The UnitBase* array has an equally sized array of numbers that indicate the status of each corresponding UnitBase object, called unitSetup.

I have overloaded the [] operator so that it returns only pointers to those UnitBase objects that have a certain status. So if I ask for itsFormation[5], the function does not necessarily return unitFormation[5], but the 5th element of unitFormation that has the status UNIT_ON_FRONT.

I have tried using the code above, but according to my profiler, it is taking way too much time. Which makes sense, since the algorithm has to count all the elements before returning the requested pointer.

Do I need to rethink the whole problem completely, or can this somehow be made faster?

Thanks in advance.

One quick optimization would be to return the unit as soon as you find it, rather than continuing to iterate over all of the remaining units, e.g.

if (j == offset)
 unit = unitFormation[i];

becomes

if (j == offset)
 return unitFormation[i];

Of course, this only helps when the unit you're looking for is towards the front of the unitFormation sequence, but it's trivial to do and does help sometimes.
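For reference, here is a minimal sketch of the whole lookup with the early return applied. The free function findFrontUnit, the UnitStatus enum and the stub UnitBase and NotFound types are stand-ins invented for this example so that it compiles on its own; only the loop logic comes from the question. It assumes both arrays have the same length.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for the types used in the question.
struct UnitBase { int id; };
enum UnitStatus { UNIT_ON_FRONT, UNIT_IN_RESERVE };

struct NotFound : std::runtime_error {
    NotFound() : std::runtime_error("unit not found") {}
};

// Returns the offset-th unit (zero-based) whose status is UNIT_ON_FRONT,
// stopping the scan as soon as that unit is found.
UnitBase* findFrontUnit(const std::vector<UnitBase*>& unitFormation,
                        const std::vector<UnitStatus>& unitSetup,
                        std::size_t offset)
{
    std::size_t seen = 0;  // number of front units passed so far
    for (std::size_t i = 0; i < unitFormation.size(); ++i) {
        if (unitSetup[i] != UNIT_ON_FRONT)
            continue;
        if (seen == offset)
            return unitFormation[i];  // early exit: skip the rest of the array
        ++seen;
    }
    throw NotFound();  // fewer than offset + 1 units are on the front
}
```

The worst case is still a full scan, so this only trims the constant factor.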

A more involved, but more effective way to make this faster would be, for each status, to build and maintain a linked list of units that have that status. You would do this in parallel to the main array of units, and the contents of the linked lists would be pointers into the main units array, so you are not duplicating the unit data. Then, to find a given offset within a status, you could just traverse to the offset-th node of the linked list, rather than iterating over each unit.

Making it a doubly-linked list and keeping a tail pointer would allow you to find elements with high offsets just as quickly as low offsets (by starting from the end and going backwards).

However, this would still be slow if there are a lot of units with the same status and you are looking for one whose offset is near the middle.

What about redesigning your code to maintain a table of "units on front" (whatever that means, it sounds interesting :-))? If that part is queried a lot and not modified often, you'll save some time. Instead of inspecting all or part of the complete list of units, you'll get the result instantaneously.

PS: int is meant to be the most natural integer type for your CPU, so using ushort does not necessarily make your program faster.

In addition to the suggestions others have made, you may want to check whether any calls to this function are made needlessly, and eliminate those call sites. For instance, you may find that you are calling it repeatedly when there is no chance the result has changed. The fastest code is that which never runs.

Would it be possible to sort (or insert sorted) your data by status UNIT_ON_FRONT? That would make the function trivial.

How often will the status of a unit change? Perhaps you should keep a list of units which have the proper status, and only update that list when the status changes.
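A rough sketch of that idea follows, assuming status changes are much rarer than lookups. The class layout, the member names (units_, setup_, front_) and the methods addUnit, setStatus and frontUnit are invented for this example and are not the poster's actual Formation class.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the types used in the question.
struct UnitBase { int id; };
enum UnitStatus { UNIT_ON_FRONT, UNIT_IN_RESERVE };

class Formation {
public:
    void addUnit(UnitBase* unit, UnitStatus status) {
        units_.push_back(unit);
        setup_.push_back(status);
        if (status == UNIT_ON_FRONT)
            front_.push_back(unit);  // appended last, so front_ stays in array order
    }

    // The O(N) work happens here, only when a status actually changes.
    void setStatus(std::size_t index, UnitStatus status) {
        if (setup_[index] == status)
            return;
        setup_[index] = status;
        rebuildFront();
    }

    std::size_t numFightingUnits() const { return front_.size(); }

    // O(1) lookup of the offset-th front unit; nullptr if out of range.
    UnitBase* frontUnit(std::size_t offset) const {
        return offset < front_.size() ? front_[offset] : nullptr;
    }

private:
    void rebuildFront() {
        front_.clear();
        for (std::size_t i = 0; i < units_.size(); ++i)
            if (setup_[i] == UNIT_ON_FRONT)
                front_.push_back(units_[i]);
    }

    std::vector<UnitBase*> units_;    // all units
    std::vector<UnitStatus> setup_;   // status of each unit, same length as units_
    std::vector<UnitBase*> front_;    // cached list of units on the front
};
```

Rebuilding the whole cache on every change is the simplest option. If status changes turn out to be frequent, an incremental update would be the next thing to try.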

If it is necessary to minimize the cost of status changes, you could keep an array which says how many of the first 256 units have a particular status, how many of the next 256 units, etc. One could scan through that array 256 times as fast as one could scan through the units, until one was within 256 slots of the Nth "good" unit. Changing a unit's status would only require incrementing or decrementing one array slot.
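The block-count scheme could look roughly like this. The class name FrontIndex and its members are made up for illustration, and the status is reduced to a single on-front flag per unit to keep the sketch short.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: tracks which unit slots are on the front and
// keeps one counter per block of 256 slots.
class FrontIndex {
public:
    static constexpr std::size_t kBlock = 256;

    explicit FrontIndex(std::size_t numUnits)
        : onFront_(numUnits, false),
          blockCount_((numUnits + kBlock - 1) / kBlock, 0) {}

    std::size_t size() const { return onFront_.size(); }

    // A status change touches exactly one counter.
    void setOnFront(std::size_t i, bool value) {
        if (onFront_[i] == value)
            return;
        onFront_[i] = value;
        if (value)
            ++blockCount_[i / kBlock];
        else
            --blockCount_[i / kBlock];
    }

    // Returns the slot index of the offset-th front unit (zero-based),
    // or size() if there are not that many front units.
    std::size_t find(std::size_t offset) const {
        // Skip whole blocks using the per-block counters.
        std::size_t block = 0;
        while (block < blockCount_.size() && offset >= blockCount_[block]) {
            offset -= blockCount_[block];
            ++block;
        }
        if (block == blockCount_.size())
            return size();
        // The target is inside this block: scan at most kBlock slots.
        for (std::size_t i = block * kBlock; i < size(); ++i) {
            if (onFront_[i]) {
                if (offset == 0)
                    return i;
                --offset;
            }
        }
        return size();  // not reached while the counters are consistent
    }

private:
    std::vector<bool> onFront_;            // one flag per unit slot
    std::vector<std::size_t> blockCount_;  // front units per block of kBlock slots
};
```

The returned slot index would then be used to index the main unit array.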

Other approaches could be used to balance the cost of changing unit status with the cost of finding units, given various usage patterns.

One of the problems may be that this function is called too often. Assuming the proportion of UNIT_ON_FRONT units is constant, the complexity is linear. However, if you are calling the operator from a loop, that complexity is going to rise to O(N^2).

If instead you returned something like a boost::filter_iterator, you could improve the efficiency of those algorithms that need to iterate over the UNIT_ON_FRONT units.

I have redesigned the solution completely, using two vectors, one for units on the front and one for other units, and changed all algorithms such that a unit with a changed status is immediately moved from one vector to the other. Thus I eliminated the counting in the [] operator, which was the main bottleneck.
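As an illustration only (this is not the poster's actual code), a two-vector layout along those lines might be sketched as follows. The member names front_ and others_ and the methods addToFront, addToReserve, moveToFront and moveToReserve are assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the unit type used in the question.
struct UnitBase { int id; };

class Formation {
public:
    void addToFront(UnitBase* unit) { front_.push_back(unit); }
    void addToReserve(UnitBase* unit) { others_.push_back(unit); }

    // A status change moves the unit between the two vectors right away.
    // Both return false if the unit was not in the source vector.
    bool moveToFront(UnitBase* unit) { return transfer(others_, front_, unit); }
    bool moveToReserve(UnitBase* unit) { return transfer(front_, others_, unit); }

    std::size_t numFightingUnits() const { return front_.size(); }

    // No counting needed any more: the front units are stored contiguously.
    UnitBase* operator[](std::size_t offset) const {
        return offset < front_.size() ? front_[offset] : nullptr;
    }

private:
    static bool transfer(std::vector<UnitBase*>& from,
                         std::vector<UnitBase*>& to,
                         UnitBase* unit) {
        auto it = std::find(from.begin(), from.end(), unit);
        if (it == from.end())
            return false;
        from.erase(it);       // O(N), but only paid when a status changes
        to.push_back(unit);
        return true;
    }

    std::vector<UnitBase*> front_;   // units on the front
    std::vector<UnitBase*> others_;  // all other units
};
```

Note that a moved unit is appended to the end of its new vector, so the order among front units reflects when they arrived, not their original array position.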

Before using the profiler I was getting computation times of around 5500 to 7000 ms. After looking at the answers here, 1) I changed the loop variables from ushort to int or uint, which reduced the duration by ~10%, 2) I made another modification in a secondary algorithm, which reduced the duration by a further 30% or so, 3) I implemented the two vectors as explained above. This helped reduce the computation time from ~3300 ms to ~700 ms, another 40%!

In all, that's a reduction of 85 - 90%! Thanks to SO and the profiler.

Next I'm going to implement a mediator pattern and only call the updating function when required, perhaps squeezing out a few more ms. :)

New code that corresponds to the old snippet (the functionality is completely different now):

UnitBase* Formation::operator[](ushort offset)
{
    if (offset < numFightingUnits)
        return unitFormation[offset]->getUnit();
    else
        return NULL;
}

Much shorter and more to the point. Of course, there were many other heavy modifications, the most important being that unitFormation is now a std::vector<UnitFormationElement*> rather than simply a UnitBase**. The UnitFormationElement* contains the UnitBase* along with some other vital data that was hanging around in the Formation class before.

This shouldn't have a big impact, but you could check the assembly to see whether itsNumFightingUnits and itsNumUnits are loaded on every loop iteration or whether they are kept in registers. If they are loaded every time, try adding temporaries at the beginning of the function.

For that last bit of juice, and if the exception is thrown regularly, it might be worth switching to returning an error code. It's uglier code, but the lack of stack jumps can be a big help. It's common in game development to turn off exceptions and RTTI.

You're outsmarting yourself (which everyone does sometimes). You've made a simple problem O(N^2). Just think about what you've got to do before you go overloading operators.

Added in response to comment:

Try backing off to a simpler language, like C, or the C subset of C++. Forget about abstractions, collection classes, and all that hoo-haw. Look at what your program needs to do and think about your algorithm that way. Then, if you can simplify it by using container classes and overloading, without making it do any more work, go for it. Most performance problems are caused by taking simple problems and making them complicated by trying to use all the fancy ideas.

For example, you're taking the [] operator, which is usually thought of as O(1), and making it O(N). Then I presume you use it in some O(N) loop, so you get O(N^2). What you really want to do is loop over the array elements that satisfy a certain condition. You could just do that. If there are very few of them, and you're doing this at really high frequency, you might want to build a separate list of them. But keep your data structure simple, simple, simple. It's better to "waste" cycles, and only optimize if you really have to.
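To make "just do that" concrete, here is a small sketch of a single pass over the units that are on the front. The hitPoints field and the totalFrontHitPoints function are invented purely to give the loop something to compute; the point is only that the whole traversal is one O(N) pass with no filtered operator[] inside it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the types used in the question.
struct UnitBase { int hitPoints; };
enum UnitStatus { UNIT_ON_FRONT, UNIT_IN_RESERVE };

// Visits every front unit exactly once by walking both arrays together.
int totalFrontHitPoints(const std::vector<UnitBase*>& unitFormation,
                        const std::vector<UnitStatus>& unitSetup)
{
    int total = 0;
    for (std::size_t i = 0; i < unitFormation.size(); ++i) {
        if (unitSetup[i] == UNIT_ON_FRONT)
            total += unitFormation[i]->hitPoints;
    }
    return total;
}
```

Compare this with calling the filtered operator[] for offsets 0 to itsNumFightingUnits - 1, where each call rescans the array from the start.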
