How to optimize this algorithm

Question

I need help with making this bit of code faster:

UnitBase* Formation::operator[](ushort offset)
{
    UnitBase* unit = 0;
    if (offset < itsNumFightingUnits)
    {
        ushort j = 0;
        for (ushort i = 0; i < itsNumUnits; ++i)
        {
            if (unitSetup[i] == UNIT_ON_FRONT)
            {
                if (j == offset)
                    unit = unitFormation[i];
                ++j;
            }
        }
    }
    else
        throw NotFound();
    return unit;
}

So, to give some background, I have this class Formation which contains an array of pointers to UnitBase objects, called unitFormation. The UnitBase* array has an equally sized companion array, called unitSetup, whose entries indicate the status of each corresponding UnitBase object.

I have overloaded the [] operator so that it returns only pointers to those UnitBase objects that have a certain status. So if I ask for itsFormation[5], the function does not necessarily return unitFormation[5], but the element at position 5 (counting from zero) among the elements of unitFormation that have the status UNIT_ON_FRONT.

I have tried using the code above, but according to my profiler it is taking way too much time. That makes sense, since the algorithm has to walk through all the elements, counting matches, before returning the requested pointer.

Do I need to rethink the whole problem completely, or can this be made somehow faster?

Thanks in advance.

Answer 1

One quick optimization would be to return the unit as soon as you find it, rather than continuing to iterate over the rest of the units, e.g.

if (j == offset)
    unit = unitFormation[i];

becomes

if (j == offset)
    return unitFormation[i];

Of course, this only helps in the case that the unit you're looking for is towards the front of the unitFormation sequence, but it's trivial to do and does help sometimes.

A more involved, but more effective, way to make this faster would be to build and maintain, for each status, a linked list of the units that have that status. You would do this in parallel to the main array of units, and the contents of the linked lists would be pointers into the main units array, so you are not duplicating the unit data. Then, to find a given offset within a status, you could just traverse to the offset-th node of the linked list, rather than iterating over each unit.

Making it a doubly-linked list and keeping a tail pointer would allow you to find elements with high offsets just as quickly as low offsets (by starting from the end and going backwards).

However, this would still be slow if there are a lot of units with the same status and you are looking for one whose offset is near the middle.
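As a rough sketch of the per-status list idea (only the UNIT_ON_FRONT list is shown): FrontIndex and the stripped-down UnitBase below are hypothetical names for illustration, not code from the question, and std::list stands in for a hand-written doubly-linked list.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>

// Hypothetical stand-in for the question's UnitBase.
struct UnitBase { int id; };

// Hypothetical side index: a doubly-linked list of pointers to the units
// that are currently UNIT_ON_FRONT. The units themselves stay where they are.
class FrontIndex {
public:
    void add(UnitBase* unit) { front_.push_back(unit); }
    void remove(UnitBase* unit) { front_.remove(unit); }
    std::size_t size() const { return front_.size(); }

    // Walks from whichever end is closer to the requested offset.
    UnitBase* at(std::size_t offset) const {
        assert(offset < front_.size());
        if (offset <= front_.size() / 2) {
            return *std::next(front_.begin(),
                              static_cast<std::ptrdiff_t>(offset));
        }
        return *std::prev(front_.end(),
                          static_cast<std::ptrdiff_t>(front_.size() - offset));
    }

private:
    std::list<UnitBase*> front_;
};
```

The lookup still costs a number of steps proportional to the distance from the nearer end, which is the weakness mentioned above for offsets near the middle.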

Answer 2

What about redesigning your code to maintain a table of "units on front" (whatever that means, it sounds interesting :-))? If that part is queried a lot and not modified often, you'll save some time: instead of inspecting all or part of the complete list of units, you get the result with a single indexed lookup.

PS: int is meant to be the most natural integer type for your CPU, so using ushort doesn't necessarily make your program faster.

Answer 3

In addition to the other suggestions some have made, you may want to check whether any of the calls to this function are made needlessly, and eliminate those call points. For instance, you may find that you are calling it repeatedly when there is no chance the result has changed. The fastest code is that which never runs.

Answer 4

Would it be possible to sort (or insert sorted) your data by status UNIT_ON_FRONT? That would make the function trivial.
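A minimal sketch of the grouping idea, assuming the unit pointer and its status are stored together in one hypothetical Slot struct rather than in two parallel arrays (Slot, UNIT_IN_RESERVE, groupFrontUnits and frontUnitAt are made-up names):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct UnitBase { int id; };                         // hypothetical stand-in
enum UnitStatus { UNIT_ON_FRONT, UNIT_IN_RESERVE };  // UNIT_IN_RESERVE is assumed

// Hypothetical: pointer and status kept side by side.
struct Slot {
    UnitBase* unit;
    UnitStatus status;
};

inline bool isOnFront(const Slot& s) { return s.status == UNIT_ON_FRONT; }

// Moves all UNIT_ON_FRONT slots to the start of the vector, keeping their
// relative order, and returns how many there are.
std::size_t groupFrontUnits(std::vector<Slot>& slots) {
    auto mid = std::stable_partition(slots.begin(), slots.end(), isOnFront);
    return static_cast<std::size_t>(mid - slots.begin());
}

// Once the slots are grouped, the lookup is a plain index.
UnitBase* frontUnitAt(const std::vector<Slot>& slots,
                      std::size_t numFront, std::size_t offset) {
    assert(numFront <= slots.size() && offset < numFront);
    return slots[offset].unit;
}
```

The grouping has to be redone (or maintained incrementally) whenever a status changes, so this pays off mainly when lookups are much more frequent than status changes.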

Answer 5

How often will the status of a unit change? Perhaps you should keep a list of units which have the proper status, and only update that list when the status changes.

If necessary to minimize the cost of status changes, you could keep an array which says how many of the first 256 units have a particular status, how many of the next 256 units, etc. One could scan through that array 256 times as fast as one could scan through the units, until one was within 256 slots of the Nth "good" unit, and then scan only those slots. Changing a unit's status would only require incrementing or decrementing one array slot.
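The block-count scheme could look roughly like this. BlockedFormation, setOnFront and frontUnitAt are hypothetical names, and the status is reduced to a single on-front flag to keep the sketch short:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct UnitBase { int id; };  // hypothetical stand-in

const std::size_t kBlockSize = 256;  // block size suggested above

class BlockedFormation {
public:
    explicit BlockedFormation(const std::vector<UnitBase*>& units)
        : units_(units),
          onFront_(units.size(), false),
          frontPerBlock_((units.size() + kBlockSize - 1) / kBlockSize, 0) {}

    // Changing one unit's status touches exactly one counter.
    void setOnFront(std::size_t i, bool onFront) {
        assert(i < units_.size());
        const bool current = onFront_[i];
        if (current == onFront) return;
        onFront_[i] = onFront;
        if (onFront) ++frontPerBlock_[i / kBlockSize];
        else --frontPerBlock_[i / kBlockSize];
    }

    // Returns the offset-th on-front unit, or a null pointer if there are
    // fewer on-front units than that.
    UnitBase* frontUnitAt(std::size_t offset) const {
        std::size_t block = 0;
        // Skip whole blocks that end before the requested unit.
        while (block < frontPerBlock_.size() &&
               offset >= frontPerBlock_[block]) {
            offset -= frontPerBlock_[block];
            ++block;
        }
        // Scan at most kBlockSize slots inside the block that contains it.
        const std::size_t begin = block * kBlockSize;
        for (std::size_t i = begin;
             i < units_.size() && i < begin + kBlockSize; ++i) {
            if (onFront_[i]) {
                if (offset == 0) return units_[i];
                --offset;
            }
        }
        return nullptr;
    }

private:
    std::vector<UnitBase*> units_;
    std::vector<bool> onFront_;
    std::vector<std::size_t> frontPerBlock_;
};
```

For small formations (a few hundred units or fewer) the extra bookkeeping may not be worth it; that is something the profiler would have to confirm.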

Other approaches could be used to balance the cost of changing unit status with the cost of finding units, given various usage patterns.

Answer 6

One of the problems may be that this function is called too often. Assuming the proportion of UNIT_ON_FRONT units is constant, the complexity of a single call is linear. However, if you are calling the operator from a loop, that complexity is going to rise to O(N^2).

If, instead, you returned something like a boost::filter_iterator, you could improve the efficiency of those algorithms that need to iterate over the UNIT_ON_FRONT units.
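To illustrate the single-pass idea without pulling in Boost, here is a hand-rolled sketch that visits only the on-front units. It is a simplified substitute for a filtering iterator, not boost::filter_iterator itself; forEachFrontUnit and UNIT_IN_RESERVE are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct UnitBase { int id; };                         // hypothetical stand-in
enum UnitStatus { UNIT_ON_FRONT, UNIT_IN_RESERVE };  // UNIT_IN_RESERVE is assumed

// Calls visit(unit) once for each UNIT_ON_FRONT unit, in array order.
// The whole traversal is one O(N) pass, instead of one O(N) scan per element.
template <typename Visitor>
void forEachFrontUnit(const std::vector<UnitBase*>& units,
                      const std::vector<UnitStatus>& statuses,
                      Visitor visit) {
    assert(units.size() == statuses.size());
    for (std::size_t i = 0; i < units.size(); ++i) {
        if (statuses[i] == UNIT_ON_FRONT) {
            visit(units[i]);
        }
    }
}
```

A caller that used to write a loop over itsFormation[k] would pass the loop body as the visitor instead.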

Answer 7

I have redesigned the solution completely, using two vectors, one for units on the front, and one for other units, and changed all algorithms such that a unit with a changed status is immediately moved from one vector to another. Thus I eliminated the counting in the [] operator which was the main bottleneck.

Before using the profiler I was getting computation times of around 5500 to 7000 ms. After looking at the answers here, 1) I changed the loop variables from ushort to int or uint, which reduced the duration by ~10%, 2) I made another modification in a secondary algorithm, which reduced the duration by a further 30% or so, and 3) I implemented the two vectors as explained above. This last step reduced the computation time from ~3300 ms to ~700 ms, roughly another 40% of the original time!

In all, that's a reduction of 85-90%! Thanks to SO and the profiler.

Next I'm going to implement a mediator pattern and only call the updating function when required, perhaps squeezing out a few more ms. :)

New code that corresponds to the old snippet (the functionality is completely different now):

UnitBase* Formation::operator[](ushort offset)
{
    if (offset < numFightingUnits)
        return unitFormation[offset]->getUnit();
    else
        return NULL;
}

Much shorter and more to the point. Of course, there were many other heavy modifications, the most important being that unitFormation is now a std::vector<UnitFormationElement*> rather than simply a UnitBase**. The UnitFormationElement contains the UnitBase* along with some other vital data that was hanging around in the Formation class before.
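For readers who want to see the two-vector bookkeeping in isolation, here is a simplified sketch. It is not the actual redesigned class: TwoListFormation and its member names are hypothetical, and plain UnitBase* pointers are stored instead of UnitFormationElement*.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct UnitBase { int id; };  // hypothetical stand-in

class TwoListFormation {
public:
    void addToFront(UnitBase* u) { front_.push_back(u); }
    void addToOthers(UnitBase* u) { others_.push_back(u); }

    // A status change moves the unit from one vector to the other right away.
    void moveToFront(UnitBase* u) { transfer(others_, front_, u); }
    void moveToOthers(UnitBase* u) { transfer(front_, others_, u); }

    std::size_t numFront() const { return front_.size(); }

    // No counting needed any more: the offset is a direct index.
    UnitBase* frontUnitAt(std::size_t offset) const {
        return offset < front_.size() ? front_[offset] : nullptr;
    }

private:
    static void transfer(std::vector<UnitBase*>& from,
                         std::vector<UnitBase*>& to, UnitBase* u) {
        auto it = std::find(from.begin(), from.end(), u);
        assert(it != from.end());  // the unit must be in the source vector
        from.erase(it);
        to.push_back(u);
    }

    std::vector<UnitBase*> front_;
    std::vector<UnitBase*> others_;
};
```

The linear search now happens only on a status change, which in this design is assumed to be much rarer than a lookup.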

Answer 8

This shouldn't have a big impact, but you could check the assembly to see whether itsNumFightingUnits and itsNumUnits are loaded every loop iteration or if they are put into registers. If they are loaded every time, try adding temporaries at the beginning of the function.
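Adding temporaries might look something like the following. The Formation struct here is a cut-down, hypothetical version of the question's class (public members, size_t instead of ushort, an assert instead of the throw), just to keep the sketch self-contained:

```cpp
#include <cassert>
#include <cstddef>

struct UnitBase { int id; };                         // hypothetical stand-in
enum UnitStatus { UNIT_ON_FRONT, UNIT_IN_RESERVE };  // UNIT_IN_RESERVE is assumed

struct Formation {
    UnitBase** unitFormation;
    const UnitStatus* unitSetup;
    std::size_t itsNumUnits;
    std::size_t itsNumFightingUnits;

    UnitBase* frontUnitAt(std::size_t offset) const {
        assert(offset < itsNumFightingUnits);  // the original throws NotFound here
        // Copy the members into locals once, so the loop does not have to
        // reload them through `this` on every iteration.
        const std::size_t numUnits = itsNumUnits;
        UnitBase* const* units = unitFormation;
        const UnitStatus* setup = unitSetup;
        std::size_t seen = 0;
        for (std::size_t i = 0; i < numUnits; ++i) {
            if (setup[i] == UNIT_ON_FRONT) {
                if (seen == offset) return units[i];
                ++seen;
            }
        }
        return nullptr;
    }
};
```

An optimizing compiler will often do this on its own, so it is worth checking the assembly before and after.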

Answer 9

For that last bit of juice, and if the exception is thrown regularly, it might be worth switching to returning an error code. It makes for uglier code, but avoiding the stack unwinding can be a big help. It's common in game development to turn off exceptions and RTTI.
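A minimal sketch of the error-code style, assuming the on-front units are already available as a vector with no null entries; LookupResult and findFrontUnit are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct UnitBase { int id; };  // hypothetical stand-in

enum LookupResult { LOOKUP_OK, LOOKUP_NOT_FOUND };

// Reports failure through the return value instead of throwing NotFound.
// On success, `out` points at the requested unit; on failure it is null.
LookupResult findFrontUnit(const std::vector<UnitBase*>& frontUnits,
                           std::size_t offset, UnitBase*& out) {
    if (offset >= frontUnits.size()) {
        out = nullptr;
        return LOOKUP_NOT_FOUND;
    }
    out = frontUnits[offset];
    assert(out != nullptr);  // assumption: the vector holds no null entries
    return LOOKUP_OK;
}
```

The caller then has to check the result at every call site, which is the "uglier code" part of the trade-off.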

Answer 10

You're outsmarting yourself (which everyone does sometimes). You've made a simple problem O(N^2). Just think about what you've got to do before you go overloading operators.

Added in response to comment:

Try backing off to a simpler language, like C, or the C subset of C++. Forget about abstractions, collection classes, and all that hoo-haw. Look at what your program needs to do and think about your algorithm that way. Then, if you can simplify it by using container classes and overloading, without making it do any more work, then go for it. Most performance problems are caused by taking simple problems and making them complicated by trying to use all the fancy ideas.

For example, you're taking the [] operator, which is usually thought of as O(1), and making it O(N). Then I presume you use it in some O(N) loop, so you get O(N^2). What you really want to do is loop over the array elements that satisfy a certain condition. You could just do that. If there are very few of them, and you're doing this at really high frequency, you might want to build a separate list of them. But keep your data structure simple, simple, simple. It's better to "waste" cycles, and only optimize if you really have to.

10 answers

solution1 7 2010-07-02 22:13:37

solution2 4 2010-07-02 22:17:07

solution3 2 2010-07-02 22:22:02

solution4 1 2010-07-02 22:36:57

solution5 1 2010-07-02 22:59:59

solution6 1 2010-07-03 09:50:12

solution7 1 ACCEPTED 2010-07-03 23:35:02

solution8 0 2010-07-03 01:06:05

solution9 0 2010-07-03 09:33:15

solution10 0 2010-07-03 18:51:16