
Bottom-up heap analysis

I'm trying to do a time complexity analysis of bottom-up heap construction and I'm stuck. I've done the mathematical evaluation that shows it is O(n), and I completely understand why. The part I'm stuck on is understanding how the code achieves this. I know the outer for loop executes floor(n/2) times, and I believe the while loop executes about log n times, but I don't know how to get from floor(n/2) log n to O(n).

Pseudocode (left), with my attempted per-line time analysis (right):

for i = n/2-1; i >= 0; i--        n/2+1
  k=i                             n/2
  while(2*k-1 <= n)               n/2(????)+1  <-- this is where I'm stuck. Should it run log n times?
    j = k*2-1                     ...
    if(j<n && H[j] < H[j+1])      ...
      j++                         ...
    if(H[k] < H[j])               ...
      break                       ...
    swap(H[k],H[j])               ...
    k=j                           ...

So I can see that the while loop probably runs log n times, but I can't see how to get from (n/2) log n to O(n). I'm only looking for the worst case, since I know the best case is n/2 + 1: the loop breaks as soon as the subtree is already a heap. Any help or direction to reading material is welcome.

The best advice I have to offer about working out the big-O cost of different loops is this:

"When in doubt, work inside out!"

In other words, rather than starting with the outermost loop and working inward, start with the innermost loop and work outward.

In this case, we have this code:

for i = n/2-1; i >= 0; i--
  k=i                           
  while (2*k-1 <= n)
    j = k*2-1
    if(j<n && H[j] < H[j+1])
      j++
    if(H[k] < H[j])
      break
    swap(H[k],H[j])
    k=j

Since we're working inside out, let's start by analyzing the innermost loop:

  while (2*k-1 <= n)
    j = k*2-1
    if(j<n && H[j] < H[j+1])
      j++
    if(H[k] < H[j])
      break
    swap(H[k],H[j])
    k=j

I'm going to assume this is a worst-case analysis and that we never trigger the inner break statement. In that case, the loop progresses by having k move to either 2k - 1 or 2k after each step, which means that k roughly doubles with each iteration. The loop ends when 2k - 1 exceeds n, i.e. roughly when k exceeds n/2, so the number of iterations of the loop is the number of times we have to double k before it gets that large. That works out to O(log(n / k)) total loop iterations. Note that this isn't a constant: as k gets smaller, we end up doing more and more work per iteration of the outer loop.
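If you want to see this doubling behavior concretely, here is a small experiment (my own sketch, not part of the code being analyzed) that counts worst-case sift-down steps from position k and compares them to log2(n / k). It uses the standard 0-based convention where the left child of node k sits at 2k + 1; the constant offset relative to the pseudocode's 2k - 1 doesn't change the asymptotics.

import math

# Count how many levels a node at position k can sink in a heap of size n,
# assuming the worst case where it descends one level per iteration.
def sift_down_steps(k, n):
    steps = 0
    while 2 * k + 1 < n:   # while node k still has a left child
        k = 2 * k + 1      # worst case: descend to a child every time
        steps += 1
    return steps

n = 1_000_000
for k in [1, 10, 100, 1_000, 100_000]:
    print(k, sift_down_steps(k, n), round(math.log2(n / k), 1))

For n = 1,000,000 the step counts track log2(n / k) closely: k = 1 near the root sinks 18 levels, while k = 100,000 sinks only 3.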

We can replace the inner loop with the simpler "do O(log(n / k)) work" to get this:

for i = n/2-1; i >= 0; i--
  k=i                           
  do O(log (n / k)) work;

And, since k = i, we can rewrite this as

for i = n/2-1; i >= 0; i--
  do O(log (n / i)) work;

Now, how much total work is being done here? Adding up the work done per iteration across all iterations (shifting the index by one, so the i = 0 iteration contributes the final log(n / 1) = log n term rather than an undefined log(n / 0)), we get that the total work done is

log(n / (n/2)) + log(n / (n/2 - 1)) + log(n / (n/2 - 2)) + ... + log(n / 2) + log(n / 1).

Now, "all" we have to do is simplify this sum. :-)

Using properties of logarithms, we can rewrite this as

(log n - log(n/2)) + (log n - log(n/2 - 1)) + (log n - log(n/2 - 2)) + ... + (log n - log 1)

= (log n + log n + ... + log n) - (log(n/2) + log(n/2 - 1) + ... + log 1)

= (n/2)(log n) - log((n/2)(n/2 - 1)(n/2 - 2) ... 1)

= (n/2)(log n) - log((n/2)!)

Now, we can use Stirling's approximation to rewrite

log((n/2)!) = (n/2)log(n/2) - (n/2)log e + O(log n)

And, therefore, we get this:

(n/2)(log n) - log((n/2)!)

= (n/2)(log n) - (n/2)log(n/2) + (n/2)log e + O(log n)

= (n/2)(log(2 · (n/2))) - (n/2)log(n/2) + O(n)

= (n/2)(log 2 + log(n/2)) - (n/2)log(n/2) + O(n)

= (n/2)(1 + log(n/2)) - (n/2)log(n/2) + O(n)      (logs here are base 2, so log 2 = 1)

= n/2 + O(n)

= O(n).

So this whole sum works out to O(n).
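As a sanity check (a quick numerical experiment of my own, not strictly part of the proof), you can evaluate the sum directly and watch its ratio to n settle toward a constant, which is exactly what linear growth looks like:

import math

# Evaluate sum_{i=1}^{n/2} log2(n / i) and print its ratio to n.
for n in [2**10, 2**14, 2**18, 2**22]:
    total = sum(math.log2(n / i) for i in range(1, n // 2 + 1))
    print(n, round(total / n, 4))

The ratio converges to (1 + log2 e) / 2, about 1.22, matching the analysis above: the sum is linear in n, not n log n.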


As you can see, this is a decidedly nontrivial big-O to calculate. Indeed, it's a lot trickier than just counting up the work done per iteration and multiplying by the number of iterations, because the way the work per iteration changes across iterations makes that approach break down. Instead, we have to do a more nuanced analysis of how much work each loop does, convert things into a summation, and pull out some nontrivial (though not completely unexpected) tricks (Stirling's approximation and properties of logarithms) to get everything to work out as expected.

I would categorize this particular set of loops as a fairly tricky one to work through and not particularly representative of what you'd "normally" see when doing a loop analysis. But hopefully the techniques here give you a sense of how to work through trickier loop analyses and a glimpse of some of the beautiful math that goes into them.
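And if it helps to experiment, here's a minimal runnable version of the algorithm in Python (my own sketch; it uses the standard 0-based max-heap convention where the children of node k are at 2k + 1 and 2k + 2, since the pseudocode's 2*k - 1 looks like an indexing slip, and the analysis is unaffected by the constant offset):

def build_max_heap(H):
    n = len(H)
    # Outer loop: floor(n/2) iterations, one per internal node, bottom up.
    for i in range(n // 2 - 1, -1, -1):
        k = i
        # Sift node k down until its subtree satisfies the heap property.
        while 2 * k + 1 < n:
            j = 2 * k + 1                     # left child
            if j + 1 < n and H[j] < H[j + 1]:
                j += 1                        # right child is larger
            if H[k] >= H[j]:
                break                         # subtree is already a heap
            H[k], H[j] = H[j], H[k]
            k = j
    return H

print(build_max_heap([3, 1, 4, 1, 5, 9, 2, 6]))   # [9, 6, 4, 1, 5, 3, 2, 1]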

Hope this helps!
