Why does my Python script run slower than it should on my HeapSort implementation?

Question

I've got as assignment to implement the heap sort algorithm into either Python or Java (or any other languages). Since I'm not that really "fluent" in Python or Java I decided to do both.

But here I ran into a problem, the running time of the program is way too much hight than it "should" be. I mean by that, that the heap sort is supposed to run into a O(n * log n) and for current processor running on a clock rate of several GHz I didn't expect for that algorithm to run into over 2000secs for an array of size 320k

So for what I've done, I implemented the algorithm from the pseudo code of this sort in Python and in Java (I also tried the code in Julia from Rosetta Code to see if the running time was similar, why Julia ? Random pick)

So I checked the output for small input size problem, such as an array of size 10, 20 and 30. It appears that the array it correctly sorted in both languages/implementations.

Then I used the heapq library that implement this same algorithm to check once again if the running time was similar. It surprised me when it was actually the case... But after few tries I tried one last thing which is updating Python and then, the program using heapq ran much faster than the previous ones. Actually it was around 2k sec for the 320k array and now it around 1.5 sec or so.

I retried my algorithm and the problem was still there.

So here are the Heapsort class that I implemented:

class MaxHeap:
    heap = []

    def __init__(self, data=None):
        if data is not None:
            self.buildMaxHeap(data)

    @classmethod
    def toString(cls):
        return str(cls.heap)

    @classmethod
    def add(cls, elem):
        cls.heap.insert(len(cls.heap), elem)
        cls.buildMaxHeap(cls.heap)

    @classmethod
    def remove(cls, elem):
        try:
            cls.heap.pop(cls.heap.index(elem))
        except ValueError:
            print("The value you tried to remove is not in the heap")

    @classmethod
    def maxHeapify(cls, heap, i):
        left = 2 * i + 1
        right = 2 * i + 2
        largest = i
        n = len(heap)

        if left < n and heap[left] > heap[largest]:
            largest = left
        if right < n and heap[right] > heap[largest]:
            largest = right
        if largest != i:
            heap[i], heap[largest] = heap[largest], heap[i]
            cls.maxHeapify(heap, largest)

    @classmethod
    def buildMaxHeap(cls, heap):
        for i in range(len(heap) // 2, -1, -1):
            cls.maxHeapify(heap, i)
        cls.heap = heap

    @staticmethod
    def heapSort(table):
        heap = MaxHeap(table)

        output = []

        i = len(heap.heap) - 1
        while i >= 0:
            heap.heap[0], heap.heap[i] = heap.heap[i], heap.heap[0]
            output = [heap.heap[i]] + output
            heap.remove(heap.heap[i])
            heap.maxHeapify(heap.heap, 0)
            i -= 1
        return output

To log the runtime for each array size (10000 - 320000) I use this loop in the main function :

     i = 10000
     while i <= 320000:
         tab = [0] * i
         j = 0
         while j < i:
             tab[j] = randint(0, i)
             j += 1
         start = time()
         MaxHeap.heapSort(tab)
         end = time()
         pprint.pprint("Size of the array " + str(i))
         pprint.pprint("Total execution time: " + str(end - start) + "s")
         i *= 2

If you need the rest of the code to see where the error could be, don't hesitate I'll provide it. Just didn't want to share the whole file for no reasons.

As said earlier the running time I expected is from the worst case running time : O(n * log n) with modern architecture and a processor of 2.6GHz I would expect something around 1sec or even less (since the running time is asked in nanosecond I suppose that even 1 sec is still too long)

Here are the results :

Python (own) :                 Java (Own)

  Time        Size               Time       Size 
 593ms.       10k               243ms.      10k
 2344ms.      20k               600ms.      20k
 9558ms.      40k               1647ms.     40k
 38999ms.     80k               6666ms.     80k
 233811ms.    160k              62789ms.    160k
 1724926ms.   320k              473177ms.   320k

Python (heapq)                 Julia (Rosetta Code)
  Time        Size               Time        Size
 6ms.         10k               21ms.        10k
 14ms.        20k               21ms.        20k
 15ms.        40k               23ms.        40k
 34ms.        80k               28ms.        80k
 79ms.        160k              39ms.        160k
 168ms.       320k              60ms.        320k


And according to the formula the O(n * log n) give me :
40000       10k
86021       20k
184082      40k
392247      80k
832659      160k
1761648     320k

I think that these result could be used to determine how much time it should take depending on the machine (theoretically)

As you can see the high running time result comes from my algorithm, but I can't tell where in the code and that's why I'm asking here for help. (Runs slow both in Java and Python) (Didn't try to use heap sort in java lib is there is one to see the difference with my implementation, my bad)

Thanks a lot.

Edit : I forgot to add that I run this program on a MacBook Pro (last version MacOS, i7 2,6GHz. In case the problem could also comes from anything else than the code.

Edit 2 : Here are the modifications I did on the algorithm, following the answer I received. The program run approximately 200 times faster than previously, and so now it run in barely 2sec for the array of size 320k

class MaxHeap:

    def __init__(self, data=None):
        self.heap = []
        self.size = 0

        if data is not None:
            self.size = len(data)
            self.buildMaxHeap(data)

    def toString(self):
        return str(self.heap)

    def add(self, elem):
        self.heap.insert(self.size, elem)
        self.size += 1
        self.buildMaxHeap(self.heap)

    def remove(self, elem):
        try:
            self.heap.pop(self.heap.index(elem))
        except ValueError:
            print("The value you tried to remove is not in the heap")

    def maxHeapify(self, heap, i):
        left = 2 * i + 1
        right = 2 * i + 2
        largest = i

        if left < self.size and heap[left] > heap[largest]:
            largest = left
        if right < self.size and heap[right] > heap[largest]:
            largest = right
        if largest != i:
            heap[i], heap[largest] = heap[largest], heap[i]
            self.maxHeapify(heap, largest)

    def buildMaxHeap(self, heap):
        for i in range(self.size // 2, -1, -1):
            self.maxHeapify(heap, i)
        self.heap = heap

    @staticmethod
    def heapSort(table):
        heap = MaxHeap(table)

        i = len(heap.heap) - 1
        while i >= 0:
            heap.heap[0], heap.heap[i] = heap.heap[i], heap.heap[0]
            heap.size -= 1
            heap.maxHeapify(heap.heap, 0)
            i -= 1
        return heap.heap

And it runs using the same main as given before

Answer 1

Its interesting that you posted the clock speed of your computer- you COULD calculate the actual number of steps your algorithm requires... but you would need to know an awful lot about the implementation. For example, in python every time an object is created or goes out of scope, the interpreter updates counters on the underlying object, and frees the memory if those ref counts reach 0. Instead, you should look at the relative speed.

The third party examples you posted show the speed as less then doubling when the input array length doubles. That doesn't seem right, does it? Turns out that for those examples the initial work of building the array probably dominates the time spent sorting the array!

In your code, there is already one comment that calls out what I was going to say...

heap.remove(heap.heap[i]) This operation will go through your list (starting at index 0) looking for a value that matches, and then deletes it. This is already bad (if it works as intended, you are doing 320k comparisons on that line if your code worked as you expected!). But it gets worse- deleting an object from an array is not an in-place modification- every object after the deleted object has to be moved forward in the list. Finally, there is no guarantee that you are actually removing the last object there... duplicate values could exist!

Here is a useful website that lists the complexity of various operations in python - https://wiki.python.org/moin/TimeComplexity . In order to implement an algorithm as efficiently as possible, you need as many of your data structure operations to be O(1) as possible. Here is an example... here is some original code, presumably with heap.heap being a list...

        output = [heap.heap[i]] + output
        heap.remove(heap.heap[i])

doing

        output.append(heap.heap.pop())

Would avoid allocating a new list AND use a constant time operation to mutate the old one. (much better to just use the output backwards than use the O(n) time insert(0) method! you could use a dequeue object for output to get appendleft method if you really need the order)

If you posted your whole code there are probably lots of other little things we could help with. Hopefully this helped!

Why does my Python script run slower than it should on my HeapSort implementation?

Question

1 answers

solution1
2 ACCPTED 2019-03-27 05:10:27

Why does my Python script run slower than it should on my HeapSort implementation?

Question

1 answers

solution1 2 ACCPTED 2019-03-27 05:10:27

solution1
2 ACCPTED 2019-03-27 05:10:27