Finding a sound timing test for sorting algorithms given a specific list

Is there an objective test to find the best sorting algorithm for specific types of lists? I've attempted such a test but am not sure of its soundness. The crux might be: can an objective test be devised to determine which algorithm is optimal for a given type of list, or do these decisions require empirical evidence?

Problem

I'm trying to find the best sorting algorithm for lists of a specific type. Each list contains 2-202 unique integers. I'm trying to find the quickest way to sort millions of such lists.

This search began when I noticed that Python's built-in sorted(unsorted), which is Tim Sort implemented in C, is only marginally faster than my naive simple_sort(unsorted_set, order) written in Python. It was also interesting that quick_sort in Python was not consistently faster than simple_sort:

>>> def simple_sort(unsorted_set, order):
...     sorted_list = []
...     for i in order:
...         if i in unsorted_set:
...             sorted_list.append(i)
...     return sorted_list
>>> unsorted = [1, 5, 2, 9]
>>> order = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> unsorted_set = {item for item in unsorted}
>>> print simple_sort(unsorted_set, order)
[1, 2, 5, 9]

At some point I will rewrite my algorithm, which requires sorting, in C, once I am familiar enough with C to do so.


  1. I wrote the following test code on the assumption that simple_sort in C would outperform Tim Sort for my specific type of lists.
  2. I'm assuming sorted(unsorted) is a C implementation of Tim Sort.

Results for sorting slices of a sorted list

  1. Counting Sort was the fastest.
  2. Among the fastest algorithms I tested, Tim Sort in C was the slowest, slower even than my naive solution simple_sort in Python.
  3. Interestingly, the winning algorithms (below) cluster in 3 groups.
  4. This first test mistakenly sorted pre-sorted lists. I've added a test (below) for unsorted lists.
  5. The Excel file

[Figure: Fastest algorithms for 2-202 sorted items]
[Figure: Slowest algorithms for 2-202 sorted items]

Results for sorting slices of an unsorted list

  1. Tim Sort in C was fastest for my lists shorter than 203 items.
  2. For lists longer than ~475 items, simple_sort was faster.
  3. I added this section for unsorted lists because the in-place gnome_sort sorted the input lists in my first test, so that test measured pre-sorted input.
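
The in-place pitfall mentioned in point 3 can be shown in a few lines. This is only a sketch: the gnome_sort below is a stand-in written for illustration (not the one from the linked test code), and the list size and seed are arbitrary. It shows that timing an in-place sort on one shared list means every repetition after the first sees sorted input, and that copying inside the timed callable avoids this.

```python
import random
import timeit

def gnome_sort(items):
    """In-place gnome sort, used here as a stand-in for any in-place algorithm."""
    i = 0
    while i < len(items):
        if i == 0 or items[i - 1] <= items[i]:
            i += 1
        else:
            items[i - 1], items[i] = items[i], items[i - 1]
            i -= 1
    return items

random.seed(0)  # fixed seed so the run is repeatable
unsorted = random.sample(range(1003), 202)

# Pitfall: 'shared' is sorted by the first repetition, so the remaining
# repetitions time the algorithm on already sorted input.
shared = unsorted[:]
timeit.timeit(lambda: gnome_sort(shared), number=3)
shared_is_sorted = (shared == sorted(unsorted))

# Fairer: copy inside the timed callable, so each repetition sorts unsorted
# data. The copy cost is then included for every algorithm alike.
fresh_time = timeit.timeit(lambda: gnome_sort(unsorted[:]), number=3)
original_untouched = (unsorted != sorted(unsorted))
```

The same concern applies to the test code below wherever an in-place algorithm is timed with number=times on a single copy of the slice.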

[Figure: Fastest algorithms for 2-202 unsorted items]
[Figure: Fastest algorithms for 2-1002 unsorted items]
[Figure: Slowest algorithms for 2-202 unsorted items]
[Figure: Slowest algorithms for 2-1002 unsorted items]

Test Code

I've linked the full sorting test code here since the sorting algorithms would make an excessively long post.

import random
import timeit
from collections import defaultdict

# 'times' is how many repetitions each algorithm should run
times = 100
# 'unique_items' is the number of non-redundant items in the unsorted list
unique_items = 1003
# 'order' is an ordered list
order = list(range(unique_items))
# Generate the unsorted list
random_list = order[:]
random.shuffle(random_list)
# 'random_set' is used for simple_sort
random_set = set(random_list)

# A list of all sorted lists for each algorithm   
sorted_lists = [
    simple_sort(random_set, order),
    quick_sort(random_list[:]),
    merge_sort(random_list[:]),
    shell_sort(random_list[:]),
    bubble_sort(random_list[:]),
    heap_sort(random_list[:]),
    insertion_sort(random_list[:]),
    insertion_sort_bin(random_list[:]),
    circle_sort(random_list[:]),
    cocktail_sort(random_list[:]),
    counting_sort(random_list[:], 0, unique_items),
    cycle_sort(random_list[:]),
    gnome_sort(random_list[:]),
    pancake_sort(random_list[:]),
    patience_sort(random_list[:]),
    radix_sort(random_list[:], unique_items),
    selection_sort(random_list[:]),
    abstract_tree_sort(random_list[:], BinarySearchTree),
    sorted(random_list[:])
    ]

# A set of all sorted lists for each algorithm
sorted_set = {repr(item) for item in sorted_lists}
# If only one version of the sorted list exists, True is evaluated
print 'All algorithms sort identically', len(sorted_set) == 1

# Sort slices of an unsorted list and record the times in 'time_record'
time_record = defaultdict(list)
for length in range(2, unique_items, 10):
    unsorted = random_list[:length]
    # 'unsorted_set' is used for simple_sort
    simple_unsorted = unsorted[:]
    unsorted_set = {item for item in simple_unsorted}

    print '**********', length, '**********'    

    print 'simple'
    simple = timeit.timeit(lambda: simple_sort(unsorted_set, order), number=times)
    time_record['Simple Sort'].append(simple)    

    print 'quick'
    quick_unsorted = unsorted[:]
    quick = timeit.timeit(lambda: quick_sort(quick_unsorted), number=times)
    time_record['Quick Sort'].append(quick)

    print 'merge'
    merge_unsorted = unsorted[:]
    merged = timeit.timeit(lambda: merge_sort(merge_unsorted), number=times)
    time_record['Merge Sort'].append(merged)

    print 'shell'
    shell_unsorted = unsorted[:]
    shell = timeit.timeit(lambda: shell_sort(shell_unsorted), number=times)
    time_record['Shell Sort'].append(shell)

    print 'bubble'
    bubble_unsorted = unsorted[:]
    bubble = timeit.timeit(lambda: bubble_sort(bubble_unsorted), number=times)
    time_record['In Place Bubble Sort'].append(bubble)    

    print 'heap'
    heap_unsorted = unsorted[:]
    heap = timeit.timeit(lambda: heap_sort(heap_unsorted), number=times)
    time_record['In Place Heap Sort'].append(heap)

    print 'insertion'
    insertion_unsorted = unsorted[:]
    insertion = timeit.timeit(lambda: insertion_sort(insertion_unsorted), number=times)
    time_record['In Place Insertion Sort'].append(insertion)

    print 'insertion binary'
    insertion_bin_unsorted = unsorted[:]
    insertion_bin = timeit.timeit(lambda: insertion_sort_bin(insertion_bin_unsorted), number=times)
    time_record['In Place Insertion Sort Binary'].append(insertion_bin)

    print 'circle'
    circle_unsorted = unsorted[:]
    circle = timeit.timeit(lambda: circle_sort(circle_unsorted), number=times)
    time_record['In Place Circle Sort'].append(circle)

    print 'cocktail'
    cocktail_unsorted = unsorted[:]
    cocktail = timeit.timeit(lambda: cocktail_sort(cocktail_unsorted), number=times)   
    time_record['In Place Cocktail Sort'].append(cocktail)

    print 'counting'
    counting_unsorted = unsorted[:]
    counting = timeit.timeit(lambda: counting_sort(counting_unsorted, 0, length), number=times)
    time_record['Counting Sort'].append(counting)

    print 'cycle'
    cycle_unsorted = unsorted[:]
    cycle = timeit.timeit(lambda: cycle_sort(cycle_unsorted), number=times)
    time_record['In Place Cycle Sort'].append(cycle)

    print 'gnome'
    gnome_unsorted = unsorted[:]
    gnome = timeit.timeit(lambda: gnome_sort(gnome_unsorted), number=times)
    time_record['Gnome Sort'].append(gnome)

    print 'pancake'
    pancake_unsorted = unsorted[:]
    pancake = timeit.timeit(lambda: pancake_sort(pancake_unsorted), number=times)
    time_record['In Place Pancake Sort'].append(pancake)

    print 'patience'
    patience_unsorted = unsorted[:]
    patience = timeit.timeit(lambda: patience_sort(patience_unsorted), number=times)
    time_record['In Place Patience Sort'].append(patience)

    print 'radix'
    radix_unsorted = unsorted[:]
    radix = timeit.timeit(lambda: radix_sort(radix_unsorted, length), number=times)
    time_record['Radix Sort'].append(radix)

    print 'selection'
    selection_unsorted = unsorted[:]
    selection = timeit.timeit(lambda: selection_sort(selection_unsorted), number=times)
    time_record['Selection Sort'].append(selection)

    print 'tree'
    tree_unsorted = unsorted[:]
    tree_sorted = timeit.timeit(lambda: abstract_tree_sort(tree_unsorted, BinarySearchTree), number=times)
    time_record['Abstract Tree Sort'].append(tree_sorted)

    print 'tim in c'
    tim_unsorted = unsorted[:]
    tim = timeit.timeit(lambda: sorted(tim_unsorted), number=times)
    time_record['Tim in C'].append(tim)

The best sorting algorithm depends on various factors, including properties of your input (e.g. the size of an element) and requirements on your result (e.g. stability). For a given input set, Bubblesort may be unusually fast, in O(N), while Quicksort may be unusually slow, in O(N x N), and Mergesort will always be in O(N x logN).
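
To make the best-case and worst-case claim concrete, here is a small sketch that counts comparisons instead of measuring time. Both functions are illustrative versions written for this example (a Bubblesort with early exit, and a Quicksort that always picks the first element as pivot), not the implementations from the question. The Quicksort counts one comparison per element against the pivot.

```python
def bubble_sort_comparisons(items):
    """Bubble sort with early exit; returns the number of comparisons made."""
    items = list(items)
    comparisons = 0
    n = len(items)
    while True:
        swapped = False
        for i in range(1, n):
            comparisons += 1
            if items[i - 1] > items[i]:
                items[i - 1], items[i] = items[i], items[i - 1]
                swapped = True
        if not swapped:
            return comparisons
        n -= 1  # the largest remaining item is now in its final place

def quick_sort_comparisons(items):
    """Quicksort with first-element pivot; returns (sorted_list, comparisons)."""
    if len(items) <= 1:
        return list(items), 0
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    left, c_left = quick_sort_comparisons(smaller)
    right, c_right = quick_sort_comparisons(larger)
    return left + [pivot] + right, len(rest) + c_left + c_right

already_sorted = list(range(100))
bubble_count = bubble_sort_comparisons(already_sorted)    # 99, i.e. N - 1
quick_count = quick_sort_comparisons(already_sorted)[1]   # 4950, i.e. N(N-1)/2
```

On already sorted input of N = 100 items the Bubblesort stops after one pass, while this particular Quicksort degenerates to its quadratic worst case. Other pivot strategies would behave differently.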

In the general case, comparison-based sorting is in O(N x logN), i.e. there's no comparison-based algorithm that can sort arbitrary sets faster than this. However, for certain input characteristics, there are sorting algorithms that are linear with respect to the size of the input set. Obviously, you can't get any faster than this, since every element has to be read at least once.
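
As an illustration of such an input characteristic: unique non-negative integers with a known upper bound can be sorted without any comparisons between elements. The sketch below is a hypothetical presence_sort (the name and the 'upper' parameter are my own, not from the question), and it is essentially the same idea as the question's simple_sort and as counting sort.

```python
def presence_sort(values, upper):
    """Sort unique integers drawn from range(upper).

    Runs in O(len(values) + upper) time. Assumes every value is unique,
    non-negative and smaller than 'upper'.
    """
    present = [False] * upper
    for v in values:
        present[v] = True          # mark each value as seen
    # Walking the flags in index order yields the values in sorted order.
    return [i for i, flag in enumerate(present) if flag]

result = presence_sort([1, 5, 2, 9], 10)   # [1, 2, 5, 9]
```

The cost depends on the size of the value range as well as on the number of items, so this only pays off when the range is not much larger than the lists being sorted.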

If you don't know much about sorting, your best bet might be to simply compare some common sorting algorithms. Since your input consists of "unique integers", you don't have to care about whether your sorting algorithm is stable or not.

Try the following algorithms on actual data and choose the fastest:

  • Mergesort
  • Bubblesort
  • Quicksort
  • Radixsort

And if the overall number of possible inputs is "small", you may even be able to skip sorting and just pre-compute all possible results.
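
A rough sketch of the pre-computation idea, assuming a very small universe of possible values. The helper build_lookup and the four-value universe are hypothetical, chosen only for illustration.

```python
from itertools import combinations

def build_lookup(universe):
    """Map every subset of 'universe' to its sorted tuple.

    The table has 2 ** len(universe) entries, so this is only practical
    when the number of distinct possible values is small.
    """
    table = {}
    ordered = sorted(universe)
    for size in range(len(ordered) + 1):
        # combinations() of a sorted sequence yields sorted tuples
        for combo in combinations(ordered, size):
            table[frozenset(combo)] = combo
    return table

lookup = build_lookup([3, 1, 4, 2])
result = lookup[frozenset([4, 1, 3])]   # (1, 3, 4)
```

Because the table doubles with every additional possible value, this is unlikely to be workable for the roughly one thousand distinct values used in the question's test.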
