Is there an objective test to find the best sorting algorithm for specific types of lists? I've attempted such a test but am not sure of its soundness. The crux might be: can an objective test be devised that generalizes which algorithm is optimal for a given type of list, or do these decisions require empirical evidence?
I'm trying to find the best sorting algorithm for lists of a specific type. Each list contains 2-202 unique integers. I'm trying to find the quickest way to sort millions of such lists.
This search began when I noticed that the built-in Tim Sort in C, sorted(unsorted), is only marginally faster in Python than my naive test, simple_sort(unsorted_set, order), written in Python. It was also interesting that quick_sort in Python was not consistently faster than simple_sort:
>>> def simple_sort(unsorted_set, order):
...     sorted_list = []
...     for i in order:
...         if i in unsorted_set:
...             sorted_list.append(i)
...     return sorted_list
...
>>> unsorted = [1, 5, 2, 9]
>>> order = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> unsorted_set = set(unsorted)
>>> print(simple_sort(unsorted_set, order))
[1, 2, 5, 9]
At some point my algorithm, which requires sorting, will be rewritten in C once I am familiar enough with C to do so.
I would like to know whether simple_sort in C would outperform Tim Sort for my specific type of lists.
Note: sorted(unsorted) is a C implementation of Tim Sort.
[Figures: Fastest algorithms for 2-202 sorted items; Slowest algorithms for 2-202 sorted items]
simple_sort was faster.
gnome_sort: pre-sorted input lists were used for my first test.
[Figures: Fastest algorithms for 2-202 unsorted items; Fastest algorithms for 2-1002 unsorted items; Slowest algorithms for 2-202 unsorted items; Slowest algorithms for 2-1002 unsorted items]
I've linked the full sorting test code here since the sorting algorithms would make an excessively long post.
import random
import timeit
from collections import defaultdict

# The sort functions used below (simple_sort, quick_sort, ...) are defined
# in the linked test code.

# 'times' is how many repetitions each algorithm should run
times = 100
# 'unique_items' is the number of non-redundant items in the unsorted list
unique_items = 1003
# 'order' is an ordered list
order = list(range(unique_items))
# Generate the unsorted list
random_list = order[:]
random.shuffle(random_list)
# 'random_set' is used for simple_sort
random_set = set(random_list)
# A list of all sorted lists for each algorithm
sorted_lists = [
    simple_sort(random_set, order),
    quick_sort(random_list[:]),
    merge_sort(random_list[:]),
    shell_sort(random_list[:]),
    bubble_sort(random_list[:]),
    heap_sort(random_list[:]),
    insertion_sort(random_list[:]),
    insertion_sort_bin(random_list[:]),
    circle_sort(random_list[:]),
    cocktail_sort(random_list[:]),
    counting_sort(random_list[:], 0, unique_items),
    cycle_sort(random_list[:]),
    gnome_sort(random_list[:]),
    pancake_sort(random_list[:]),
    patience_sort(random_list[:]),
    radix_sort(random_list[:], unique_items),
    selection_sort(random_list[:]),
    abstract_tree_sort(random_list[:], BinarySearchTree),
    sorted(random_list[:])
]
# A set of all sorted lists for each algorithm
sorted_set = {repr(item) for item in sorted_lists}
# If only one version of the sorted list exists, True is printed
print('All algorithms sort identically: %s' % (len(sorted_set) == 1))
# Sort slices of an unsorted list and record the times in 'time_record'
time_record = defaultdict(list)
for length in range(2, unique_items, 10):
    unsorted = random_list[:length]
    # 'unsorted_set' is used for simple_sort
    unsorted_set = set(unsorted)
    print('********** %d **********' % length)
    print('simple')
    simple = timeit.timeit(lambda: simple_sort(unsorted_set, order), number=times)
    time_record['Simple Sort'].append(simple)
    print('quick')
    quick_unsorted = unsorted[:]
    quick = timeit.timeit(lambda: quick_sort(quick_unsorted), number=times)
    time_record['Quick Sort'].append(quick)
    print('merge')
    merge_unsorted = unsorted[:]
    merged = timeit.timeit(lambda: merge_sort(merge_unsorted), number=times)
    time_record['Merge Sort'].append(merged)
    print('shell')
    shell_unsorted = unsorted[:]
    # Fixed: this previously timed merge_sort under the 'Shell Sort' label
    shell = timeit.timeit(lambda: shell_sort(shell_unsorted), number=times)
    time_record['Shell Sort'].append(shell)
    print('bubble')
    bubble_unsorted = unsorted[:]
    bubble = timeit.timeit(lambda: bubble_sort(bubble_unsorted), number=times)
    time_record['In Place Bubble Sort'].append(bubble)
    print('heap')
    heap_unsorted = unsorted[:]
    heap = timeit.timeit(lambda: heap_sort(heap_unsorted), number=times)
    time_record['In Place Heap Sort'].append(heap)
    print('insertion')
    insertion_unsorted = unsorted[:]
    insertion = timeit.timeit(lambda: insertion_sort(insertion_unsorted), number=times)
    time_record['In Place Insertion Sort'].append(insertion)
    print('insertion binary')
    insertion_bin_unsorted = unsorted[:]
    insertion_bin = timeit.timeit(lambda: insertion_sort_bin(insertion_bin_unsorted), number=times)
    time_record['In Place Insertion Sort Binary'].append(insertion_bin)
    print('circle')
    circle_unsorted = unsorted[:]
    circle = timeit.timeit(lambda: circle_sort(circle_unsorted), number=times)
    time_record['In Place Circle Sort'].append(circle)
    print('cocktail')
    cocktail_unsorted = unsorted[:]
    cocktail = timeit.timeit(lambda: cocktail_sort(cocktail_unsorted), number=times)
    time_record['In Place Cocktail Sort'].append(cocktail)
    print('counting')
    counting_unsorted = unsorted[:]
    counting = timeit.timeit(lambda: counting_sort(counting_unsorted, 0, length), number=times)
    time_record['Counting Sort'].append(counting)
    print('cycle')
    cycle_unsorted = unsorted[:]
    cycle = timeit.timeit(lambda: cycle_sort(cycle_unsorted), number=times)
    time_record['In Place Cycle Sort'].append(cycle)
    print('gnome')
    gnome_unsorted = unsorted[:]
    gnome = timeit.timeit(lambda: gnome_sort(gnome_unsorted), number=times)
    time_record['Gnome Sort'].append(gnome)
    print('pancake')
    pancake_unsorted = unsorted[:]
    pancake = timeit.timeit(lambda: pancake_sort(pancake_unsorted), number=times)
    time_record['In Place Pancake Sort'].append(pancake)
    print('patience')
    patience_unsorted = unsorted[:]
    patience = timeit.timeit(lambda: patience_sort(patience_unsorted), number=times)
    time_record['In Place Patience Sort'].append(patience)
    print('radix')
    radix_unsorted = unsorted[:]
    radix = timeit.timeit(lambda: radix_sort(radix_unsorted, length), number=times)
    time_record['Radix Sort'].append(radix)
    print('selection')
    selection_unsorted = unsorted[:]
    selection = timeit.timeit(lambda: selection_sort(selection_unsorted), number=times)
    time_record['Selection Sort'].append(selection)
    print('tree')
    tree_unsorted = unsorted[:]
    tree_sorted = timeit.timeit(lambda: abstract_tree_sort(tree_unsorted, BinarySearchTree), number=times)
    time_record['Abstract Tree Sort'].append(tree_sorted)
    print('tim in c')
    tim_unsorted = unsorted[:]
    tim = timeit.timeit(lambda: sorted(tim_unsorted), number=times)
    time_record['Tim in C'].append(tim)
The best sorting algorithm depends on various factors, including properties of your input (e.g. the size of an element) and requirements on your result (e.g. stability). For a given input, Bubblesort may be unusually fast, running in O(N), while Quicksort may be unusually slow, running in O(N^2); Mergesort always runs in O(N log N).
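To make the best-case and worst-case behaviour concrete, here is a minimal sketch that counts comparisons instead of measuring time. Both functions are illustrative only and rest on assumptions: the Bubblesort stops as soon as a pass makes no swap, and the Quicksort always picks the first element as pivot. Other variants of either algorithm behave differently.

```python
def bubble_sort_comparisons(items):
    """Early-exit bubble sort; returns (sorted_list, comparison_count)."""
    data = list(items)
    comparisons = 0
    n = len(data)
    while True:
        swapped = False
        for i in range(1, n):
            comparisons += 1
            if data[i - 1] > data[i]:
                data[i - 1], data[i] = data[i], data[i - 1]
                swapped = True
        n -= 1  # the largest remaining item is now in its final position
        if not swapped:
            break
    return data, comparisons


def quick_sort_comparisons(items):
    """Quicksort with the first element as pivot; returns (sorted_list, comparison_count)."""
    data = list(items)
    if len(data) <= 1:
        return data, 0
    pivot = data[0]
    smaller, larger = [], []
    comparisons = 0
    for value in data[1:]:
        comparisons += 1  # one comparison against the pivot per element
        if value < pivot:
            smaller.append(value)
        else:
            larger.append(value)
    left, left_count = quick_sort_comparisons(smaller)
    right, right_count = quick_sort_comparisons(larger)
    return left + [pivot] + right, comparisons + left_count + right_count
```

On an already sorted list of 100 items, bubble_sort_comparisons needs 99 comparisons, while quick_sort_comparisons needs 4950, which is N(N-1)/2. On shuffled input the picture reverses.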
In the general case, comparison-based sorting is in O(N log N), i.e. there's no comparison-based algorithm that can sort arbitrary inputs asymptotically faster than this. However, for certain input characteristics, there are sorting algorithms that are linear with respect to the size of the input. You can't get any faster than this, since every element has to be looked at at least once.
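For unique integers from a known, bounded range, one such linear approach is to mark which values are present and then read the marks back in order. The sketch below is only an illustration; the name presence_sort and its lower/upper parameters are made up for this example. It is the same idea as simple_sort in the question, and like simple_sort its running time grows with the size of the value range, not just with the length of the list.

```python
def presence_sort(items, lower, upper):
    """Sort unique integers known to lie in [lower, upper).

    Runs in O(n + k) time, where n = len(items) and k = upper - lower.
    Assumes there are no duplicates; duplicates would be collapsed.
    """
    present = [False] * (upper - lower)
    for value in items:
        present[value - lower] = True
    return [lower + offset for offset, flag in enumerate(present) if flag]
```

For example, presence_sort([5, 1, 9, 2], 0, 10) returns [1, 2, 5, 9]. Whether this beats a general-purpose sort depends on how large the range is compared to the lists, which is something only a measurement on your data can settle.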
If you don't know much about sorting, your best bet might be to simply compare some common sorting algorithms. Since your input consists of "unique integers", you don't have to care about whether your sorting algorithm is stable or not.
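A comparison like that could be set up roughly as follows. This is a sketch under assumptions, not a definitive harness: make_lists and benchmark are hypothetical helpers, and the random lists only imitate the description in the question (2-202 unique integers drawn from 1003 possible values). Each call gets a fresh copy of the list. If a sort works in place, as the 'In Place ...' labels in the question suggest, reusing one list across repetitions would mean that every repetition after the first sorts already sorted input.

```python
import random
import timeit


def make_lists(count, min_len=2, max_len=202, universe=1003, seed=0):
    """Generate `count` lists of unique integers, imitating the question's data."""
    rng = random.Random(seed)
    return [rng.sample(range(universe), rng.randint(min_len, max_len))
            for _ in range(count)]


def benchmark(sort_functions, lists, repeats=5):
    """Time each candidate over all lists; returns {name: best time in seconds}."""
    results = {}
    for name, func in sort_functions.items():
        def run(func=func):
            for unsorted in lists:
                # Fresh copy per call, so in-place sorts never see sorted input.
                func(list(unsorted))
        # Take the minimum over several runs to reduce noise from other processes.
        results[name] = min(timeit.repeat(run, number=1, repeat=repeats))
    return results
```

A call such as benchmark({'sorted': sorted, 'list.sort': lambda items: items.sort()}, make_lists(1000)) returns one timing per candidate. The cost of the copy is included for every candidate equally, so it does not change the ranking. Ideally, replace make_lists with a sample of your real lists.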
Try the following algorithms on actual data and choose the fastest:
And if the overall number of possible inputs is "small", you may even be able to skip sorting and just pre-compute all possible results.
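As an illustration of the pre-computing idea, the following sketch builds a lookup table from every non-empty subset of a tiny set of values to its sorted order. The function name build_sorted_lookup is hypothetical. The table has 2^n - 1 entries for n possible values, so this is only feasible when n is very small; it is not an option for lists drawn from about a thousand possible values.

```python
from itertools import combinations


def build_sorted_lookup(universe):
    """Map every non-empty subset of `universe` (as a frozenset) to its sorted tuple."""
    ordered = sorted(universe)
    table = {}
    for size in range(1, len(ordered) + 1):
        # combinations() keeps the order of its input, so each tuple is already sorted.
        for combo in combinations(ordered, size):
            table[frozenset(combo)] = combo
    return table
```

With table = build_sorted_lookup([3, 1, 2]), the expression table[frozenset([3, 1])] evaluates to (1, 3) without any sorting at lookup time.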