Fastest way to sort in Python (without Cython)

Question

I have a problem where I have to sort a very large array (shape 7900000 x 4 x 4) with a custom function. I used sorted, but it took more than 1 hour to sort. My code was something like this.

def compare(x,y):
    print('DD '+str(x[0]))
    if(np.array_equal(x[1],y[1])==True):
        return -1
    a = x[1].flatten()
    b = y[1].flatten()
    idx = np.where( (a>b) != (a<b) )[0][0]
    if a[idx]<0 and b[idx]>=0:
        return 0
    elif b[idx]<0 and a[idx]>=0:
        return 1
    elif a[idx]<0 and b[idx]<0:
        if a[idx]>b[idx]:
            return 0
        elif a[idx]<b[idx]:
            return 1
    elif a[idx]<b[idx]:
        return 1
    else:
        return 0
def cmp_to_key(mycmp):
    class K:
        def __init__(self, obj, *args):
            self.obj = obj
        def __lt__(self, other):
            return mycmp(self.obj, other.obj)
    return K
tblocks = sorted(tblocks.items(),key=cmp_to_key(compare))

This worked, but I want it to complete in seconds. I don't think any direct implementation in Python can give me the performance I need, so I tried Cython. My Cython code is this, which is pretty simple.

cdef int[:,:] arrr
cdef int size

cdef bool compare(int a,int b):
    global arrr,size
    cdef int[:] x = arrr[a]
    cdef int[:] y = arrr[b]
    cdef int i,j
    i = 0
    j = 0
    while(i<size):
        if((j==size-1)or(y[j]<x[i])):
            return 0
        elif(x[i]<y[j]):
            return 1
        i+=1
        j+=1
    return (j!=size-1)

def sorted(np.ndarray boxes,int total_blocks,int s):
    global arrr,size
    cdef int i
    cdef vector[int] index = xrange(total_blocks)
    arrr = boxes
    size = s
    sort(index.begin(),index.end(),compare)
    return index

This code in Cython took 33 seconds! Cython is the solution, but I am looking for alternative solutions that can run directly in Python, for example Numba. I tried Numba, but I didn't get satisfying results. Kindly help!

Answer 1

It is hard to give an answer without a working example. I assume that arrr in your Cython code was a 2D array, and I assume that size was size=arrr.shape[0].

Numba Implementation

import numpy as np
import numba as nb
# In newer Numba releases this module lives at numba.misc.quicksort
from numba.targets import quicksort


def custom_sorting(compare_fkt):
  index_arange=np.arange(size)

  quicksort_func=quicksort.make_jit_quicksort(lt=compare_fkt,is_argsort=False)
  jit_sort_func=nb.njit(quicksort_func.run_quicksort)
  index=jit_sort_func(index_arange)

  return index

def compare(a,b):
    x = arrr[a]
    y = arrr[b]
    i = 0
    j = 0
    while(i<size):
        if((j==size-1)or(y[j]<x[i])):
            return False
        elif(x[i]<y[j]):
            return True
        i+=1
        j+=1
    return (j!=size-1)


arrr=np.random.randint(-9,10,(7900000,8))
# size is the number of rows here (the assumption stated above), not the row length
size=arrr.shape[0]

index=custom_sorting(compare)

This gives 3.85s for the generated test data. But the speed of a sorting algorithm depends heavily on the data.

Simple Example

import numpy as np
import numba as nb
# In newer Numba releases this module lives at numba.misc.quicksort
from numba.targets import quicksort

#simple reverse sort
def compare(a,b):
  return a > b

#create some test data
arrr=np.array(np.random.rand(7900000)*10000,dtype=np.int32)
#we can pass the comparison function
quicksort_func=quicksort.make_jit_quicksort(lt=compare,is_argsort=True)
#compile the sorting function
jit_sort_func=nb.njit(quicksort_func.run_quicksort)
#get the result
ind_sorted=jit_sort_func(arrr)

This implementation is about 35% slower than np.argsort, but a gap of this size is also common when np.argsort is used in compiled code.
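For reference, here is a minimal NumPy-only sketch of the baseline that the reverse sort above is compared against, plus a sanity check that can be applied to any index array claiming descending order. The names `arr`, `ind_desc`, `is_descending` and `is_permutation` are illustrative and not from the code above, and the array is kept small so the check runs quickly.

```python
import numpy as np

# Small stand-in for the 7,900,000-element test array used above
rng = np.random.default_rng(0)
arr = (rng.random(1_000) * 10_000).astype(np.int32)

# Baseline: ascending argsort, reversed to obtain descending order
ind_desc = np.argsort(arr)[::-1]

# Checks that do not depend on how ties are ordered:
# the gathered values never increase, and every index appears exactly once
is_descending = bool(np.all(np.diff(arr[ind_desc]) <= 0))
is_permutation = bool(np.array_equal(np.sort(ind_desc), np.arange(arr.size)))

print(is_descending, is_permutation)
```

Comparing index arrays element by element is not a reliable check here, because quicksort is not stable and may order equal values differently than np.argsort does.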

Answer 2

If I understand your code correctly, then the order you have in mind is the standard order, except that it starts at 0, wraps around at +/-infinity and maxes out at -0. On top of that we have simple left-to-right lexicographic order.
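To make that order concrete, here is a small pure-Python sketch of it as a sort key. It reflects my reading of the comparison function in the question rather than a drop-in replacement, and the names `wrap_key`, `values` and `blocks` are made up for illustration.

```python
def wrap_key(v):
    # Non-negative values map to (False, v), negative values to (True, v).
    # False sorts before True, so the order runs 0, 1, 2, ... and then
    # continues with the negatives in ascending order, ending at -1.
    return (v < 0, v)

values = [3, -1, 0, -5, 2, -2]
sorted_values = sorted(values, key=wrap_key)

# Left-to-right lexicographic order on flat rows: compare the per-element
# keys position by position, which is what list comparison does.
blocks = {"a": [1, -3], "b": [1, 2], "c": [-1, 0], "d": [0, 5]}
ordered = sorted(blocks.items(), key=lambda kv: [wrap_key(v) for v in kv[1]])
names = [k for k, _ in ordered]

print(sorted_values)
print(names)
```

This is still far too slow for 7,900,000 blocks, but it is handy as a reference when checking a faster implementation on small inputs.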

Now, if your array dtype is integer, observe the following: because negatives are stored in two's complement representation, view-casting to unsigned int turns your order into the standard order. On top of that, if we use big-endian encoding, efficient lexicographic ordering can be achieved by view-casting to void dtype.
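Both observations can be tried on a toy input first. The sketch below assumes int64 data; the arrays `vals` and `rows` are invented for illustration.

```python
import numpy as np

# 1) Viewing signed integers as unsigned: the bits are unchanged, but
#    negatives now read as very large numbers, so they sort after all
#    non-negative values while keeping their relative order.
vals = np.array([3, -1, 0, -5, 2, -2], dtype=np.int64)
order = np.argsort(vals.view(np.uint64), kind="stable")
wrapped = vals[order]

# 2) Lexicographic row order: store each row big-endian, then view the
#    whole row as one opaque void item. Void items are compared byte by
#    byte, which for big-endian data is element-by-element unsigned order.
rows = np.array([[1, -3], [1, 2], [-1, 0], [0, 5]], dtype=np.int64)
be = rows.astype(">i8")
keys = be.view(f"V{be.dtype.itemsize * be.shape[1]}").ravel()
row_order = keys.argsort()

print(wrapped)
print(rows[row_order])
```

The big-endian step matters: with little-endian storage the least significant byte of each element comes first, so a byte-wise comparison would not follow numeric order.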

The code below shows, using a 10000x4x4 example, that this method gives the same result as your Python code.

It also benchmarks it on a 7,900,000x4x4 example (using an array, not a dict). On my modest laptop this method takes 8 seconds.

import numpy as np

def compare(x, y):
#    print('DD '+str(x[0]))
    if(np.array_equal(x[1],y[1])==True):
        return -1
    a = x[1].flatten()
    b = y[1].flatten()
    idx = np.where( (a>b) != (a<b) )[0][0]
    if a[idx]<0 and b[idx]>=0:
        return 0
    elif b[idx]<0 and a[idx]>=0:
        return 1
    elif a[idx]<0 and b[idx]<0:
        if a[idx]>b[idx]:
            return 0
        elif a[idx]<b[idx]:
            return 1
    elif a[idx]<b[idx]:
        return 1
    else:
        return 0
def cmp_to_key(mycmp):
    class K:
        def __init__(self, obj, *args):
            self.obj = obj
        def __lt__(self, other):
            return mycmp(self.obj, other.obj)
    return K

def custom_sort(a):
    assert a.dtype==np.int64
    b = a.astype('>i8', copy=False)
    return b.view(f'V{a.dtype.itemsize * a.shape[1]}').ravel().argsort()

tblocks = np.random.randint(-9,10, (10000, 4, 4))
tblocks = dict(enumerate(tblocks))

tblocks_s = sorted(tblocks.items(),key=cmp_to_key(compare))

tblocksa = np.array(list(tblocks.values()))
tblocksa = tblocksa.reshape(tblocksa.shape[0], -1)
order = custom_sort(tblocksa)
tblocks_s2 = list(tblocks.items())
tblocks_s2 = [tblocks_s2[o] for o in order]

print(tblocks_s == tblocks_s2)

from timeit import timeit

data = np.random.randint(-9_999, 10_000, (7_900_000, 4, 4))

print(timeit(lambda: data[custom_sort(data.reshape(data.shape[0], -1))],
             number=5) / 5)

Sample output:

True
7.8328493310138585
