
Sorting numpy structured and record arrays is very slow

It looks like sorting numpy structured and record arrays by a single column is much slower than sorting a similar standalone array:

In [111]: a = np.random.rand(10000)

In [112]: b = np.random.rand(10000)

In [113]: rec = np.rec.fromarrays([a,b])

In [114]: timeit rec.argsort(order='f0')
100 loops, best of 3: 18.8 ms per loop

In [115]: timeit a.argsort()
1000 loops, best of 3: 891 µs per loop

There is a marginal improvement using the structured array, but it's not dramatic:

In [120]: struct = np.empty(len(a),dtype=[('a','f8'),('b','f8')])

In [121]: struct['a'] = a

In [122]: struct['b'] = b

In [124]: timeit struct.argsort(order='a')
100 loops, best of 3: 15.8 ms per loop

This indicates that it's potentially faster to create an index array from argsort and then use that to reorder the individual arrays. This is OK, except that I expect to be dealing with very large arrays and would like to avoid copying data as much as possible. Is there a more efficient way of doing this that I'm missing?

What's slowing you down is the use of order, not the fact that you have a record array. If you want to sort by a single field, do it like this:

In [12]: %timeit np.argsort(rec['f0'])
1000 loops, best of 3: 829 us per loop
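To make the point above concrete, here is a minimal sketch (the seed and array size are arbitrary assumptions, not values from the original post). It checks that selecting a field gives a view onto the record data, and that argsort on that view yields the same ordering as argsort with order when the keys are unique:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
a = rng.random(10_000)
b = rng.random(10_000)
rec = np.rec.fromarrays([a, b])  # unnamed fields default to 'f0' and 'f1'

# Indexing by field name returns a view onto the record data, not a copy.
key = rec['f0']
shares_memory = np.shares_memory(key, rec)

# Sort indices computed from the plain float view ...
idx_field = np.argsort(key)
# ... and from the structured comparison path triggered by `order`.
idx_order = np.argsort(rec, order='f0')

# With unique keys both index arrays put 'f0' in the same order.
same_order = np.array_equal(rec['f0'][idx_field], rec['f0'][idx_order])
print(shares_memory, same_order)
```

Because the field selection is a view, the only new allocation in the fast path is the index array itself.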

Once order is used, performance drops sharply no matter how many fields you want to sort by:

In [16]: %timeit np.argsort(rec, order=['f0'])
10 loops, best of 3: 27.9 ms per loop

In [17]: %timeit np.argsort(rec, order=['f0', 'f1'])
10 loops, best of 3: 28.4 ms per loop
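If sorting by several fields is really needed, one possible alternative (not mentioned in the original answers, so treat it as a suggestion to benchmark rather than a guaranteed speed-up) is np.lexsort on the individual field views. The sketch below uses made-up data with many ties in the primary key and only checks that both approaches produce the same sorted rows:

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, for reproducibility only
n = 1_000
# 'f0' has many repeated values so the secondary key 'f1' actually matters.
rec = np.rec.fromarrays([rng.integers(0, 10, n), rng.random(n)])

# np.lexsort takes keys from least to most significant:
# the LAST key in the tuple is the primary sort key.
idx_lex = np.lexsort((rec['f1'], rec['f0']))

# Reference result using the structured `order` path.
idx_order = np.argsort(rec, order=['f0', 'f1'])

sorted_lex = rec[idx_lex]
sorted_order = rec[idx_order]
same_rows = (
    np.array_equal(sorted_lex['f0'], sorted_order['f0'])
    and np.array_equal(sorted_lex['f1'], sorted_order['f1'])
)
print(same_rows)
```

Whether this is faster on a given machine and dtype is best confirmed with %timeit on the real data.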

As Jaime said, you can use argsort to sort the record array:

inds = np.argsort(rec['f0'])

Then use take with out=rec to write the reordered records back into the same array rather than returning a new one:

np.take(rec, inds, out=rec)
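The following sketch puts the two steps together on a small structured array and checks that the rows stay paired after the in-place reorder. The field names and the relation between the two columns are assumptions made purely so the result can be verified. Note that the NumPy documentation for take states that out is always buffered when mode='raise' (the default), so a temporary may still be created internally even though no new result array is returned:

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed, for reproducibility only
n = 1_000
struct = np.empty(n, dtype=[('a', 'f8'), ('b', 'f8')])
struct['a'] = rng.random(n)
struct['b'] = 2.0 * struct['a']  # tie b to a so row pairing can be checked later

# Step 1: compute the permutation from the single field view.
inds = np.argsort(struct['a'])

# Step 2: reorder whole records, writing the result back into `struct`.
np.take(struct, inds, out=struct)

is_sorted = bool(np.all(np.diff(struct['a']) >= 0))
rows_intact = bool(np.array_equal(struct['b'], 2.0 * struct['a']))
print(is_sorted, rows_intact)
```

For very large arrays it is worth measuring peak memory for this approach against plain fancy indexing, since the internal buffering may reduce the expected saving.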
