
Numpy: Fastest way to insert value into array such that array's in order

Suppose I have an array my_array and a single value my_val. (Note that my_array is always sorted.)

my_array = np.array([1, 2, 3, 4, 5])
my_val = 1.5

Because my_val is 1.5, I want to put it in between 1 and 2, giving me the array [1, 1.5, 2, 3, 4, 5].

My question is: What's the fastest way (i.e. in microseconds) of producing the ordered output array as my_array grows arbitrarily large?

The original way I thought of was concatenating the value to the original array and then sorting:

arr_out = np.sort(np.concatenate((my_array, np.array([my_val]))))
[ 1.   1.5  2.   3.   4.   5. ]

I know that np.concatenate is fast, but I'm unsure how np.sort would scale as my_array grows, even given that my_array will always be sorted.

Edit:

I've compiled the times for the various methods listed at the time an answer was accepted:

Input:

import timeit

timeit_setup = 'import numpy as np\n' \
               'my_array = np.array([i for i in range(1000)], dtype=np.float64)\n' \
               'my_val = 1.5'
num_trials = 1000

my_time = timeit.timeit(
    'np.sort(np.concatenate((my_array, np.array([my_val]))))',
    setup=timeit_setup, number=num_trials
)

pauls_time = timeit.timeit(
    'idx = my_array.searchsorted(my_val)\n'
    'np.concatenate((my_array[:idx], [my_val], my_array[idx:]))',
    setup=timeit_setup, number=num_trials
)

sanchit_time = timeit.timeit(
    'np.insert(my_array, my_array.searchsorted(my_val), my_val)',
    setup=timeit_setup, number=num_trials
)

print('Times for 1000 repetitions for array of length 1000:')
print("My method took {}s".format(my_time))
print("Paul Panzer's method took {}s".format(pauls_time))
print("Sanchit Anand's method took {}s".format(sanchit_time))

Output:

Times for 1000 repetitions for array of length 1000:
My method took 0.017865657746239747s
Paul Panzer's method took 0.005813951002013821s
Sanchit Anand's method took 0.014003945532323987s

And the same for 100 repetitions for an array of length 1,000,000:

Times for 100 repetitions for array of length 1000000:
My method took 3.1770704101754195s
Paul Panzer's method took 0.3931240139911161s
Sanchit Anand's method took 0.40981490723551417s

Use np.searchsorted to find the insertion point in logarithmic time:

>>> idx = my_array.searchsorted(my_val)
>>> np.concatenate((my_array[:idx], [my_val], my_array[idx:]))
array([1. , 1.5, 2. , 3. , 4. , 5. ])
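The same idea can be wrapped in a small helper. The following is only a sketch, not part of the original answer: the name insert_sorted is hypothetical, and it swaps np.concatenate for a preallocated output buffer filled by slice assignment. Whether that is faster on a given machine is an assumption that would need to be timed.

```python
import numpy as np

def insert_sorted(arr, val):
    """Return a new sorted array with val inserted into the sorted 1-D array arr.

    Hypothetical helper for illustration; arr must already be sorted.
    """
    # Binary search for the insertion point (logarithmic time).
    idx = np.searchsorted(arr, val)
    # Pick a dtype that can hold both the array and the new value,
    # e.g. float64 for an int array and val = 1.5.
    out = np.empty(arr.size + 1, dtype=np.result_type(arr, val))
    # Copying the two halves is still linear in the array length.
    out[:idx] = arr[:idx]
    out[idx] = val
    out[idx + 1:] = arr[idx:]
    return out

my_array = np.array([1, 2, 3, 4, 5])
result = insert_sorted(my_array, 1.5)
print(result)
```

Note that only the search is logarithmic. Building the new array still copies every element, so the total cost grows linearly with the length of my_array.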

Note 1: I recommend looking at @Willem Van Onselm's and @hpaulj's insightful comments.

Note 2: Using np.insert as suggested by @Sanchit Anand may be slightly more convenient if all datatypes match from the beginning. It is, however, worth mentioning that this convenience comes at the cost of significant overhead:

>>> def f_pp(my_array, my_val):
...      idx = my_array.searchsorted(my_val)
...      return np.concatenate((my_array[:idx], [my_val], my_array[idx:]))
... 
>>> def f_sa(my_array, my_val):
...      return np.insert(my_array, my_array.searchsorted(my_val), my_val)
...
>>> my_farray = my_array.astype(float)
>>> from timeit import repeat
>>> kwds = dict(globals=globals(), number=100000)
>>> repeat('f_sa(my_farray, my_val)', **kwds)
[1.2453778409981169, 1.2268288589984877, 1.2298014000116382]
>>> repeat('f_pp(my_array, my_val)', **kwds)
[0.2728819379990455, 0.2697303680033656, 0.2688361559994519]

Try:

my_array = np.insert(my_array,my_array.searchsorted(my_val),my_val)

[EDIT] Make sure that the array is of type float32 or float64, or add a decimal point to any of the list elements when initializing it. Otherwise np.insert converts my_val to the array's integer dtype and the fractional part is lost.
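To see why the dtype matters, here is a minimal sketch comparing np.insert on an integer array and on a float array. The variable names int_array, float_array, out_int and out_float are made up for this illustration.

```python
import numpy as np

my_val = 1.5
int_array = np.array([1, 2, 3, 4, 5])      # integer dtype (no decimal point)
float_array = np.array([1., 2, 3, 4, 5])   # one decimal point makes it float64

# np.insert keeps the dtype of the array it inserts into, so the
# integer result cannot hold 1.5 as-is.
out_int = np.insert(int_array, int_array.searchsorted(my_val), my_val)

# With a float array the inserted value is preserved.
out_float = np.insert(float_array, float_array.searchsorted(my_val), my_val)

print(out_int.dtype, out_int)
print(out_float.dtype, out_float)
```

Calling my_array.astype(float) before inserting, as in the timing code of the other answer, is another way to avoid the problem.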
