NumPy: fastest way to insert a value into an array such that the array stays in order

Question

Suppose I have an array my_array and a single value my_val. (Note that my_array is always sorted.)

my_array = np.array([1, 2, 3, 4, 5])
my_val = 1.5

Because my_val is 1.5, I want to put it between 1 and 2, giving me the array [1, 1.5, 2, 3, 4, 5].

My question is: what's the fastest way (i.e. in microseconds) of producing the ordered output array as my_array grows arbitrarily large?

The way I originally thought of was concatenating the value to the original array and then sorting:

arr_out = np.sort(np.concatenate((my_array, np.array([my_val]))))
[ 1.   1.5  2.   3.   4.   5. ]

I know that np.concatenate is fast, but I'm unsure how np.sort would scale as my_array grows, even given that my_array will always be sorted.

Edit:

I've compiled the times for the various methods listed at the time an answer was accepted:

Input:

import timeit

timeit_setup = 'import numpy as np\n' \
               'my_array = np.array([i for i in range(1000)], dtype=np.float64)\n' \
               'my_val = 1.5'
num_trials = 1000

my_time = timeit.timeit(
    'np.sort(np.concatenate((my_array, np.array([my_val]))))',
    setup=timeit_setup, number=num_trials
)

pauls_time = timeit.timeit(
    'idx = my_array.searchsorted(my_val)\n'
    'np.concatenate((my_array[:idx], [my_val], my_array[idx:]))',
    setup=timeit_setup, number=num_trials
)

sanchit_time = timeit.timeit(
    'np.insert(my_array, my_array.searchsorted(my_val), my_val)',
    setup=timeit_setup, number=num_trials
)

print('Times for 1000 repetitions for array of length 1000:')
print("My method took {}s".format(my_time))
print("Paul Panzer's method took {}s".format(pauls_time))
print("Sanchit Anand's method took {}s".format(sanchit_time))

Output:

Times for 1000 repetitions for array of length 1000:
My method took 0.017865657746239747s
Paul Panzer's method took 0.005813951002013821s
Sanchit Anand's method took 0.014003945532323987s

And the same for 100 repetitions for an array of length 1,000,000:

Times for 100 repetitions for array of length 1000000:
My method took 3.1770704101754195s
Paul Panzer's method took 0.3931240139911161s
Sanchit Anand's method took 0.40981490723551417s

Answer 1

Use np.searchsorted to find the insertion point in logarithmic time:

>>> idx = my_array.searchsorted(my_val)
>>> np.concatenate((my_array[:idx], [my_val], my_array[idx:]))
array([1. , 1.5, 2. , 3. , 4. , 5. ])
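The two steps above can be wrapped in a small function. This is only a sketch: the name insert_sorted is made up here, and the side argument is simply passed through to searchsorted to control where a value lands relative to equal elements. The binary search is logarithmic, but np.concatenate still copies every element once, so the whole operation is linear in the array length.

```python
import numpy as np


def insert_sorted(arr, val, side="left"):
    """Return a new sorted array with val inserted into the sorted 1-D arr.

    Hypothetical helper for illustration; arr itself is not modified.
    """
    # Binary search for the insertion index: O(log n)
    idx = arr.searchsorted(val, side=side)
    # One O(n) copy; concatenate also upcasts when the dtypes differ
    return np.concatenate((arr[:idx], [val], arr[idx:]))


my_array = np.array([1, 2, 3, 4, 5])
out = insert_sorted(my_array, 1.5)
print(out.tolist())  # [1.0, 1.5, 2.0, 3.0, 4.0, 5.0]
print(out.dtype)     # float64, even though my_array holds integers
```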

Note 1: I recommend looking at @Willem Van Onselm's and @hpaulj's insightful comments.

Note 2: Using np.insert, as suggested by @Sanchit Anand, may be slightly more convenient if all datatypes match from the beginning. It is worth mentioning, however, that this convenience comes at the cost of significant overhead:

>>> def f_pp(my_array, my_val):
...      idx = my_array.searchsorted(my_val)
...      return np.concatenate((my_array[:idx], [my_val], my_array[idx:]))
... 
>>> def f_sa(my_array, my_val):
...      return np.insert(my_array, my_array.searchsorted(my_val), my_val)
...
>>> my_farray = my_array.astype(float)
>>> from timeit import repeat
>>> kwds = dict(globals=globals(), number=100000)
>>> repeat('f_sa(my_farray, my_val)', **kwds)
[1.2453778409981169, 1.2268288589984877, 1.2298014000116382]
>>> repeat('f_pp(my_array, my_val)', **kwds)
[0.2728819379990455, 0.2697303680033656, 0.2688361559994519]

Answer 2

Try:

my_array = np.insert(my_array, my_array.searchsorted(my_val), my_val)

[EDIT] Make sure that the array is of type float32 or float64, or add a decimal point to any of the list elements when initializing it.
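The dtype caveat matters because np.insert converts the inserted value to the dtype of the target array. A short sketch of what that looks like with an integer array versus a float array (the variable names here are illustrative only):

```python
import numpy as np

int_array = np.array([1, 2, 3, 4, 5])      # integer dtype
float_array = np.array([1., 2, 3, 4, 5])   # one decimal point makes it float64

# np.insert casts 1.5 to the integer dtype of int_array, so it becomes 1
truncated = np.insert(int_array, int_array.searchsorted(1.5), 1.5)
print(truncated.tolist())  # [1, 1, 2, 3, 4, 5]

# With a float array the value is preserved
kept = np.insert(float_array, float_array.searchsorted(1.5), 1.5)
print(kept.tolist())  # [1.0, 1.5, 2.0, 3.0, 4.0, 5.0]
```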

Answer 1: 3, accepted, 2018-02-13 22:49:33

Answer 2: 3, 2018-02-13 22:49:50
