
What's the time complexity of indexing a numpy array directly?

Assume we have a numpy array, say:

>>> nArray
array([[  23425.     ,  521331.40625],
       [  23465.     ,  521246.03125],
       [  23505.     ,  528602.8125 ],
       [  23545.     ,  531934.75   ],
       [  23585.     ,  534916.375  ],
       [  23865.     ,  527971.1875 ]])

Direct indexing must be pretty efficient.

I imagine that something like nArray[0, 1] = 69696420 must use a hash table, which would give a time complexity close to O(1). Is that right?

Update

As both answers note, there is no hashing involved in indexing a numpy array. Both answers clearly explain how the indexing actually happens.

Update 2

I added a simple benchmark below to confirm the validity of the answers.

On the one hand,

must be using a hash-table which will give a time complexity close to O(1). Is that right?

is not quite true. Numpy arrays are basically contiguous blocks of homogeneous memory, with some extra metadata on the side about dimensions, strides, and such. Therefore, access is O(1), and involves only some trivial arithmetic to determine the position within the memory block.

On the other hand,

indexing must be pretty efficient.

is unfortunately not true at all. Aside from bounds checking (which numpy arrays do perform), everything involving pure Python is extremely inefficient, and single-element accesses go through pure-Python calls. Numpy array access is no exception. You should use vectorized operations whenever possible.
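The per-element overhead is easy to see with a quick comparison (timings are illustrative and machine-dependent; the exact ratio will vary):

```python
import time
import numpy as np

a = np.arange(1_000_000, dtype=np.float64)

# Element-by-element access through pure-Python indexing: each a[i]
# creates a Python float object and goes through the interpreter.
start = time.time()
total = 0.0
for i in range(a.size):
    total += a[i]
loop_time = time.time() - start

# One vectorized call performing the same reduction in compiled code.
start = time.time()
vec_total = a.sum()
vec_time = time.time() - start

print(loop_time / vec_time)  # typically two orders of magnitude or more
```

Both compute the same sum; only the loop pays the Python-level cost once per element.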

There is no hash table involved. Numpy arrays are arrays, just as the name implies, and the address is computed like this:

address of nArray[x, y] = base address + A * x + B * y
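The constants A and B in that formula are exactly the array's strides (bytes to step per row and per column), and the computation can be verified from Python. This is a CPython-specific sketch: it reads the buffer's base address via the array interface and dereferences the computed address with ctypes.

```python
import ctypes
import numpy as np

nArray = np.array([[23425.0, 521331.40625],
                   [23465.0, 521246.03125]])

# For this C-contiguous float64 array: A = bytes per row, B = bytes per element
A, B = nArray.strides
base = nArray.__array_interface__['data'][0]  # base address of the data buffer

x, y = 1, 1
addr = base + A * x + B * y  # address of nArray[x, y], computed by hand

# Reading 8 bytes at that address yields the same value as direct indexing
value = ctypes.cast(addr, ctypes.POINTER(ctypes.c_double)).contents.value
print(value == nArray[x, y])  # True
```

No lookup structure is consulted anywhere; indexing is just this multiply-and-add.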

To add some extra validation through testing to Ami's answer, I made a simple circular buffer from a numpy array that uses only direct indexing for insertions. Basically, each insertion just overwrites the values of the oldest element in the queue.

The code is not completely bug-free, but it can serve as a basis for some simple performance benchmarking.

import math
import numpy as np


class CircFifo:
    """
    Helper class: uses a numpy array to provide a circular,
    fixed-size fifo interface.

    put(element): removes the oldest element and places a new one

    get(): returns the oldest entry

    empty(): returns True if the fifo is empty

    full(): returns True if the fifo is full
    """
    def __init__(self, size):
        self.array = np.empty(shape=(size, 2))
        self.size = size
        self.array[:] = np.nan  # np.NAN was removed in NumPy 2.0
        self.top = 0
        self.bottom = 0

    def put(self, row):
        self.array[self.top, :] = row
        self.top += 1
        if self.top == self.size:
            self.top = 0

    def get(self):
        row = None  # returned unchanged if the fifo is empty
        if not math.isnan(self.array[self.bottom, 0]):
            # copy the slice so the caller keeps the values after we clear them
            row = self.array[self.bottom, :].copy()
            self.array[self.bottom, :] = float('nan')
            self.bottom += 1
        if self.bottom == self.size:
            self.bottom = 0
        if math.isnan(self.array[self.bottom, 0]):
            self.bottom = 0
            self.top = 0
        return row

    def empty(self):
        return math.isnan(self.array[self.bottom, 0])

    def full(self):
        return np.count_nonzero(np.isnan(self.array[:, 0])) == 0

A simple test I ran seems to confirm the correctness of the answers in this post. I compared insertion performance against a deque object. Even for 1000 insertions, deque, which is a dynamic rather than a static data structure (as opposed to my static circular fifo), clearly outperforms the circular fifo.

Here is the simple test I ran:

In [5]: import time

In [6]: circFifo = CircFifo(300)

In [7]: elapsedTime = 0

In [8]: for i in range(1, 1000):
   ...:         start = time.time()
   ...:         circFifo.put(np.array([[52, 12]]))
   ...:         elapsedTime += time.time() - start
   ...:     

In [9]: elapsedTime
Out[9]: 0.010616540908813477


In [20]: from collections import deque

In [21]: queue = deque()

In [22]: elapsedTime = 0

In [23]: for i in range(1, 1000):
   ....:         start = time.time()
   ....:         queue.append(np.array([[52, 12]]))
   ....:         elapsedTime += time.time() - start
   ....:     

In [24]: elapsedTime
Out[24]: 0.00482630729675293

I know this benchmark is not that informative, but it makes it quite apparent that deque is much faster, at least for that number of insertions.

I would expect that if the circular fifo were implemented in C with a static array, it could not easily be outperformed, since a static C array is about the simplest data structure available, with the least overhead.
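For what it's worth, the standard library already ships something close to that: collections.deque accepts a maxlen argument, giving a fixed-capacity ring buffer implemented in C. A minimal sketch of the same put/get behavior:

```python
from collections import deque

# deque(maxlen=n) is a fixed-capacity ring buffer implemented in C:
# appending past the capacity silently evicts the oldest entry.
fifo = deque(maxlen=3)
for row in ([1, 1], [2, 2], [3, 3], [4, 4]):
    fifo.append(row)          # put() equivalent

print(list(fifo))             # [[2, 2], [3, 3], [4, 4]] - [1, 1] was evicted
oldest = fifo.popleft()       # get() equivalent: remove and return the oldest
print(oldest)                 # [2, 2]
```

This avoids hand-rolled index bookkeeping entirely, at the cost of storing Python objects rather than a homogeneous numpy buffer.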
