简体   繁体   English

是什么原因使用ndarray而不是python数组

[英]what is a reason to use ndarray instead of python array

I build a class with some iteration over coming data. 我构建了一个类,对未来的数据进行了一些迭代。 The data are in an array form without use of numpy objects. 数据采用数组形式,不使用numpy对象。 On my code I often use .append to create another array. 在我的代码中,我经常使用.append来创建另一个数组。 At some point I changed one of the big array 1000x2000 to numpy.array. 在某些时候,我将一个大数组1000x2000更改为numpy.array。 Now I have an error after error. 现在我在出错后出错了。 I started to convert all of the arrays into ndarray but comments like .append does not work any more. 我开始将所有数组转换为ndarray,但像.append这样的注释不再起作用了。 I start to have a problems with pointing to rows, columns or cells. 我开始遇到指向行,列或单元格的问题。 and have to rebuild all code. 并且必须重建所有代码。

I try to google an answer to the question: "what is and advantage of using ndarray over normal array" I can't find a sensible answer. 我试着回答一个问题的答案:“使用ndarray超过正常数组的优点和优势”我找不到合理的答案。 Can you write when should I start to use ndarrays and if in your practice do you use both of them or stick to one only. 你能写什么时候我应该开始使用ndarrays吗?如果在你的练习中你使用它们或只坚持一个。

Sorry if the question is a novice level, but I am new to python, just try to move from Matlab and want to understand what are pros and cons. 很抱歉,如果问题是新手级别,但我是python的新手,只是尝试从Matlab移动并想了解什么是利弊。 Thanks 谢谢

NumPy and Python arrays share the property of being efficiently stored in memory. NumPy和Python数组共享高效存储在内存中的属性。

NumPy arrays can be added together, multiplied by a number, you can calculate, say, the sine of all their values in one function call, etc. As HYRY pointed out, they can also have more than one dimension. NumPy数组可以加在一起,乘以一个数字,你可以在一个函数调用中计算所有值的正弦值,等等。正如HYRY指出的那样,它们也可以有多个维度。 You cannot do this with Python arrays. 你不能用Python数组做到这一点。

On the other hand, Python arrays can indeed be appended to. 另一方面,Python数组确实可以附加到。 Note that NumPy arrays can however be concatenated together ( hstack() , vstack() ,…). 请注意,NumPy数组可以连接在一起( hstack()vstack() ,...)。 That said, NumPy arrays are mostly meant to have a fixed number of elements. 也就是说,NumPy数组主要是为了拥有固定数量的元素。

It is common to first build a list (or a Python array) of values iteratively and then convert it to a NumPy array (with numpy.array() , or, more efficiently, with numpy.frombuffer() , as HYRY mentioned): this allows mathematical operations on arrays (or matrices) to be performed very conveniently (simple syntax for complex operations). 通常首先迭代地构建一个列表(或Python数组),然后将其转换为NumPy数组(使用numpy.array() ,或者更有效地使用numpy.frombuffer() ,如HYRY所提到的):这允许非常方便地执行对数组(或矩阵)的数学运算(复杂运算的简单语法)。 Alternatively, numpy.fromiter() might be used to construct the array from an iterator. 或者,可以使用numpy.fromiter()从迭代器构造数组。 Or loadtxt() to construct it from a text file. 或者loadtxt()从文本文件构造它。

There are at least two main reasons for using NumPy arrays: 使用NumPy阵列至少有两个主要原因:

  • NumPy arrays require less space than Python lists. NumPy数组比Python列表需要更少的空间。 So you can deal with more data in a NumPy array (in-memory) than you can with Python lists. 因此,您可以在NumPy数组(内存中)处理比使用Python列表更多的数据。
  • NumPy arrays have a vast library of functions and methods unavailable to Python lists or Python arrays. NumPy数组拥有庞大的Python列表或Python数组不可用的函数和方法库。

Yes, you can not simply convert lists to NumPy arrays and expect your code to continue to work. 是的,您不能简单地将列表转换为NumPy数组,并期望您的代码继续工作。 The methods are different, the bool semantics are different. 方法不同,bool语义不同。 For the best performance, even the algorithm may need to change. 为了获得最佳性能,即使是算法也可能需要更改。

However, if you are looking for a Python replacement for Matlab, you will definitely find uses for NumPy. 但是,如果您正在寻找Matlab的Python替代品,您肯定会找到NumPy的用途。 It is worth learning. 值得学习。

array.array can change size dynamically. array.array可以动态更改大小。 If you are collecting data from some source, it's better to use array.array . 如果要从某些源收集数据,最好使用array.array But array.array is only one dimension, and there is no calculation functions to do with it. 但是array.array只是一个维度,并且没有与它有关的计算函数。 So, when you want to do some calculation with your data, convert it to numpy.ndarray , and use functions in numpy. 因此,当您想对数据进行一些计算时,将其转换为numpy.ndarray ,并使用numpy中的函数。

numpy.frombuffer can create a numpy.ndarray that shares the same data buffer with array.array objects, it's fast because it don't need to copy the data. numpy.frombuffer可以创建一个numpy.ndarray与共享相同的数据缓冲array.array对象,它的速度快,因为它不需要复制数据。

Here is a demo: 这是一个演示:

import numpy as np
import array
import random

a = array.array("d")
# a for loop that collects 10 channels data
for x in range(100):
    a.extend([random.random() for _ in xrange(10)])

# create a ndarray that share the same data buffer with array a, and reshape it to 2D
na = np.frombuffer(a, dtype=float).reshape(-1, 10)

# call some numpy function to do the calculation
np.sum(na, axis=0)

Another great advantage of using NumPy arrays over built-in lists is the fact that NumPy has a C API that allows native C and C++ code to access NumPy arrays directly. 使用NumPy数组而不是内置列表的另一个巨大优势是NumPy有一个C API,允许本机C和C ++代码直接访问NumPy数组。 Hence, many Python libraries written in low-level languages like C are expecting you to work with NumPy arrays instead of Python lists. 因此,许多用C语言等低级语言编写的Python库希望您使用NumPy数组而不是Python列表。

Reference: Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython 参考:用于数据分析的Python:与Pandas,NumPy和IPython进行数据争夺

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM