
Python-numpy test for ndarray using ndim

I'm working on a project in Python requiring a lot of numerical array calculations. Unfortunately (or fortunately, depending on your POV), I'm very new to Python, but have been doing MATLAB and Octave programming (APL before that) for years. I'm very used to having every variable automatically typed as a float matrix, and I'm still getting used to checking input types.

In many of my functions, I require the input S to be a numpy.ndarray of shape (n,p), so I have to both test that type(S) is numpy.ndarray and get the values (n,p) = numpy.shape(S). One potential problem is that the input could be a list/tuple/int/etc.; another is that the input could be an array of shape (), i.e. S.ndim == 0. It occurred to me that I could simultaneously test the variable type, fix the S.ndim == 0 problem, and get my dimensions like this:

# first simultaneously test for ndarray and get proper dimensions
try:
    if (S.ndim == 0):
        S = S.copy(); S.shape = (1,1);
    # define dimensions p, and p2
    (p,p2) = numpy.shape(S);
except AttributeError:  # got here because input is not something array-like
    raise AttributeError("blah blah blah");

Though it works, I'm wondering if this is a valid thing to do? The docstring for ndim says

If it is not already an ndarray, a conversion is attempted.

and we surely know that numpy can easily convert an int/tuple/list to an array, so I'm confused about why an AttributeError is raised for these types of input, when numpy should be doing this

numpy.array(S).ndim;

which should work.

When doing input validation for NumPy code, I always use np.asarray:

>>> np.asarray(np.array([1,2,3]))
array([1, 2, 3])
>>> np.asarray([1,2,3])
array([1, 2, 3])
>>> np.asarray((1,2,3))
array([1, 2, 3])
>>> np.asarray(1)
array(1)
>>> np.asarray(1).shape
()

This function has the nice feature that it only copies data when necessary; if the input is already an ndarray, the data is left in place (only the type may be changed, because it also gets rid of that pesky np.matrix).

"The docstring for ndim says"

That's the docstring for the function np.ndim, not the ndim attribute, which non-NumPy objects don't have. You could use that function, but the effect would be that the data might be copied twice, so instead do:

S = np.asarray(S)
(p, p2) = S.shape

This will raise a ValueError if S.ndim != 2, because the shape tuple cannot then be unpacked into exactly two names.
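As a rough illustration of the two points above (the ndim attribute versus the np.ndim function, and the ValueError from unpacking the shape), here is a minimal sketch. The helper name as_2d_input is hypothetical and not part of NumPy.

```python
import numpy as np


def as_2d_input(S):
    """Hypothetical helper: convert S to an ndarray and return it with its 2-D shape."""
    S = np.asarray(S)      # no copy if S is already an ndarray
    (p, p2) = S.shape      # raises ValueError unless S.ndim == 2
    return S, p, p2


# The ndim attribute only exists on arrays; the np.ndim function converts first.
print(hasattr([1, 2, 3], "ndim"))   # a plain list has no such attribute
print(np.ndim([1, 2, 3]))           # the function accepts array-like input

S, p, p2 = as_2d_input([[1, 2, 3], [4, 5, 6]])
print(p, p2)

try:
    as_2d_input(1)                  # shape () cannot be unpacked into (p, p2)
except ValueError as err:
    print("rejected:", err)
```

Whether a 0-d or 1-d input should be rejected like this or promoted to 2-D is a design choice; this sketch simply rejects it.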

[Final note: you don't need ; in Python if you just follow the indentation rules. In fact, Python programmers eschew the semicolon.]

Given the comments on @larsmans's answer, you could try:

if not isinstance(S, np.ndarray):
    raise TypeError("Input not a ndarray")
if S.ndim == 0:
    S = np.reshape(S, (1,1))
(p, p2) = S.shape

First, you check explicitly whether S is a (subclass of) ndarray. Then, if needed, you use np.reshape to get a reshaped (1, 1) array; the original S object keeps its shape, and np.reshape returns a view where possible and a copy otherwise. Finally, you get the dimensions.
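The three steps just described can be wrapped in a function, sketched below. The name shape_2d_strict is hypothetical, and returning the reshaped array alongside the dimensions is an assumption about how it would be used.

```python
import numpy as np


def shape_2d_strict(S):
    """Hypothetical helper: accept only ndarrays, promote 0-d input to (1, 1)."""
    if not isinstance(S, np.ndarray):
        raise TypeError("Input not a ndarray")
    if S.ndim == 0:
        S = np.reshape(S, (1, 1))   # new array object; the caller's S keeps shape ()
    (p, p2) = S.shape
    return S, p, p2


scalar = np.array(5.0)
out, p, p2 = shape_2d_strict(scalar)
print(out.shape, scalar.shape)

try:
    shape_2d_strict([[1, 2], [3, 4]])   # a list is array-like but not an ndarray
except TypeError as err:
    print("rejected:", err)
```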

Note that in most cases, the np functions will first try to access the corresponding method of an ndarray, and only then attempt to convert the input to an ndarray (sometimes keeping it a subclass, as in np.asanyarray, sometimes not, as in np.asarray(...)). In other words, it's generally more efficient to use the method or attribute rather than the function: that's why we're using S.shape and not np.shape(S).
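The difference between np.asarray and np.asanyarray on a subclass can be seen with a small sketch. TaggedArray is a hypothetical subclass defined only for this illustration.

```python
import numpy as np


class TaggedArray(np.ndarray):
    """Hypothetical ndarray subclass used only for this illustration."""


t = np.arange(4).reshape(2, 2).view(TaggedArray)

base = np.asarray(t)      # converts to a plain ndarray
same = np.asanyarray(t)   # passes subclasses through

print(type(base).__name__)
print(type(same).__name__)
print(np.shares_memory(base, t))   # the conversion does not copy the data
print(t.shape == np.shape(t))      # attribute and function agree on an ndarray
```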

Another point: np.asarray, np.asanyarray, np.atleast_1d ... are all particular cases of the more generic function np.array. For example, asarray sets the optional copy argument of array to False, asanyarray does the same and sets subok=True, atleast_1d sets ndmin=1, atleast_2d sets ndmin=2 ... In other words, it's always easier to use np.array with the appropriate arguments. But as mentioned in some comments, it's a matter of style. Shortcuts can often improve readability, which is always an objective to keep in mind.

In any case, when you use np.array(..., copy=True), you're explicitly asking for a copy of your initial data, a bit like doing a list([....]). Even if nothing else changed, your data will be copied. That has the advantages of its drawbacks (as we say in French): you could, for example, change the order from row-first C to column-first F. But anyway, you get the copy you wanted.

With np.array(input, copy=False), no data is copied unless it has to be. If input is already an ndarray, the result refers to the same block of memory as input (that is, no waste of memory); if input isn't, a new array is created "from scratch". The interesting case is of course when input is an ndarray. (This describes NumPy 1.x; since NumPy 2.0, copy=False raises a ValueError when a copy cannot be avoided, and copy=None gives the copy-only-if-needed behavior.)
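The memory-sharing behavior can be checked directly. This sketch uses np.asarray for the no-copy case rather than spelling out the copy argument, on the assumption that it should run unchanged across NumPy versions. The variable names are illustrative.

```python
import numpy as np

a = np.zeros(3)

alias = np.asarray(a)            # no copy: a is already an ndarray
copied = np.array(a, copy=True)  # explicit copy

alias[0] = 1.0                   # also visible through a
copied[1] = 2.0                  # a is unaffected

print(np.shares_memory(a, alias))
print(np.shares_memory(a, copied))
print(a)

lst = [0.0, 0.0, 0.0]
from_list = np.asarray(lst)      # a list is not an ndarray, so new memory is allocated
from_list[0] = 1.0
print(lst)                       # the original list is untouched
```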

Using the resulting array in a function may or may not change the original input, depending on the function. You have to check the documentation of the function you want to use to see whether it returns a copy or not. The NumPy developers try hard to limit unnecessary copies (following the Python example), but sometimes it can't be avoided. The documentation should state explicitly what happens; if it doesn't, or if it's unclear, please mention it.

np.array(...) may raise some exceptions if something goes awry. For example, trying to use dtype=float with an input like ["STRING", 1] will raise a ValueError. However, I must admit I can't remember which exceptions are raised in all the cases; please edit this post accordingly.
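A small sketch of catching that failure follows. The wrapper name to_float_array and its error message are hypothetical, and only the ValueError case mentioned above is handled.

```python
import numpy as np


def to_float_array(values):
    """Hypothetical wrapper: convert to a float array, with a clearer error message."""
    try:
        return np.array(values, dtype=float)
    except ValueError as err:
        # Raised, for instance, when a string cannot be parsed as a float.
        raise ValueError(f"cannot interpret input as floats: {err}") from err


print(to_float_array([1, 2]).dtype)

try:
    to_float_array(["STRING", 1])
except ValueError as err:
    print("rejected:", err)
```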

Welcome to Stack Overflow. This comes down to almost a style choice, but the most common way I've seen to deal with this kind of situation is to convert the input to an array. NumPy provides some useful tools for this. numpy.asarray has already been mentioned, but here are a few more:

numpy.atleast_1d is similar to asarray, but reshapes () arrays to be (1,).
numpy.atleast_2d is the same as above but reshapes 0d and 1d arrays to be 2d, i.e. (3,) to (1, 3).

The reason we convert "array_like" inputs to arrays is partly just because we're lazy: for example, sometimes it can be easier to write foo([1, 2, 3]) than foo(numpy.array([1, 2, 3])). But this is also the design choice made within numpy itself. Notice that the following works:

>>> numpy.mean([1., 2., 3.])
2.0

In the docs for numpy.mean we can see that a should be "array_like".

Parameters
----------
a : array_like
    Array containing numbers whose mean is desired. If `a` is not an
    array, a conversion is attempted.

That being said, there are situations when you want to accept only arrays as arguments and not all "array_like" types.
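For reference, the conversion helpers named in this answer behave as follows on scalar and list input. This is a minimal sketch; the function name col_count is hypothetical and only shows how an "array_like"-accepting function might use them.

```python
import numpy as np

print(np.atleast_1d(3.0).shape)         # (1,)
print(np.atleast_2d(3.0).shape)         # (1, 1)
print(np.atleast_2d([1, 2, 3]).shape)   # (1, 3)


def col_count(x):
    """Hypothetical function accepting any array_like input."""
    x = np.atleast_2d(x)   # scalars, lists and 1-d arrays all become 2-d
    return x.shape[1]


print(col_count([1, 2, 3]))             # 3
print(col_count(np.ones((4, 2))))       # 2
```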
