
Correct way to test for numpy.dtype

I'm looking at a third-party lib that has the following if-test:

if isinstance(xx_, numpy.ndarray) and xx_.dtype is numpy.float64 and xx_.flags.contiguous:
    xx_[:] = ctypes.cast(xx_.ctypes._as_parameter_,ctypes.POINTER(ctypes.c_double))

It appears that xx_.dtype is numpy.float64 always fails:

>>> xx_ = numpy.zeros(8, dtype=numpy.float64)
>>> xx_.dtype is numpy.float64
False

What is the correct way to test that the dtype of a numpy array is float64?

This is a bug in the lib.

dtype objects can be constructed dynamically, and NumPy does so all the time. There's no guarantee anywhere that they're interned, so nothing promises that constructing a dtype that already exists will give you the same object.

On top of that, np.float64 isn't actually a dtype; it's a… I don't know what these types are called, but they're the types used to construct scalar objects out of array bytes, and they're usually found in the type attribute of a dtype, so I'm going to call it a dtype.type. (Note that np.float64 subclasses both NumPy's numeric tower types and Python's numeric tower ABCs, while np.dtype of course doesn't.)

Normally, you can use these interchangeably; when you use a dtype.type (or, for that matter, a native Python numeric type) where a dtype was expected, a dtype is constructed on the fly (which, again, is not guaranteed to be interned). They compare equal, but of course that doesn't mean they're identical:

>>> np.float64 == np.dtype(np.float64) == np.dtype('float64') 
True
>>> np.float64 == np.dtype(np.float64).type
True

The dtype.type usually will be identical if you're using builtin types:

>>> np.float64 is np.dtype(np.float64).type
True

But two dtypes are often not:

>>> np.dtype(np.float64) is np.dtype('float64')
False

But again, none of that is guaranteed. (Also, note that np.float64 and float use the exact same storage, but are separate types. And of course you can also make a dtype('f8'), which is guaranteed to work the same as dtype(np.float64), but that doesn't mean the string 'f8' is, or even ==, np.float64.)

So, it's possible that constructing an array by explicitly passing np.float64 as its dtype argument will mean you get back the same instance when you check the dtype.type attribute, but that isn't guaranteed. And if you pass np.dtype('float64'), or you ask NumPy to infer it from the data, or you pass a dtype string like 'f8' for it to parse, etc., it's even less likely to match. More importantly, you definitely will not get np.float64 back as the dtype itself.
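To make the distinction concrete, here is a minimal sketch (the variable names are only illustrative) that runs the three comparisons discussed above on the same array: identity against the scalar type, equality against the scalar type, and identity of the dtype's type attribute.

```python
import numpy as np

x = np.zeros(8, dtype=np.float64)

# x.dtype is an np.dtype instance, while np.float64 is a scalar type (a class),
# so an identity test between the two can never succeed.
dtype_is_scalar_type = x.dtype is np.float64

# For ==, NumPy first converts np.float64 to a dtype and then compares.
dtype_equals_scalar_type = x.dtype == np.float64

# The dtype's type attribute holds the scalar type itself.
scalar_type_matches = x.dtype.type is np.float64

# The spelling used to request the dtype does not affect equality.
from_string_equal = np.zeros(8, dtype='f8').dtype == np.float64

print(dtype_is_scalar_type, dtype_equals_scalar_type,
      scalar_type_matches, from_string_equal)
```

Only the first comparison is False, which is exactly the test the library is using.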


So, how should it be fixed?

Well, the docs define what it means for two dtypes to be equal, and that's a useful thing, and I think it's probably the useful thing you're looking for here. So, just replace the is with ==:

if isinstance(xx_, numpy.ndarray) and xx_.dtype == numpy.float64 and xx_.flags.contiguous:

However, to some extent I'm only guessing that's what you're looking for. (The fact that it's checking the contiguous flag implies that it's probably going to go right into the internal storage… but then why isn't it checking C vs. Fortran order, or byte order, or anything else?)
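If the intent really is to hand the buffer to C code as an array of doubles, a stricter check might look like the sketch below. The helper name is_native_c_double_array is hypothetical, and the choice to require C order and native byte order is my assumption about what such code would need, not something the library states.

```python
import numpy as np

def is_native_c_double_array(a):
    """Hypothetical helper: True if `a` can be read as a plain C double[]."""
    return (
        isinstance(a, np.ndarray)
        and a.dtype == np.float64      # equality, not identity
        and a.dtype.isnative           # byte order matches this machine
        and a.flags.c_contiguous       # C (row-major) layout with no gaps
    )

plain = np.zeros(8, dtype=np.float64)
single = np.zeros(8, dtype=np.float32)
column = np.zeros((4, 4))[:, 0]                      # strided view, not contiguous
swapped = np.zeros(8, dtype=np.dtype(np.float64).newbyteorder())

results = [is_native_c_double_array(a) for a in (plain, single, column, swapped)]
print(results)
```

Only the first array passes; the others fail on element size, contiguity, and byte order respectively.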

Try:

import numpy as np
x = np.zeros(8, dtype=np.float64)
print(x.dtype is np.dtype(np.float64))

is tests for the identity of two objects, that is, whether they have the same id(). It is used, for example, in the is None test, but it can give misleading results when testing integers or strings. In this case there's a further problem: x.dtype and np.float64 are not instances of the same class.

isinstance(x.dtype, np.dtype)  # True
isinstance(np.float64, np.dtype) # False


x.dtype.__class__  # numpy.dtype
np.float64.__class__ # type

np.float64 is actually callable: np.float64() produces 0.0, while x.dtype() produces an error. (Correction: np.float64 is a class, not a function.)

In my interactive tests:

x.dtype is np.dtype(np.float64)

returns True. But I don't know if that's universally the case, or just the result of some sort of local caching. The dtype documentation mentions a dtype attribute:

dtype.num: A unique number for each of the 21 different built-in types.

Both dtypes give 12 for this num.

x.dtype == np.float64

tests True.

Also, using type works:

x.dtype.type is np.float64  # True

When I import ctypes and do the cast (with your xx_) I get an error:

ValueError: setting an array element with a sequence.

I don't know enough of ctypes to understand what it is trying to do. It looks like it is doing a type conversion of the data pointer of xx_: xx_.ctypes._as_parameter_ holds the same number as xx_.__array_interface__['data'][0].
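For comparison, NumPy's public ctypes interface offers data_as for obtaining a typed pointer to an array's buffer. The sketch below only illustrates that interface on a small example array; it is not a claim about what the third-party library intended to do.

```python
import ctypes
import numpy as np

a = np.arange(4, dtype=np.float64)

# The buffer address is the same whichever attribute reports it.
same_address = a.ctypes.data == a.__array_interface__['data'][0]

# A typed pointer into the array's own buffer; no copy is made.
ptr = a.ctypes.data_as(ctypes.POINTER(ctypes.c_double))

first_three = [ptr[i] for i in range(3)]   # read through the pointer
ptr[3] = 42.0                              # write through the pointer

print(same_address, first_three, a[3])
```

Because the pointer aliases the array's memory, the write through ptr is visible in a, which is why the dtype and layout checks matter before doing this.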


In the numpy test code I find these dtype tests:

issubclass(arr.dtype.type, (nt.integer, nt.bool_))
assert_(dat.dtype.type is np.float64)
assert_equal(A.dtype.type, np.unicode_)
assert_equal(r['col1'].dtype.kind, 'i')

The numpy documentation also talks about

np.issubdtype(x.dtype, np.float64)
np.issubsctype(x, np.float64)

both of which use issubclass.
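A short sketch of how the subclass-style tests behave, using only np.issubdtype and the kind attribute; the abstract type np.floating is my addition, to show a test for "any float" rather than float64 specifically.

```python
import numpy as np

x = np.zeros(8, dtype=np.float64)

is_float64 = np.issubdtype(x.dtype, np.float64)         # exact scalar type
is_any_float = np.issubdtype(x.dtype, np.floating)      # any floating-point type
f32_is_float64 = np.issubdtype(np.float32, np.float64)  # float32 is not a float64
kind = x.dtype.kind                                     # one-letter type category

print(is_float64, is_any_float, f32_is_float64, kind)
```

The kind test and np.floating are the looser options: both accept float32 as well, so they suit code that only cares about the category of the data, not its width.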


Further tracing of the C code suggests that x.dtype == np.float64 is evaluated as:

x.dtype.num == np.dtype(np.float64).num

That is, the scalar type is converted to a dtype, and the .num attributes are compared. The code is in scalarapi.c, descriptor.c, and multiarraymodule.c under numpy/core/src/multiarray.
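The num comparison can be tried directly. The sketch below also includes a byte-swapped float64, which is my own addition: it suggests that num alone is not the whole story, since the swapped dtype keeps the same num yet does not compare equal to np.float64.

```python
import numpy as np

x = np.zeros(8, dtype=np.float64)

# For native built-in types, comparing num agrees with ==.
nums_match = x.dtype.num == np.dtype(np.float64).num
dtypes_equal = x.dtype == np.float64

# A byte-swapped float64 keeps the same type number...
swapped = np.dtype(np.float64).newbyteorder()
swapped_num_matches = swapped.num == x.dtype.num
# ...but it is not considered equal to the native dtype.
swapped_equal = swapped == np.float64

print(x.dtype.num, nums_match, dtypes_equal, swapped_num_matches, swapped_equal)
```

So for code that passes memory to C, == is the safer test than comparing num by hand.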

