
Creating a numpy.ndarray with elements consisting of subclassed numpy.ndarray's

I am trying to create a numpy array whose elements are instances of a numpy.ndarray subclass. Unfortunately, when I create the new array, numpy automatically converts the elements back to the base class numpy.ndarray .

The code below shows what I am trying to do. dummy_class inherits from numpy.ndarray and adds some extra functionality (which is not important for the problem at hand). I create two arrays with the dummy_class constructor and want to put each of them into a new numpy.ndarray . When that outer array is initialized, the type of its elements is automatically converted from dummy_class to numpy.ndarray . The following code reproduces the problem:

import numpy

class dummy_class(numpy.ndarray):
    def __new__(cls, data, some_attribute):
        obj = numpy.asarray(data).view(cls)
        obj.attribute = some_attribute
        return obj

array_1 = dummy_class([1, 2, 3, 4], "first dummy")
print(type(array_1))
# <class '__main__.dummy_class'>

array_2 = dummy_class([1, 2, 3, 4], "second dummy")
print(type(array_2))
# <class '__main__.dummy_class'>

the_problem = numpy.array([array_1, array_2])
print(type(the_problem))
# <class 'numpy.ndarray'>
print(type(the_problem[0]))
# <class 'numpy.ndarray'>
print(type(the_problem[1]))
# <class 'numpy.ndarray'>

This is how you can fill a NumPy array with arbitrary Python objects:

the_problem = numpy.empty(2, dtype=object)
the_problem[:] = [array_1, array_2]

I agree with iluengo that making a NumPy array of arrays does not take advantage of NumPy's strengths, because the outer NumPy array has to be of dtype object . Object arrays require about the same amount of memory as a regular Python list, take more time to build than an equivalent Python list, and are no faster at computation than an equivalent Python list. Perhaps their only advantage is that they let you use NumPy array indexing syntax.
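To make the difference concrete, here is a minimal sketch in Python 3. DummyClass is just a renamed copy of the question's dummy_class, and the names stacked and boxed are mine, chosen for illustration.

```python
import numpy

class DummyClass(numpy.ndarray):
    """Stand-in for the question's dummy_class."""
    def __new__(cls, data, some_attribute):
        obj = numpy.asarray(data).view(cls)
        obj.attribute = some_attribute
        return obj

array_1 = DummyClass([1, 2, 3, 4], "first dummy")
array_2 = DummyClass([1, 2, 3, 4], "second dummy")

# numpy.array() copies the numbers into one 2-D block, so the subclass is lost.
stacked = numpy.array([array_1, array_2])
print(stacked.shape)               # (2, 4)
print(type(stacked[0]).__name__)   # ndarray

# An object array stores references to the original objects instead.
boxed = numpy.empty(2, dtype=object)
boxed[:] = [array_1, array_2]
print(type(boxed[0]).__name__)     # DummyClass
print(boxed[0].attribute)          # first dummy
```

The object array keeps the subclass and its attribute because it never looks inside the elements, which is also why it gains none of NumPy's vectorized speed.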

Please refer to the official example in the numpy documentation on subclassing ndarray.

I think the main ingredient missing above is an implementation of __array_finalize__() .
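For readers who have not seen the hook, here is a rough sketch of what adding __array_finalize__() to the question's class could look like. This is my adaptation, not the InfoArray code from the documentation; DummyClass and the None default for the attribute are assumptions made for illustration.

```python
import numpy

class DummyClass(numpy.ndarray):
    def __new__(cls, data, some_attribute=None):
        obj = numpy.asarray(data).view(cls)
        obj.attribute = some_attribute
        return obj

    def __array_finalize__(self, obj):
        # numpy calls this whenever a new instance appears, including
        # view casting and slicing; obj is the array it was derived from.
        if obj is None:
            return
        # Copy the attribute from the parent, or fall back to None.
        self.attribute = getattr(obj, "attribute", None)

a = DummyClass([1, 2, 3, 4], "first dummy")
sliced = a[1:3]
print(type(sliced).__name__)   # DummyClass
print(sliced.attribute)        # first dummy
```

Note that this hook governs views and slices of an existing instance. As far as I can tell, it does not by itself change what numpy.array([...]) returns when the inputs have equal shapes.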

The InfoArray example from that documentation works as expected, without the hack of having to pass a dtype argument when creating the new array:

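# InfoArray is the class defined in the numpy subclassing documentation.
shape1 = (2, 3)
array_1 = InfoArray(shape1)
print(type(array_1))
# <class '__main__.InfoArray'>

shape2 = (1, 2)
array_2 = InfoArray(shape2)
# The two shapes differ, so numpy cannot build a regular numeric array here.
# Older numpy versions fall back to an array of dtype object; recent
# versions raise a ValueError for ragged input instead.
the_problem = numpy.array([array_1, array_2])
print(type(the_problem))
# <class 'numpy.ndarray'>

print(type(the_problem[0]))
# <class '__main__.InfoArray'>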

Moreover, subclassing a numpy array and aggregating several instances into a larger array like the_problem is useful when the resulting aggregate is a regular numpy array rather than one of dtype object .

As an example, say that array_1 and array_2 have the same shape:

shape = (2,3)
array_1 = InfoArray(shape)
array_2 = InfoArray(shape)
the_problem = numpy.array([array_1, array_2])

Now the dtype of the_problem is not object , and you can efficiently compute, for example, the minimum with the_problem.min() . You can't do this directly with a list of your subclassed numpy arrays.
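A small sketch of the equal-shape case, written for Python 3. InfoArrayLike is a hypothetical bare subclass standing in for InfoArray, and the sample values are made up.

```python
import numpy

class InfoArrayLike(numpy.ndarray):
    """Hypothetical minimal subclass standing in for InfoArray."""
    pass

array_1 = numpy.arange(6.0).reshape(2, 3).view(InfoArrayLike)
array_2 = (numpy.arange(6.0).reshape(2, 3) + 10).view(InfoArrayLike)

# Equal shapes let numpy pack both inputs into one numeric block.
the_problem = numpy.array([array_1, array_2])
print(the_problem.dtype)    # float64
print(the_problem.shape)    # (2, 2, 3)
print(the_problem.min())    # 0.0

# The price: the elements are plain ndarrays again, not the subclass.
print(type(the_problem[0]).__name__)   # ndarray
```

So the fast reductions come from giving up the per-element subclass, which is the trade-off against the object-array approach.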
