
python numpy ndarray subclassing for offset changing

I am working on a framework for processing incoming data.

The data is received from a socket and added to a numpy array A (used as a buffer) by shifting, something like:

A[:-1] = A[1:]
A[-1] = value
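
For reference, the shifting update can be wrapped in a small helper. This is only a minimal sketch: the name `push_sample` and the buffer length of 5 are assumptions for illustration and are not part of the framework.

```python
import numpy as np

def push_sample(buf, value):
    """Shift buf one slot to the left in place and store value at the end."""
    buf[:-1] = buf[1:]   # overlapping in-place assignment; the oldest sample is dropped
    buf[-1] = value      # the newest sample always lands at index -1

A = np.zeros(5)
for v in (1.0, 2.0, 3.0):
    push_sample(A, v)

print(A)  # [0. 0. 1. 2. 3.]
```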

The framework allows loading processing units as classes that have access to the incoming data through an array view pointing to A. Every time new data is received and stored in A, a method execute() is called:

def execute(self,):
    newSample = self.data[-1]

What is important is that the new sample is always at index -1. A user can also create their own array views in the __init__ function:

def __init__(self,):
    self.myData = self.data[-4:]  # view that contains last 4 samples
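
Such a view keeps working because the shift happens in place, so the view and A share the same memory. A small sketch of this, assuming a buffer of length 6 (the variable names are illustrative only):

```python
import numpy as np

A = np.zeros(6)
last_four = A[-4:]        # view created once, as in __init__

for v in range(1, 8):     # simulate seven incoming samples
    A[:-1] = A[1:]
    A[-1] = v

print(last_four)                       # [4. 5. 6. 7.]
print(np.shares_memory(A, last_four))  # True
```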

Everything works nicely when I am shifting array A and adding a new value at the end. However, for offline testing, I want to load all the data when the framework starts and run everything else as before (i.e. with the same classes implementing the data processing). Of course, I can again create the buffer A as an array of zeros and shift new values into it. However, this involves copying data between two arrays, which is absolutely unnecessary and costs time and memory.

What I was thinking about is providing a way to change the boundaries of the numpy array, or to change the A.data pointer. However, every solution I found is either not allowed or triggers a warning message.

Finally, I am trying to change an internal offset of array A so that I can advance it and thus make more data available to the algorithms. Importantly, self.data[-1] must always point to the newest sample, and the standard numpy array API should be used.

I have subclassed np.ndarray:

import numpy as np

class MyArrayView(np.ndarray):
    def __new__(cls, input_array):
        obj = np.asarray(input_array).view(cls)
        # add the new attribute to the created instance
        obj._offset = 0
        # Finally, we must return the newly created object:
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self._offset = getattr(obj, '_offset', None)

    def advance_index(self):
        self._offset += 1

    def __str__(self):
        return super(MyArrayView, self[:]).__str__()

    def __repr__(self):
        return super(MyArrayView, self[:]).__repr__()

    def __getitem__(self, idx):
        if isinstance(idx, slice):
            start = 0
            stop = self._offset
            step = idx.step
            idx = slice(start, stop, step)
        else:
            idx = self._offset + idx
        return super(MyArrayView, self).__getitem__(idx)

This allows me to do the following:

a = np.array([1,2,3,4,5,6,7,8,9,10])
myA = MyArrayView(a)
b = myA
print("b :", b)
for i in range(1,5):
    myA.advance_index()
    print(b[:], b[-1])

print("b :", b)
print("b + 10 :", b + 10)
print("b[:] + 20 :", b[:] + 20)

and gives the following output:

b : []
[1] 1
[1 2] 2
[1 2 3] 3
[1 2 3 4] 4
b : [1 2 3 4]
b + 10 : [11 12 13 14]
b[:] + 20 : [21 22 23 24]

So far so good. However, if I check the shape:

print("shape", b[:].shape)  # shape (4,)
print("shape", b.shape)     # shape (10,)

The shape differs between the two cases. I have tried to change it by assigning shape=(self.internalIndex,), but that only leads to an error message.
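
The same mismatch can be reproduced with plain arrays, which may help explain it: slicing returns a new, shorter view, while the original object keeps describing the whole buffer. The sketch below also shows one way a shape assignment can fail; whether this is the exact error seen with the subclass is an assumption.

```python
import numpy as np

a = np.arange(1, 11)
window = a[:4]                  # a new, shorter view, like the one __getitem__ returns
print(a.shape, window.shape)    # (10,) (4,)

error = None
try:
    a.shape = (4,)              # an in-place reshape must keep the element count
except ValueError as exc:
    error = exc
print("ValueError:", error)
```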

I want to ask whether you think my approach is the right way and only requires overloading more methods of the np.ndarray class. Or should I abandon this solution completely and fall back to shifting the array with each new sample? Or can this perhaps be achieved with the standard np.ndarray implementation, since I need to use the standard numpy API?

I also tried this:

a = np.array([1,2,3,4,5,6,7,8,9,10])
b = a.view()[5:]

print(a.data)  # <memory at 0x7f09e01d8f48>
print(b.data)  # <memory at 0x7f09e01d8f48> They point to the same memory start!

print(np.byte_bounds(a)) # (50237824, 50237904)
print(np.byte_bounds(b)) # (50237864, 50237904) but the byte_bounds are different

With this in mind, I would say I need to create a view of array a and extend it (or at least move it like a window over a). However, none of my attempts to change the byte_bounds had any effect.
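
A note on the check above: as far as I can tell, the address in `<memory at 0x...>` belongs to the temporary memoryview object rather than to the buffer it wraps, so two prints can show the same value by coincidence. Also, in NumPy 2.0 and later, `byte_bounds` is reached through `np.lib.array_utils` rather than the top-level namespace. A version-independent sketch that compares the actual start addresses through the array interface:

```python
import numpy as np

a = np.arange(1, 11, dtype=np.int64)
b = a[5:]

addr_a = a.__array_interface__["data"][0]   # address of a's first element
addr_b = b.__array_interface__["data"][0]   # address of b's first element

print(addr_b - addr_a)   # 40, i.e. 5 elements * 8 bytes
print(b.base is a)       # True: b is a view onto a's memory
```

This agrees with the byte_bounds output: the view starts 40 bytes into the same buffer and ends at the same place.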

I admire your bravery, but I am quite sure that subclassing numpy arrays is overkill for your problem and can cause you a lot of headaches. In the end, it might cause a performance hit somewhere that far outweighs the array copying you are trying to avoid.

Why not make the slice (i.e. [-4:] or slice(-4, None)) a parameter of your __init__ function or a class attribute, and override it in your test?

def __init__(self, lastfour=slice(-4, None)):
    self.myData = self.data[lastfour]
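
One possible reading of this suggestion, as a self-contained sketch: the class `Processor`, its `data` constructor argument, and the `window` parameter name are hypothetical stand-ins for the framework's processing units, which are not shown in the question.

```python
import numpy as np

class Processor:
    """Hypothetical processing unit; `data` stands in for the framework buffer."""

    def __init__(self, data, window=slice(-4, None)):
        self.data = data
        self.myData = self.data[window]   # basic slicing gives a view, not a copy

    def execute(self):
        # Return the newest sample in the window and the window sum.
        return self.myData[-1], self.myData.sum()

# Online: the default window looks at the tail of the shifting buffer.
buf = np.arange(1.0, 11.0)
online = Processor(buf)
print(online.execute())    # newest sample 10.0, window sum 34.0

# Offline test: point the same class at a fixed window of pre-loaded data.
recorded = np.arange(1.0, 101.0)
offline = Processor(recorded, window=slice(10, 14))
print(offline.execute())   # newest sample 14.0, window sum 50.0
```

To step through recorded data, the test harness would have to rebuild the view with a new slice at every step; that part is an assumption and is not spelled out in the answer.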


 