How to create a sliding window over an ndarray, with wrap-around?

I'm trying to write some logic to return an array shifted one step to the right, with wrap-around. I was relying on receiving an IndexError to implement the wrap-around, but no error is thrown!

import numpy as np


def get_batches(arr, batch_size, seq_length):
    """
    Return arr data as batches of shape (batch_size, seq_length)
    """
    
    n_chars = batch_size * seq_length
    n_batches = int(np.floor(len(arr)/ n_chars))
    n_keep = n_chars * n_batches
    
    arr = arr[:n_keep].reshape(batch_size, -1)
    
    for b in range(n_batches):
        start = b * seq_length
        stop = start + seq_length
        
        x = arr[:, start:stop]
        try: 
            y = arr[:, start + 1: stop + 1]
        except IndexError:
            y = np.concatenate(x[:, 1:], arr[:, 0], axis=1)
        
        yield x, y

So this code works great, except when the last y array is yielded: I get a (2,2) array instead of the expected (2,3). That's because an IndexError is never thrown.
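The missing IndexError comes from how slicing works: NumPy, like plain Python sequences, clamps slice bounds that run past the end, and only integer indexing raises. A minimal sketch (the array `demo` is just an illustrative stand-in for the reshaped arr):

```python
import numpy as np

# Illustrative stand-in for the reshaped array: 2 rows, 6 columns.
demo = np.arange(12).reshape(2, 6)

# A slice that runs past the last column is clamped silently,
# so fewer columns come back and no exception is raised.
clipped = demo[:, 4:7]
print(clipped.shape)  # (2, 2)

# Integer indexing past the end, by contrast, does raise IndexError.
try:
    demo[:, 6]
    raised = False
except IndexError:
    raised = True
print(raised)  # True
```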

test = np.arange(12)
batches = get_batches(test, 2, 3)

for x, y in batches:
    print('x=', x)
    print('y=', y, '\n')

yields

x=
 [[0 1 2]
 [6 7 8]]
y=           # as expected
 [[1 2 3]
 [7 8 9]] 

x=
 [[ 3  4  5]
 [ 9 10 11]]
y=           # truncated :(
 [[ 4  5]
 [10 11]] 

Does anyone have an alternative suggestion for how to get this done? Preferably something as simple as my failed solution?

Try this:

import numpy as np
from skimage.util.shape import view_as_windows


def get_batches2(arr, batch_size, seq_length):
    """
    Return x and y, each of shape (n_batches, batch_size, seq_length)
    """
    n_chars = batch_size * seq_length
    n_batches = int(np.floor(len(arr)/ n_chars))
    n_keep = n_chars * n_batches

    arr = arr[:n_keep].reshape(batch_size, -1)
    # Step equals the window width, so the windows do not overlap
    x = view_as_windows(arr, (batch_size, seq_length), seq_length)[0]
    # np.roll shifts each row one step left and wraps its first element to the end
    y = view_as_windows(np.roll(arr,-1,axis=1), (batch_size, seq_length), seq_length)[0]

    return x, y

view_as_windows returns a view that shares memory with its input (you can check this with np.shares_memory). So it does not matter whether you yield the batches in a loop or return them all at once. The windowing itself uses no extra memory, if that is the concern (especially since your windows do not overlap), though np.roll does make one shifted copy of arr for y. It should also be much faster than the generator. You can probably achieve the same thing by simple reshaping, without view_as_windows.
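The reshaping idea can be sketched with NumPy alone. This is only one possible reading of it, assuming each row should wrap around to its own first element (which is what the except branch in the question attempts). The names get_batches_reshape, get_batches_wrapped and split are hypothetical helpers introduced here, not part of any library.

```python
import numpy as np


def get_batches_reshape(arr, batch_size, seq_length):
    """Return x, y, each of shape (n_batches, batch_size, seq_length)."""
    n_chars = batch_size * seq_length
    n_batches = len(arr) // n_chars
    arr = arr[:n_chars * n_batches].reshape(batch_size, -1)
    # Shift every row one step left; the first element wraps to the end.
    shifted = np.roll(arr, -1, axis=1)

    def split(a):
        # (batch_size, n_batches * seq_length)
        #   -> (n_batches, batch_size, seq_length)
        return a.reshape(batch_size, n_batches, seq_length).swapaxes(0, 1)

    return split(arr), split(shifted)


def get_batches_wrapped(arr, batch_size, seq_length):
    """Generator variant that yields one (x, y) pair per batch."""
    xs, ys = get_batches_reshape(arr, batch_size, seq_length)
    for x, y in zip(xs, ys):
        yield x, y


xs, ys = get_batches_reshape(np.arange(12), 2, 3)
pairs = list(get_batches_wrapped(np.arange(12), 2, 3))
print(ys[-1])
```

With the test input from the question, the last y comes out as [[4, 5, 0], [10, 11, 6]], so every batch keeps the full (2,3) shape. The generator form keeps the original calling convention (for x, y in batches) if that is preferred.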
