How to efficiently resize a numpy array to a given shape, padding with zeros if necessary?

I want to create an array of a given shape based on another numpy array. The number of dimensions will match, but the sizes will differ from axis to axis. If the original is too small along an axis, I want to pad it with zeros to reach the target size; if it is too large, it should be truncated, as in the example below. Example of the expected behaviour:

embedding = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8]
])

resize_with_outer_zeros(embedding, (4, 3)) = np.array([
    [1, 2, 3],
    [5, 6, 7],
    [0, 0, 0],
    [0, 0, 0]
])

I think I achieved the desired behaviour with the function below.

import numpy as np
from typing import Tuple

def resize_with_outer_zeros(embedding: np.ndarray, target_shape: Tuple[int, ...]) -> np.ndarray:
    # Zero-pad the end of each axis that is too short, then cut each axis back to the target size.
    padding = tuple((0, max(0, target_size - size)) for target_size, size in zip(target_shape, embedding.shape))
    target_slice = tuple(slice(0, target_size) for target_size in target_shape)
    return np.pad(embedding, padding)[target_slice]

However, I have strong doubts about its efficiency and elegance, as it involves a lot of pure Python tuple operations. Is there a better and more concise way to do it?

If you know that your array won't be bigger than some size (r, c), why not just:

def pad_with_zeros(A, r, c):
   out = np.zeros((r, c))
   r_, c_ = np.shape(A)
   out[0:r_, 0:c_] = A
   return out

If you want to support arbitrary dimensions (tensors) it gets a little uglier, but the principle remains the same:

def pad(A, shape):
   out = np.zeros(shape)
   out[tuple(slice(0, d) for d in np.shape(A))] = A
   return out

And to support arrays that are larger than the shape you want to pad to (the output then keeps the larger size along those axes):

def pad(A, shape):
    shape = np.max([np.shape(A), shape], axis=0)
    out = np.zeros(shape)
    out[tuple(slice(0, d) for d in np.shape(A))] = A
    return out
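
Note that this last version never truncates: if the input exceeds the requested shape along some axis, the output is larger than requested along that axis. A minimal sketch of cropping the result back to the target shape, assuming the truncation shown in the question is what you want (the name `pad_and_crop` is made up for illustration):

```python
import numpy as np

def pad(A, shape):
    # Grow the requested shape wherever A is larger, so the assignment below always fits.
    shape = np.max([np.shape(A), shape], axis=0)
    out = np.zeros(shape)
    out[tuple(slice(0, d) for d in np.shape(A))] = A
    return out

def pad_and_crop(A, shape):
    # Hypothetical helper: pad first, then cut every axis back to the requested size.
    return pad(A, shape)[tuple(slice(0, d) for d in shape)]

embedding = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(pad(embedding, (4, 3)).shape)           # (4, 4)
print(pad_and_crop(embedding, (4, 3)).shape)  # (4, 3)
```

The crop is a basic slice, so it returns a view of the padded array rather than a copy.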

I don't think you can do much better, but instead of using pad and then slicing, just create zeros at the right size and then assign into it. This cuts it to one list comprehension instead of two.

embedding = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8]
])

z = np.zeros((4,3))
s = tuple([slice(None, min(za,ea)) for za,ea in zip(z.shape, embedding.shape)])

z[s] = embedding[s]
z
# array([[1., 2., 3.],
#        [5., 6., 7.],
#        [0., 0., 0.],
#        [0., 0., 0.]])
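
The same idea can be wrapped in a reusable function. This is only a sketch: it assumes you want the result to keep the input's dtype (the snippet above yields floats because `np.zeros` defaults to float64), and the name `resize_with_zeros` is made up for illustration.

```python
import numpy as np

def resize_with_zeros(a, target_shape):
    # Hypothetical helper; assumes one target size per axis of the input.
    a = np.asarray(a)
    if a.ndim != len(target_shape):
        raise ValueError("target_shape must have one entry per axis of the input")
    out = np.zeros(target_shape, dtype=a.dtype)
    # Overlapping region: the smaller of the two sizes along each axis.
    overlap = tuple(slice(0, min(t, s)) for t, s in zip(target_shape, a.shape))
    out[overlap] = a[overlap]
    return out

embedding = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(resize_with_zeros(embedding, (4, 3)))
```

Because the overlap is computed per axis, the same function handles padding on one axis and truncation on another, for any number of dimensions.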

I'd just use a zero matrix and run a nested for-loop to copy the values from the old array; the remaining places stay zero.


import numpy as np


def resize_array(array, new_size):
    Z = np.zeros(new_size)
    for i in range(len(Z)):
        for j in range(len(Z[i])):
            try:
                Z[i][j] = array[i][j]
            except IndexError:       # array[i][j] doesn't exist in the old array, so this position keeps its zero
                pass
    return Z


embedding = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(resize_array(embedding, (4, 3)))
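
The nested loop handles 2-D input only and copies one element at a time in the Python interpreter, so it can be expected to be slower than a single slice assignment on large arrays. A rough way to compare the two on your own machine, as a sketch (the function names and array sizes are arbitrary choices, and no particular timings are claimed):

```python
import timeit
import numpy as np

def resize_loop(array, new_size):
    # Element-by-element copy, as in the answer above (2-D only).
    Z = np.zeros(new_size)
    for i in range(len(Z)):
        for j in range(len(Z[i])):
            try:
                Z[i][j] = array[i][j]
            except IndexError:
                pass
    return Z

def resize_slices(array, new_size):
    # Single slice assignment over the overlapping region.
    Z = np.zeros(new_size)
    s = tuple(slice(0, min(n, m)) for n, m in zip(new_size, array.shape))
    Z[s] = array[s]
    return Z

big = np.arange(200 * 300).reshape(200, 300)
target = (250, 250)

# Both approaches should agree before their timings are compared.
assert np.array_equal(resize_loop(big, target), resize_slices(big, target))
print("loop:  ", timeit.timeit(lambda: resize_loop(big, target), number=3))
print("slices:", timeit.timeit(lambda: resize_slices(big, target), number=3))
```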
