
Large alloc error when extracting 3D image patches in Python

I'm trying to extract small 3D patches (example patch size 20x20x4) from a 3D image of size 250x250x250, with a stride of 1 along every axis. I need every possible patch, because I'll run a function on each patch and return the results as a 3D image, with the result for each patch assigned to that patch's center voxel. To extract the patches I'm using the code below:

import numpy as np
from numpy.lib import stride_tricks

def cutup(data, blck, strd):
    sh = np.array(data.shape)
    blck = np.asanyarray(blck)
    strd = np.asanyarray(strd)
    nbl = (sh - blck) // strd + 1
    strides = np.r_[data.strides * strd, data.strides]
    dims = np.r_[nbl, blck]
    data6 = stride_tricks.as_strided(data, strides=strides, shape=dims)
    return data6.reshape(-1, *blck)

#demo
x = np.zeros((250,250,250), int)
y = cutup(x, (20, 20, 4), (1, 1, 1))

I'm running this on Google Colab, which has around 12 GB of RAM. Because the result is a very large number of patches, I get a large alloc error and then the kernel restarts. I think splitting the image into parts would work, but if I do that, how should I write the code so that it still considers the neighbouring voxels? Is there a smart way to do this?

Don't reshape the newly strided array/view before returning it.

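import numpy as np
from numpy.lib import stride_tricks

def cutup(data, blck, strd):
    sh = np.array(data.shape)
    blck = np.asanyarray(blck)
    strd = np.asanyarray(strd)
    # number of patch positions along each axis
    nbl = (sh - blck) // strd + 1
    # strides for stepping between patches, then for stepping within a patch
    strides = np.r_[data.strides * strd, data.strides]
    dims = np.r_[nbl, blck]
    # return the 6-D view as-is; without the reshape no copy is made
    data6 = stride_tricks.as_strided(data, strides=strides, shape=dims)
    return data6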

Then iterate over the patches.

p = np.zeros((250,250,250), int)
q = cutup(p, (20, 20, 4), (1, 1, 1))
print(f'windowed shape : {q.shape}')
print()
for i,x in enumerate(q):
    print(f'x.shape:{x.shape}')
    for j,y in enumerate(x):
        print(f'\ty.shape:{y.shape}')
        for k,z in enumerate(y):
            print(f'\t\tz.shape:{z.shape}')
            if k==5: break
        break
    break
>>>
windowed shape : (231, 231, 247, 20, 20, 4)

x.shape:(231, 247, 20, 20, 4)
        y.shape:(247, 20, 20, 4)
                z.shape:(20, 20, 4)
                z.shape:(20, 20, 4)
                z.shape:(20, 20, 4)
                z.shape:(20, 20, 4)
                z.shape:(20, 20, 4)
                z.shape:(20, 20, 4)

Your example will produce an array (or rather a view of the array) with a shape of (231, 231, 247, 20, 20, 4), which is more than thirteen million 3-D patches.

Returning the view without reshaping it will solve your memory allocation problem.
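To illustrate why skipping the reshape matters, here is a minimal sketch. It uses a small 30x30x30 stand-in image and a hypothetical helper, patch_view, that builds the same kind of view as cutup for a stride of 1. The strided view reuses the image's buffer, while reshape(-1, *blck) cannot be expressed as a view and has to copy every patch. The last lines only do the arithmetic for the full 250x250x250 case, assuming 8-byte integers.

```python
import numpy as np
from numpy.lib import stride_tricks

def patch_view(data, blck):
    # Hypothetical helper: same construction as cutup with stride 1,
    # returning the 6-D view without reshaping it.
    nbl = tuple(s - b + 1 for s, b in zip(data.shape, blck))
    return stride_tricks.as_strided(
        data, shape=nbl + tuple(blck), strides=data.strides * 2
    )

small = np.zeros((30, 30, 30), dtype=np.int64)  # small stand-in image
view = patch_view(small, (20, 20, 4))
flat = view.reshape(-1, 20, 20, 4)              # forces a copy of every patch

print(np.may_share_memory(small, view))  # view lives in the image's buffer
print(np.may_share_memory(small, flat))  # reshaped result is a separate buffer

# Size of the equivalent copy for the full image, computed without allocating it.
n_patches = 231 * 231 * 247
bytes_needed = n_patches * 20 * 20 * 4 * 8      # assumes 8-byte elements
print(n_patches, bytes_needed / 1e9)
```

Under these assumptions the flattened copy of the full image would need roughly 169 GB, far more than the 12 GB available, whereas the view itself needs no extra memory for the patch data.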


When I try to reshape it to (231, 231, 247, -1), I get a large alloc error.

If your operation requires the last three dimensions to be flattened, do that inside your iteration.

for i,x in enumerate(q):
    for j,y in enumerate(x):
        for k,z in enumerate(y):
            z = z.reshape(-1)
            print(f'\t\tz.shape:{z.shape}')
            if k==5: break
        break
    break

It looks like you can also do that reshape in the outermost loop, at least for a zeros array.

for i,x in enumerate(q):
    zero,one,*last = x.shape
    x = x.reshape(zero,one,-1)
    print(f'x.shape:{x.shape}')
    for j,y in enumerate(x):
        print(f'\ty.shape:{y.shape}')
        for k,z in enumerate(y):
            print(f'\t\tz.shape:{z.shape}')
            break
        break
    break
>>>
x.shape:(231, 247, 1600)
        y.shape:(247, 1600)
                z.shape:(1600,)
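One likely reason the outer-loop reshape succeeds is that it copies only one slab at a time rather than the whole set of patches. Whether reshape can return a view depends on the shape and strides, not on the values, so this should hold for non-zero data as well. The sketch below checks this on a small stand-in image (the 30x30x30 size is an assumption made to keep it cheap) and then computes the slab size for the full image, again assuming 8-byte integers.

```python
import numpy as np
from numpy.lib import stride_tricks

small = np.zeros((30, 30, 30), dtype=np.int64)  # small stand-in image
blck = (20, 20, 4)
nbl = tuple(s - b + 1 for s, b in zip(small.shape, blck))
q = stride_tricks.as_strided(small, shape=nbl + blck, strides=small.strides * 2)

x = q[0]                                        # still a view, shape (11, 27, 20, 20, 4)
x_flat = x.reshape(x.shape[0], x.shape[1], -1)  # flattening the patch axes copies this slab
print(x_flat.shape, np.may_share_memory(small, x_flat))

# Size of one such slab copy for the full 250x250x250 image.
slab_bytes = 231 * 247 * 20 * 20 * 4 * 8        # assumes 8-byte elements
print(slab_bytes / 1e6)
```

For the full image each slab copy comes to about 730 MB, which fits in 12 GB of RAM but is allocated again on every pass of the outer loop.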

Is there a smart way to do this?

If you can figure out how to vectorize your operation so that you only need to iterate over the first dimension, or over the first and second dimensions, you can speed up your processing. If you run into problems with that, it should be a separate question.
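As a rough sketch of what iterating over only the first dimension could look like, the example below assumes the per-patch function is a plain mean, which is a placeholder since the actual function is not given. It uses numpy.lib.stride_tricks.sliding_window_view (available in NumPy 1.20 and later) in place of cutup, and the name patch_mean_map is hypothetical. For even patch sizes the "center" voxel is ambiguous, so the index b // 2 along each axis is used here as one possible convention.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def patch_mean_map(image, block):
    """Write the mean of every patch to that patch's center voxel.

    The mean is only a placeholder for the real per-patch function.
    """
    windows = sliding_window_view(image, block)  # read-only 6-D view, no copy
    out = np.zeros(image.shape, dtype=float)     # voxels without a full patch stay 0
    c0, c1, c2 = (b // 2 for b in block)         # assumed center convention
    n0, n1, n2 = windows.shape[:3]
    for i in range(n0):
        # Reduce the three patch axes for one slab of patches at a time.
        slab = windows[i].mean(axis=(-3, -2, -1))  # shape (n1, n2)
        out[i + c0, c1:c1 + n1, c2:c2 + n2] = slab
    return out

# Small usage example.
image = np.arange(6 * 5 * 4, dtype=float).reshape(6, 5, 4)
result = patch_mean_map(image, (3, 3, 2))
print(result.shape)
```

This only works directly for operations that can be written as reductions over the patch axes; an arbitrary Python function per patch would still need the inner loops.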
