
Numpy MemoryError

I am stacking a large number of rasters to calculate a median over two months of satellite data. This worked fine when the data had a 10 m resolution.

Now that I am running the same functions on the 20 m resolution data (so each raster should have half as many columns and rows), I get memory errors.

I did not change anything in between except the bands of the initial satellite data.

I know that this is a huge amount of data, since it covers both a long time span and a large spatial extent, but it still worked at the 10 m resolution.

I am working on a virtual machine with Python 3.6 in Anaconda; the machine has 128 GB of RAM and 16 vCPUs.

The error messages:

<class 'numpy.core._exceptions._ArrayMemoryError'>, ((563, 256, 55296), dtype('int64')) -> always
<class 'MemoryError'>, ((506, 256, 54528), dtype('bool')) -> sometimes

Below is the merging code, where file_list is the link:

import os

from typing import List
from osgeo import gdal
import numpy as np
import glob


def build_vrt(vrt: str, files: List[str], resample_name: str) -> None:
    """builds .vrt file which will hold information needed for overlay
    Args:
        vrt (:obj:`string`): name of vrt file, which will be created
        files (:obj:`list`): list of file names for merging
        resample_name (:obj:`string`): name of resampling method
    """

    options = gdal.BuildVRTOptions(srcNodata=-9999)
    gdal.BuildVRT(destName=vrt, srcDSOrSrcDSTab=files, options=options)
    add_pixel_fn(vrt, resample_name)


def add_pixel_fn(filename: str, resample_name: str) -> None:
    """inserts pixel-function into vrt file named 'filename'
    Args:
        filename (:obj:`string`): name of file, into which the function will be inserted
        resample_name (:obj:`string`): name of resampling method
    """

    header = """  <VRTRasterBand dataType="uInt16" band="1" subClass="VRTDerivedRasterBand">"""
    contents = """
    <PixelFunctionType>{0}</PixelFunctionType>
    <PixelFunctionLanguage>Python</PixelFunctionLanguage>
    <PixelFunctionCode><![CDATA[{1}]]>
    </PixelFunctionCode>"""

    lines = open(filename, 'r').readlines()
    lines[3] = header  # FIX ME: 3 is a hand constant
    lines.insert(4, contents.format(resample_name,
                                    get_resample(resample_name)))
    open(filename, 'w').write("".join(lines))


def get_resample(name: str) -> str:
    """retrieves code for resampling method
    Args:
        name (:obj:`string`): name of resampling method
    Returns:
        method :obj:`string`: code of resample method
    """

    methods = {
        "median":
        """
import numpy as np
def median(in_ar, out_ar, xoff, yoff, xsize, ysize, raster_xsize,raster_ysize, buf_radius, gt, **kwargs):
    div = np.zeros((len(in_ar),in_ar[0].shape[0],in_ar[0].shape[1]), dtype=np.float16)
    for i in range(len(in_ar)):
        div[i,:,:] = np.where(in_ar[i] != 0,in_ar[i],np.nan)

    y = np.nanmedian(div, axis=0)

    np.clip(y,y.min(),y.max(), out = out_ar)
"""}

    if name not in methods:
        raise ValueError(
            "ERROR: Unrecognized resampling method (see documentation): '{}'.".
            format(name))

    return methods[name]


def merge(files: List[str], output_file: str, resample: str = "average") -> None:
    """merges list of files using specific resample method for overlapping parts
    Args:
        files (:obj:`list[string]`): list of files to merge
        output_file (:obj:`string`): name of output file
        resample (:obj:`string`): name of resampling method
    """
    #des=r"E:\naser\code_de\output\test\_vrt.vrt"
    des=os.getcwd() + "/_vrt.vrt"
    print("1")
    build_vrt(des, files, resample)
    print("2")
    gdal.SetConfigOption('GDAL_VRT_ENABLE_PYTHON', 'YES')
    print("3")
    translateoptions = gdal.TranslateOptions(gdal.ParseCommandLine("-of Gtiff -ot UINT16 -co TILED=YES -co COMPRESS=LZW -co BIGTIFF=YES -co NUM_THREADS=ALL_CPUS -a_nodata 0"))
    gdal.SetConfigOption("GDAL_CACHEMAX","512")
    gdal.Translate(destName=output_file, srcDS=des, options=translateoptions)
    print("4")
    gdal.SetConfigOption('GDAL_VRT_ENABLE_PYTHON', None)
    print("5")
    if os.path.isfile(des):
        os.remove(des)



def mergeAll(file_list,Outname,resample):
    merge(file_list,Outname,resample)

Is there a reasonable explanation for why this is happening? Or is there anything I can do about it?

It looks like you are exceeding the amount of memory available in your virtual machine.
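As a rough check, the shapes and dtypes quoted in the two error messages can be turned into byte counts. The helper name array_nbytes below is only illustrative, and the sketch assumes each message describes one dense array that NumPy tried to allocate:

```python
import numpy as np


def array_nbytes(shape, dtype):
    """Bytes needed for one dense array of the given shape and dtype."""
    n_elements = int(np.prod(shape, dtype=np.int64))
    return n_elements * np.dtype(dtype).itemsize


# Shapes and dtypes copied from the error messages in the question.
failed_allocations = [
    ((563, 256, 55296), "int64"),
    ((506, 256, 54528), "bool"),
]

for shape, dtype in failed_allocations:
    gib = array_nbytes(shape, dtype) / 2**30
    print(f"{shape} {dtype}: {gib:.1f} GiB")
```

The int64 array alone comes to about 59 GiB and the bool array to about 6.6 GiB. Each of these is a single temporary, and the pixel function also holds its own copy of the stack, so several such arrays may have to fit into the 128 GB at the same time.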

A workaround might be to increase it. Alternatively, redesign your program to work in chunks (batch processing), so that you access your data in batches instead of loading all the files into memory.
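The chunked approach could look roughly like the sketch below. It is not a drop-in replacement for the pixel function in the question: nanmedian_in_chunks is a hypothetical helper, and it assumes the layers are 2-D arrays (or array-likes that support slicing, such as memory-mapped arrays) of equal shape, with 0 as the nodata value as in the question.

```python
import warnings

import numpy as np


def nanmedian_in_chunks(layers, chunk_cols=1024, nodata=0):
    """Per-pixel median across layers, ignoring nodata, a few columns at a time."""
    rows, cols = layers[0].shape
    out = np.full((rows, cols), nodata, dtype=np.float32)
    for start in range(0, cols, chunk_cols):
        stop = min(start + chunk_cols, cols)
        # Only this column window of every layer is held in memory at once.
        block = np.stack([layer[:, start:stop] for layer in layers]).astype(np.float32)
        block[block == nodata] = np.nan
        with warnings.catch_warnings():
            # Pixels that are nodata in every layer raise an "All-NaN slice" warning.
            warnings.simplefilter("ignore", RuntimeWarning)
            med = np.nanmedian(block, axis=0)
        # Pixels without any valid value fall back to nodata.
        out[:, start:stop] = np.where(np.isnan(med), nodata, med)
    return out


layers = [
    np.array([[1, 0, 3], [0, 0, 6]]),
    np.array([[3, 4, 0], [0, 8, 6]]),
    np.array([[5, 0, 0], [0, 2, 9]]),
]
result = nanmedian_in_chunks(layers, chunk_cols=2)
```

Peak memory then scales with the number of layers times rows times chunk_cols rather than with the full raster width, so chunk_cols can be lowered until the temporaries fit.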
