How can I write to a png/tiff file patch-by-patch?

I want to create a PNG or TIFF image file from a very large h5py dataset that cannot be loaded into memory all at once. So, I was wondering if there is a way in Python to write to a PNG or TIFF file in patches? (I can load the h5py dataset in slices into a numpy.ndarray.) I've tried the Pillow library, using PIL.Image.paste with box coordinates, but for large images it runs out of memory.
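For context, here is roughly what the setup looks like; the file and dataset names (data.h5, 'image') are placeholders:

import h5py

f = h5py.File('data.h5', 'r')    # placeholder file name
image_arr = f['image']           # h5py.Dataset - nothing is read yet
height, width = image_arr.shape

# Slicing reads only that window from disk into a numpy.ndarray:
patch = image_arr[0:1024, 0:1024]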

Basically, I'm wondering if there's a way to do something like:

for y in range(0, height, patch_size):
    for x in range(0, width, patch_size):
        y2 = min(y + patch_size, height)
        x2 = min(x + patch_size, width)
        # image_arr is an h5py dataset that cannot be loaded completely
        # in memory, so load it in slices
        image_file.write(image_arr[y:y2, x:x2], box=(y, x, y2, x2))

I'm looking for a way to do this without having the whole image loaded into memory. I've tried the Pillow library, but it loads/keeps all the data in memory.

Edit: This question is not about h5py, but rather about how extremely large images (that cannot be loaded into memory) can be written out to a file in patches - similar to how large text files can be constructed by writing to them line by line.

Short answer to "is there a way in Python to write to a png or tiff file in patches?": well, yes - everything is possible in Python, given enough time and skill to implement it. On the other hand, NO, there is no ready-made solution for this - because it doesn't appear to be very useful.

I don't know much about TIFF, and a comment here says it is limited to 4GB, so that format is likely not a good candidate. PNG has no practical limit and can be written in chunks, so it is doable in theory - on the condition that at least one scan line of your resulting image fits into memory.

If you really want to go ahead with this, here is the info that you need: A PNG file consists of a few metadata chunks and a series of image data chunks. The latter are independent of each other and you can therefore construct a big image out of several smaller images (each of which contains a whole number of rows, a minimum of one row) by simply concatenating their image data chunks (IDAT) together and adding the needed metadata chunks (you can pick those from the first small image, except for the IHDR chunk - that one will need to be constructed to contain the final image size).

So, here is how I'd do it, if I had to (NOTE: you will need some understanding of Python's bytes type and the methods for converting byte sequences to and from Python data types to pull this off):

  • find how many rows I can fit into memory and make that the height of my "small image chunk". The width is the width of the entire final image. Let's call those width and small_height.

  • go through my giant data set in h5py one chunk at a time (width * small_height), convert it to PNG, and save it to disk in a temporary file - or, if your image conversion library allows it, directly to a bytes string in memory. Then process the byte data as follows and delete it at the end:

    -- on the first iteration: walk through the PNG data one record at a time (see the PNG spec: http://www.libpng.org/pub/png/spec/1.2/png-1.2-pdg.html - it is in length-tag-value form and it is very easy to write code that efficiently walks over the file record by record), and save ALL the records into my target file, except: modify IHDR to have the final image size and skip the IEND record.

    -- on all subsequent iterations: scan through the PNG data, pick only the IDAT records, and write those out to the output file.

  • append an IEND record to the target file.

All done - you should now have a valid humongous PNG. I wonder who or what could read that, though.
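A minimal sketch of the record walk described above, in case it helps; the names small_png_images() and full_height are placeholders for however you produce the per-chunk PNG bytes and the final image height, and this only does the record-level bookkeeping (the IDAT payloads are copied through as-is):

import struct
import zlib

PNG_SIG = b'\x89PNG\r\n\x1a\n'   # 8-byte PNG file signature

def iter_records(png_bytes):
    # Each record: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
    pos = len(PNG_SIG)
    while pos < len(png_bytes):
        (length,) = struct.unpack('>I', png_bytes[pos:pos + 4])
        rtype = png_bytes[pos + 4:pos + 8]
        yield rtype, png_bytes[pos + 8:pos + 8 + length]
        pos += 12 + length

def write_record(out, rtype, data):
    out.write(struct.pack('>I', len(data)))
    out.write(rtype)
    out.write(data)
    out.write(struct.pack('>I', zlib.crc32(rtype + data)))  # CRC covers type + data

with open('huge.png', 'wb') as out:
    out.write(PNG_SIG)
    for i, png_bytes in enumerate(small_png_images()):   # placeholder source
        for rtype, data in iter_records(png_bytes):
            if rtype == b'IEND':
                continue                  # written once, at the very end
            if i > 0 and rtype != b'IDAT':
                continue                  # later images contribute IDAT only
            if rtype == b'IHDR':
                # IHDR data: width (4 bytes), height (4 bytes), then flags;
                # overwrite the height with the final image height.
                data = data[:4] + struct.pack('>I', full_height) + data[8:]
            write_record(out, rtype, data)
    write_record(out, b'IEND', b'')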

Try tifffile.memmap:

from tifffile import memmap

image_file = memmap('temp.tif', shape=(height, width), dtype=image_arr.dtype,
                    bigtiff=True)

for y in range(0, height, patch_size):
    for x in range(0, width, patch_size):
        y2 = min(y + patch_size, height)
        x2 = min(x + patch_size, width)
        image_file[y:y2, x:x2] = image_arr[y:y2, x:x2]

image_file.flush()

This creates an uncompressed BigTIFF file with one strip. Memory-mapped tiles are not implemented yet. Not sure how many libraries can handle that kind of file, but you can always read directly from the strip using the metadata in the TIFF tags.
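For what it's worth, a sketch of reading a patch back from the file written above, reusing tifffile.memmap in read-only mode:

from tifffile import memmap

# Re-open the existing file; the array is memory-mapped, not loaded.
arr = memmap('temp.tif', mode='r')
print(arr.shape, arr.dtype)

# Only the sliced window is actually read from disk.
patch = arr[0:1024, 0:1024]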
