How to efficiently write a binary file containing mixed label and image data

The cifar10 tutorial deals with binary files as input. Each record/example in these CIFAR10 data files contains mixed label (first element) and image data. The first answer on this page shows how to write a binary file from a numpy array (which holds the label and image data in each row) using ndarray.tofile() as follows:

import numpy as np
images_and_labels_array = np.array([[...], ...], dtype=np.uint8)
images_and_labels_array.tofile("/tmp/images.bin")

This is perfect for me when the maximum number of classes is 256, as the uint8 datatype is sufficient. However, when the maximum number of classes is more than 256, I have to change the dtype of images_and_labels_array to np.uint16. As a consequence, the file size doubles, because every pixel is then stored in two bytes as well. I would like to know if there is an efficient way to overcome this. If yes, please provide an example.
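To make the size concern concrete, here is a small sketch of the per-record arithmetic. It assumes CIFAR-10-sized images of 32x32x3 pixels, which is an assumption for illustration since the question does not state an image size.

```python
import numpy as np

# Assumed image size (CIFAR-10 style); not stated in the question.
pixels_per_image = 32 * 32 * 3

u8 = np.dtype(np.uint8).itemsize    # 1 byte
u16 = np.dtype(np.uint16).itemsize  # 2 bytes

# Label and pixels all stored as uint8 (works for up to 256 classes).
record_all_uint8 = (1 + pixels_per_image) * u8

# Label and pixels all stored as uint16 (what changing the array dtype does).
record_all_uint16 = (1 + pixels_per_image) * u16

# Label as uint16, pixels kept as uint8 (the mixed layout being asked for).
record_mixed = u16 + pixels_per_image * u8

print(record_all_uint8, record_all_uint16, record_mixed)
```

Under that assumption the mixed layout costs one extra byte per record rather than doubling it.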

When I write binary files I usually just use the Python module struct, which works roughly like this:

import struct
import numpy as np

image = np.zeros([2, 300, 300], dtype=np.uint8)
label = np.zeros([2, 1], dtype=np.uint16)

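# open in binary mode ('wb'); in Python 3, text mode ('w') rejects the bytes returned by struct.pack
with open('data.bin', 'wb') as fo: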
    s = image.shape
    for k in range(s[0]):
        # write label as uint16
        fo.write(struct.pack('H', label[k, 0]))

        # write image as uint8
        for i in range(s[1]):
            for j in range(s[2]):
                fo.write(struct.pack('B', image[k, i, j]))

This should result in a binary file of 300*300*2 + 2*1*2 = 180004 bytes. It's probably not the fastest way to get the job done, but it has worked sufficiently fast for me so far. For other data types, see the documentation of the struct module.
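If the per-pixel loop turns out to be too slow, one possible alternative is a NumPy structured dtype, which lets a uint16 label and a uint8 image share one record and be written with a single tofile() call. The following is a minimal sketch, not part of the original answer: the names record_dtype, write_records and read_records, the file name, and the example label values are all hypothetical, and the label is fixed to little-endian here, whereas struct's 'H' uses the machine's native byte order.

```python
import os
import tempfile
import numpy as np

# Hypothetical record layout: 2-byte little-endian label, then a 300x300 uint8 image.
# Structured dtypes are packed by default, so one record is 2 + 90000 bytes.
record_dtype = np.dtype([("label", "<u2"), ("image", np.uint8, (300, 300))])

def write_records(path, labels, images):
    """Pack labels and images into one structured array and write it in one call."""
    records = np.empty(len(labels), dtype=record_dtype)
    records["label"] = labels
    records["image"] = images
    records.tofile(path)

def read_records(path):
    """Read the file back as an array of (label, image) records."""
    return np.fromfile(path, dtype=record_dtype)

labels = np.array([7, 300], dtype=np.uint16)       # 300 does not fit in uint8
images = np.zeros((2, 300, 300), dtype=np.uint8)

path = os.path.join(tempfile.gettempdir(), "data_structured.bin")
write_records(path, labels, images)

records = read_records(path)
print(os.path.getsize(path))       # file size in bytes
print(records["label"])            # labels read back
print(records["image"].shape)      # images read back
```

On a little-endian machine this should produce the same byte layout as the loop above, with each record holding the label followed by the image in row-major order.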
