
The fastest way to read binary files by bytes in Python

I am writing a program that should be able to encode any type of file using the Huffman algorithm. It all works, but it is too slow on large files (at least I think it is). When I tried to open a 120 MB mp4 file and unpack it, it took about 210 s just to read the file, not to mention the large chunk of memory it used. I thought unpacking with struct would be efficient, but it isn't. Is there a more efficient way to do this in Python? I need to read any file byte by byte and then pass it to the Huffman method as a string.

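import struct
import time

import numpy as np

if __name__ == "__main__":
    start = time.time()
    # Raw string so the backslash in the Windows path is not read as an escape.
    with open(r'D:\mov.mp4', 'rb') as f:
        dataL = f.read()
    data = np.zeros(len(dataL), 'uint8')

    # Slow part: one struct.unpack call per byte, inside a Python-level loop.
    for i in range(0, len(dataL)):
        data[i] = struct.unpack('B', dataL[i:i + 1])[0]

    # tobytes() replaces the deprecated tostring(); the result is discarded here.
    data.tobytes()

    end = time.time()
    print("Original file read: ")
    print(end - start)

    # huffman_encode is defined elsewhere in the program.
    encoded, table = huffman_encode(data)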

Your approach loads the file into a Python object, creates a zero-filled NumPy array, and then fills that array byte by byte in a Python loop.

Let's cut out the middlemen:

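import time

import numpy as np

if __name__ == "__main__":
    start = time.time()
    # Read the file straight into a uint8 array; count=-1 reads everything.
    data = np.fromfile(r'd:\mov.mp4', dtype=np.uint8, count=-1)
    end = time.time()
    print("Original file read: ")
    print(end - start)
    encoded, table = huffman_encode(data)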

What to do with 'data' depends on what type of data your huffman_encode(data) expects. I would try to avoid using strings.
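As one illustration of working on the array directly instead of a string, the byte frequencies a Huffman encoder typically needs can be computed with np.bincount. This is only a sketch: byte_frequencies is a hypothetical helper, not part of the question's huffman_encode, and the sample file is created just so the example runs on its own.

```python
import os
import tempfile

import numpy as np


def byte_frequencies(path):
    """Hypothetical helper: return the file's bytes as uint8 plus a 256-bin histogram."""
    data = np.fromfile(path, dtype=np.uint8)
    # One vectorized pass; minlength=256 gives a slot for every possible byte value.
    counts = np.bincount(data, minlength=256)
    return data, counts


if __name__ == "__main__":
    # Write a small sample file so the sketch does not depend on a real input.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"abracadabra")
        sample_path = tmp.name
    try:
        data, counts = byte_frequencies(sample_path)
        print(len(data), counts[ord("a")])  # 11 bytes, 5 of them 'a'
    finally:
        os.remove(sample_path)
```

If huffman_encode really does require a bytes string, data.tobytes() produces one in a single call, at the cost of an extra copy.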

Documentation on the call is here: http://docs.scipy.org/doc/numpy/reference/generated/numpy.fromfile.html

  • I would be interested to hear the speed differences in the comments :)
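For anyone who wants to measure that difference, here is a rough timing sketch. It assumes Python 3 (where indexing a bytes object yields an int, so struct is not needed), and the helper names read_with_loop, read_with_fromfile and timed are made up for this example. It uses a 1 MiB temporary file of random bytes rather than the 120 MB video, so absolute numbers will differ from the question's.

```python
import os
import tempfile
import time

import numpy as np


def read_with_loop(path):
    """Per-byte copy in a Python loop, similar in spirit to the question's code."""
    with open(path, "rb") as f:
        raw = f.read()
    data = np.zeros(len(raw), dtype=np.uint8)
    for i in range(len(raw)):
        data[i] = raw[i]  # Python 3: raw[i] is already an int in 0..255
    return data


def read_with_fromfile(path):
    """Single call that reads the file directly into a uint8 array."""
    return np.fromfile(path, dtype=np.uint8)


def timed(func, path):
    """Return (result, elapsed seconds) for one call of func(path)."""
    start = time.perf_counter()
    result = func(path)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(1 << 20))  # 1 MiB of random bytes
        sample_path = tmp.name
    try:
        slow, slow_s = timed(read_with_loop, sample_path)
        fast, fast_s = timed(read_with_fromfile, sample_path)
        assert np.array_equal(slow, fast)  # both paths must yield the same bytes
        print("loop:     %.4f s" % slow_s)
        print("fromfile: %.4f s" % fast_s)
    finally:
        os.remove(sample_path)
```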
