Fastest way to swap bytes of big file in Python

For a project I need to swap 4-byte words quickly. I have to reverse the bytes of every word (4 bytes) in a big file (2 MB) before I can run another calculation algorithm on it.

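def word_swaper(data):
    buf_swaped_data = b""

    # Integer division: trailing bytes that do not fill a word are ignored.
    number_of_words = len(data) // 4

    for word in range(number_of_words):
        newword = data[word*4:(word+1)*4]
        newword = newword[::-1]  # reverse the 4 bytes of this word
        buf_swaped_data += newword

    return buf_swaped_data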

Is there a faster or simpler way? I'm going to use this on files of about 2 MB, where the calculation currently takes about 1-2 minutes, which is way too long.

Using two io.BytesIO() objects benchmarks at more than 3x as fast on my box, but there's a built-in method for this, array.array.byteswap(), that's about 550 times faster:

import timeit
import os
import io
import array


def original(data):
    buf_swaped_data = b""

    number_of_words = int(len(data) / 4)

    for word in range(number_of_words):
        newword = data[word * 4 : (word + 1) * 4]
        newword = newword[::-1]
        buf_swaped_data += newword
    return buf_swaped_data


def io_pair(data):
    in_io = io.BytesIO(data)
    out_io = io.BytesIO()
    while True:
        word = in_io.read(4)
        if not word:
            break
        out_io.write(word[::-1])
    return out_io.getvalue()


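def array_swap(data):
    # "I" (unsigned int) is 4 bytes on mainstream platforms; "L" is 8 bytes
    # on most 64-bit Linux/macOS builds and would swap 8-byte words instead.
    arr = array.array("I", data)
    assert arr.itemsize == 4
    arr.byteswap()
    return bytes(arr)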


def t(f):
    data = b"1234" * 8000
    assert f(data) == original(data)
    count, time_taken = timeit.Timer(lambda: f(data)).autorange()
    print(f.__name__, count / time_taken)


t(original)
t(io_pair)
t(array_swap)

Output, in calls per second (higher is better):

original      186.465
io_pair       568.180
array_swap 102897.423
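
One caveat with the array approach: typecode widths are platform-dependent. The array docs only guarantee minimum sizes, and "L" is 8 bytes on most 64-bit Linux and macOS builds, so byteswap() would reverse 8-byte words there. The benchmark's repeating b"1234" input cannot reveal that, since "12341234" reversed looks the same as two swapped 4-byte words. Below is a minimal sketch that picks a 4-byte typecode at runtime and, like the loop in the question, ignores trailing bytes that do not fill a whole word. The names pick_4byte_typecode, swap_words, swap_file, src_path and dst_path are made up for illustration and are not part of the original answer.

```python
import array


def pick_4byte_typecode():
    """Return an array typecode whose items are exactly 4 bytes wide here."""
    for code in ("I", "L"):
        if array.array(code).itemsize == 4:
            return code
    raise RuntimeError("no 4-byte unsigned array typecode on this platform")


def swap_words(data):
    """Reverse the byte order of every complete 4-byte word in data.

    Trailing bytes that do not fill a whole word are dropped, which
    matches the behaviour of the loop in the question.
    """
    usable = len(data) - len(data) % 4
    arr = array.array(pick_4byte_typecode())
    arr.frombytes(data[:usable])
    arr.byteswap()
    return arr.tobytes()


def swap_file(src_path, dst_path):
    """Hypothetical helper: swap a whole file in memory (fine for ~2 MB)."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(swap_words(data))


print(swap_words(b"abcdefgh"))  # b'dcbahgfe'
```

Testing with non-repeating input such as b"abcdefgh" (or os.urandom) is what makes a wrong item size show up as a failed comparison.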
