Saving and loading bits/bytes in Python

I've been studying compression algorithms recently, and I'm trying to understand how I can store integers as bits in Python to save space.

So first I save '1' and '0' as strings in Python.

import os
import numpy as np

array= np.random.randint(0, 2, size = 200)
string = [str(i) for i in array]
with open('testing_int.txt', 'w') as f:
    for i in string:
        f.write(i)

print(os.path.getsize('testing_int.txt'))

I get back 200 bytes, which makes sense, since each char is represented by one byte in ASCII (and in UTF-8 as well, if the characters are Latin?).

Now if I save these ones and zeroes as bits instead, the file should only take up around 25 bytes, right?

200 bits / 8 = 25 bytes.

However, when I try the code below, I get 105 bytes. Am I doing something wrong?

Using the same 'array variable' as above I tried this:

bytes_string = [bytes(i) for i in array]
with open('testing_bytes.txt', 'wb') as f:
    for i in bytes_string:
        f.write(i)

Then I tried this:

bin_string = [bin(i) for i in array]
with open('testing_bin.txt', 'wb') as f:
    for i in bytes_string:
        f.write(i)

This also takes up around 105 bytes.

So I tried looking at the text files, and I noticed that both the 'bytes.txt' and 'bin.txt' are blank.

So I tried to read the 'bytes.txt' file via this code:

with open(r"C:\Users\Moondra\Desktop\testing_bytes\testing_bytes.txt", 'rb') as f:
    x =f.read()

Now I get back this:

b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

So I tried these commands:

>>> int.from_bytes(x, byteorder='big')
0
>>> int.from_bytes(x, byteorder='little')
0
>>> 

So apparently I'm doing multiple things incorrectly. I can't figure out:

1) Why I am not getting a text file that is 25 bytes
2) Why I can't read back the bytes file correctly

Thank you.

bytes_string = [bytes(i) for i in array]

It looks like you expect bytes(x) to give you a one-byte bytes object with the value of x. If you follow the documentation, you'll see that bytes() is initialized like bytearray(), and bytearray() says this about its argument:

If it is an integer, the array will have that size and will be initialized with null bytes.

So bytes(0) gives you an empty bytes object, and bytes(1) gives you a single byte with the ordinal zero. That's why bytes_string is about half the size of array and is made up completely of zero bytes.
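To make the difference concrete, here is a small sketch using plain Python ints. The variable names are made up for illustration. Passing an int gives that many null bytes, while passing a list of ints gives one byte per value:

```python
# An int argument is a length, not a value.
empty = bytes(0)          # b''
one_null = bytes(1)       # b'\x00'
three_nulls = bytes(3)    # b'\x00\x00\x00'

# An iterable of ints (each 0..255) gives one byte per value.
one_byte = bytes([1])     # b'\x01'

# Mimic the loop from the question on a short list of 0/1 values.
bits = [0, 1, 1, 0, 1]
as_written = b"".join(bytes(b) for b in bits)   # one null byte per 1, nothing per 0
one_per_value = bytes(bits)                     # five bytes, still not bit-packed

print(len(as_written), as_written)        # 3 b'\x00\x00\x00'
print(len(one_per_value), one_per_value)  # 5 b'\x00\x01\x01\x00\x01'
```

Note that even bytes(bits) still spends a whole byte on each 0 or 1, so it would give 200 bytes for 200 values, not 25.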

As for why the bin() example didn't work, it looks like a simple case of copy-pasting and forgetting to change bytes_string to bin_string in the for loop.

This all still doesn't accomplish your goal of treating 0 or 1 value integers as bits. Python doesn't really have that sort of functionality built in. There are third-party modules that allow you to work at the bit level, but I can't speak to any of them specifically. Personally I would probably just roll my own specific to the application.
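As a rough idea of what rolling your own could look like, here is a minimal pure-Python sketch. pack_bits and unpack_bits are hypothetical helper names, not anything from the standard library, and the choice to put the first bit in the most significant position is just an assumption. Inputs are assumed to be 0 or 1; a final group of fewer than 8 bits is padded with zeros, so the original bit count has to be kept somewhere to load the data back.

```python
def pack_bits(bits):
    """Pack 0/1 ints into bytes, first bit in the most significant position."""
    bits = list(bits)
    out = bytearray()
    for start in range(0, len(bits), 8):
        chunk = bits[start:start + 8]
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | (bit & 1)   # & 1 keeps only the lowest bit
        byte <<= 8 - len(chunk)              # zero-pad a short final chunk on the right
        out.append(byte)
    return bytes(out)


def unpack_bits(data, count):
    """Inverse of pack_bits; count is the number of bits originally packed."""
    bits = []
    for byte in data:
        for shift in range(7, -1, -1):       # most significant bit first
            bits.append((byte >> shift) & 1)
    return bits[:count]                      # drop the padding bits


bits = [0, 1] * 100                          # 200 values, each 0 or 1
packed = pack_bits(bits)
print(len(packed))                           # 25
print(unpack_bits(packed, len(bits)) == bits)  # True
```

Writing packed to a file opened in 'wb' mode would then give the 25-byte file the question was expecting.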

It looks like you're trying to bit shift all the values into a single byte. For example, you expect the integer values [0,1,0,1,0,1,0,1] to be packed into a byte that looks like the following binary number: 0b01010101. To do this, you need the bitwise shift operator and the bitwise OR operator, along with the struct module to pack the result into an unsigned char that represents the sequence of int values you have.

The code below takes an array of random integers, each 0 or 1, and combines every group of eight with shifts and ORs into a number that fits in a single byte. I used 256 ints for convenience, so the expected file size is 32 bytes (256/8). Running it confirms that this is indeed what you get.

import os
import struct

import numpy as np

a = np.random.randint(0, 2, size=256)

bin_vals = []
for i in range(0, len(a), 8):
    # Shift each of the 8 values into its own bit position and OR them
    # together. a[i] ends up in the least significant bit of the byte.
    bin_val = (
        (a[i] << 0) | (a[i+1] << 1) |
        (a[i+2] << 2) | (a[i+3] << 3) |
        (a[i+4] << 4) | (a[i+5] << 5) |
        (a[i+6] << 6) | (a[i+7] << 7)
    )
    # 'B' packs the value (0..255) as a single unsigned byte.
    bin_vals.append(struct.pack('B', int(bin_val)))

with open("output.txt", 'wb') as f:
    for val in bin_vals:
        f.write(val)

print(os.path.getsize('output.txt'))  # 32

Please note, however, that this only works when every integer is 0 or 1. A larger value has bits set above the lowest position, so after shifting it spills into the neighbouring bit positions and wrecks the structure of the generated byte. The combined number may also exceed one byte in that case.
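Since the array is already a NumPy array, np.packbits and np.unpackbits might save you the manual shifting. The sketch below is only an illustration under a few assumptions: NumPy 1.17 or newer (for the bitorder and count arguments), a seeded generator so the run is repeatable, and 200 bits as in the question. bitorder="little" is chosen so that the first value lands in the least significant bit, the same layout the shift-and-OR loop above produces.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded only so the run is repeatable
bits = rng.integers(0, 2, size=200).astype(np.uint8)

# Pack 8 values per byte. With bitorder="little", bits[0] becomes the
# least significant bit of the first byte. A short final group would be
# zero-padded.
packed = np.packbits(bits, bitorder="little")
data = packed.tobytes()          # this is what you would write with open(..., 'wb')

# Loading: turn the raw bytes back into uint8, then expand to one value per bit.
# count drops any padding bits when the length is not a multiple of 8.
restored = np.unpackbits(
    np.frombuffer(data, dtype=np.uint8),
    count=len(bits),
    bitorder="little",
)

print(len(data))                        # 25
print(np.array_equal(bits, restored))   # True
```

The bit count is not stored in the packed data, so when the length is not a multiple of 8 you need to record it separately to load the values back exactly.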

It seems like you're just using Python to generate an array of bits for demonstration purposes, and for that I would say Python probably isn't the best fit. I would recommend a lower-level language such as C or C++, which gives you more direct access to data types than Python does.
