
Python: write strings of bytes

I compress some data with the lzw module and save it to a file (opened in 'wb' mode). The result looks something like this:


For small inputs, lzw's compressed strings are in the format above. When I compress bigger strings, the compressed output is split across several lines.

'\x18\xc0\x86#\x08$\x0e\x060\x82\xc2`\x90\x98l*', '\xff\xb6\xd9\xe8r4'

As far as I can tell, the string contains '\n' characters, so I think I lose information if a newline goes missing. How can I store the string so that it is not split and is stored on a single line?

I have tried this:

for i in s_string:




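import lzw  # third-party lzw package (Python 2)

def mycpsr(x):
    # x: a string of '0'/'1' characters, e.g. '11010101001010101010010111110101010101001010'
    temp = lzw.compress(x)  # lazy iterable of single bytes
    return "".join(temp)    # join them into one byte string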

>>> import lzw
>>> print mycpsr('10101010011111111111111111111111100000000000111111')

If I use a bigger input, say x is a string of 0s and 1s with len(x) == 1000, and append the compressed data to a file, I get multiple lines instead of one.

If the file has this data:

'\t' + normal strings + '\n'
<LZW-strings(with \t\n chars)>
'\t' + normal strings + '\n'

How can I tell which part is LZW data and which is other data?

You are dealing with binary data. If your data contains more than 256 bytes, there is a good chance that some of those bytes equal the ASCII code of '\n' (0x0A). The result is a binary file that spans more than one line when viewed as a text file.
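To put a rough number on "good chance", here is a small Python 3 sketch. It assumes, as an idealization, that compressed output behaves like independent, uniformly random bytes; real LZW output is not exactly uniform, so treat the figures as ballpark values. The function name is hypothetical.

```python
def prob_contains_newline(n_bytes: int) -> float:
    """Chance that n idealized random bytes include at least one 0x0A."""
    # Each byte is 0x0A with probability 1/256, independently (assumption),
    # so the chance that none of them is a newline is (255/256) ** n.
    return 1 - (255 / 256) ** n_bytes

for n in (16, 256, 1000):
    print(n, round(prob_contains_newline(n), 3))
```

Under that assumption the probability is already about 63% at 256 bytes and about 98% at 1000 bytes.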

This is not a problem as long as you treat binary files as sequences of bytes, not as sequences of lines.
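A minimal Python 3 sketch of that distinction, using only the standard library (the payload and file name are made up for illustration): bytes written in binary mode come back unchanged when you read the whole file, while reading it line by line cuts the payload at every 0x0A byte.

```python
import os
import tempfile

# Arbitrary binary payload that deliberately contains newline bytes (0x0A).
payload = bytes(range(256)) * 2

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")

    # Binary mode: the bytes reach the disk unchanged, newlines included.
    with open(path, "wb") as f:
        f.write(payload)

    # Reading the whole file as bytes returns exactly what was written.
    with open(path, "rb") as f:
        whole = f.read()

    # Reading "lines" splits the payload after every 0x0A byte.
    with open(path, "rb") as f:
        lines = f.readlines()

assert whole == payload
print(len(lines))  # several pieces, although only one payload was written
```

Nothing is lost either way (joining the pieces restores the data); the trouble only starts when the code assumes one line equals one record.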

So, your binary data contains newlines, and you want to embed it into a line-oriented document. To do that, you need to quote newlines in the binary data. One way to do it, which will quote not only newlines, but other non-printable characters, is by using base64 encoding:

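import base64, lzw

def my_compress(x):
    # returns a single line, one trailing \n included
    # (b64encode never inserts line breaks; encodestring wraps every 76 chars)
    return base64.b64encode("".join(lzw.compress(x))) + "\n"

def my_decompress(line):
    # drop the trailing newline, decode, and join the decompressed bytes
    return "".join(lzw.decompress(base64.b64decode(line.rstrip("\n"))))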
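Here is a sketch of the same base64 idea in Python 3, applied to the mixed file layout from the question. Two assumptions: zlib from the standard library stands in for the third-party lzw module so the example is self-contained, and `encode_record`/`decode_record` are hypothetical helper names. Because the base64 alphabet (A-Z, a-z, 0-9, +, /, =) never contains a tab, the first byte of each line is enough to tell text lines from compressed records.

```python
import base64
import io
import zlib

def encode_record(raw: bytes) -> bytes:
    # Stand-in compressor (zlib instead of lzw); b64encode emits one line
    # with no embedded newlines, so we append exactly one "\n".
    return base64.b64encode(zlib.compress(raw)) + b"\n"

def decode_record(line: bytes) -> bytes:
    # Inverse: strip the newline, undo base64, then decompress.
    return zlib.decompress(base64.b64decode(line.rstrip(b"\n")))

# Build a file in the question's layout: tab-prefixed text lines
# interleaved with compressed records.
payload = b"0101" * 300
buf = io.BytesIO()
buf.write(b"\tnormal text\n")
buf.write(encode_record(payload))
buf.write(b"\tmore text\n")

# Read it back line by line and classify each line by its first byte.
buf.seek(0)
text_lines, records = [], []
for line in buf:
    if line.startswith(b"\t"):
        text_lines.append(line.rstrip(b"\n"))
    else:
        records.append(decode_record(line))
```

The cost of base64 is roughly a 4/3 size increase over the raw compressed bytes.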

If your code can handle binary characters other than newline, you can make the encoding more space-efficient by replacing only newline with r"\n" (a backslash followed by n) and backslash with r"\\" (two backslash characters). This lets the lzw data fit on a single binary line; you then just apply the inverse transformation before calling lzw.decompress.
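A possible Python 3 sketch of that escaping scheme (the helper names are hypothetical). Backslashes must be escaped before newlines on the way in, and the inverse must scan left to right rather than chain two `replace` calls; otherwise an escaped backslash followed by a literal "n" would be misread as an escaped newline.

```python
import re

def escape_newlines(data: bytes) -> bytes:
    # Double the backslashes first, so the backslashes introduced for
    # newlines are not doubled themselves.
    return data.replace(b"\\", b"\\\\").replace(b"\n", b"\\n")

# Matches a backslash plus the single byte after it (any byte, hence DOTALL).
_ESCAPED = re.compile(rb"\\(.)", re.DOTALL)
_UNESCAPE = {b"\\": b"\\", b"n": b"\n"}

def unescape_newlines(line: bytes) -> bytes:
    # re.sub scans left to right without overlaps, so each escape
    # sequence is decoded exactly once.
    return _ESCAPED.sub(lambda m: _UNESCAPE[m.group(1)], line)

data = bytes(range(256)) * 4 + b"\\n\n\\\\n"
escaped = escape_newlines(data)
```

`escaped` contains no newline byte, so it can be written followed by a single "\n" and read back with a line-oriented reader.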

>>> txt = """Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum
 ante velit, adipiscing eget sodales non, faucibus vitae nunc. Praesent ac lorem
 cursus, aliquet magna sed, porta diam. Nunc lorem sapien, euismod in congue non
, tincidunt sit amet arcu. Lorem ipsum dolor sit amet, consectetur adipiscing el
it. Phasellus eleifend bibendum massa, ac convallis tellus sodales in. Suspendis
se non aliquam massa. Aenean erat ipsum, sagittis vitae elementum sit amet, iacu
lis sit amet quam. Vivamus luctus hendrerit libero at fringilla. Nullam id urna
est. Vestibulum pretium et tellus et dictum.
... Fusce nulla velit, lobortis at ligula eget, fermentum condimentum felis. Mae
cenas pretium posuere elit in posuere. Suspendisse gravida erat tristique, venen
atis erat at, sagittis elit. Donec laoreet lacinia nunc, eu consequat tortor. Cr
as at sem scelerisque, tristique dolor a, porta mauris. Fusce fermentum massa vi
tae arcu sagittis, et laoreet lacus suscipit. Vestibulum sed accumsan quam. Vest
ibulum eu egestas nisl. Curabitur dolor massa, auctor tempus dui ut, volutpat vu
lputate massa. Fusce vitae tortor adipiscing, gravida est at, molestie tortor. A
enean quis magna magna. Donec cursus enim ac egestas cursus. Pellentesque pulvin
ar nibh in sapien sollicitudin, eget tempus tortor pulvinar. Phasellus dignissim
, urna a sagittis tempor, nulla nulla rhoncus enim, vel molestie nisl lectus qui
s erat. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum sit
amet malesuada nisi, sit amet placerat sem."""
>>> print "".join(lzw.decompress(lzw.compress(txt)))

This appears to decode it correctly, including the \n.
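The lzw package used above is a third-party Python 2 module. As a hedged stand-in, here is a minimal textbook LZW in Python 3 (hypothetical helpers `lzw_compress`/`lzw_decompress` that work with integer codes; this is not that package's API or output format). It shows the same property: newline bytes survive a compress/decompress round trip, so any loss comes from how the compressed bytes are stored, not from LZW itself.

```python
from typing import List

def lzw_compress(data: bytes) -> List[int]:
    # Dictionary starts with every single-byte string.
    table = {bytes([i]): i for i in range(256)}
    w = b""
    codes = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc                      # keep extending the current match
        else:
            codes.append(table[w])      # emit code for the longest match
            table[wc] = len(table)      # learn the new string
            w = bytes([byte])
    if w:
        codes.append(table[w])
    return codes

def lzw_decompress(codes: List[int]) -> bytes:
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            entry = w + w[:1]           # code defined by this very step
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        table[len(table)] = w + entry[:1]
        w = entry
    return b"".join(out)

text = b"line one\nline two\n" * 20
restored = lzw_decompress(lzw_compress(text))
```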

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.
