
Encoding and decoding string in Python

I want to write a string to a file using Python, which I already know how to do. I also want to encode that string after it has been written. The particular encoding doesn't matter, so let's say UTF-32. My approach: after writing the string, I read it back from the file, encode it into bytes, and write the result to the same file. The encoding part works; the problem is decoding. I want to read the file as bytes so that I can convert them back to a str, following the same principle: read from the file, decode, and write to the same file. What I get when reading the encoded string looks like b'\\xff\\xfe\\x00\\x001\\x00\\x00\\x004\\x00\\x00\\x002\\x00\\x00\\x00'

When I read the file as bytes, the b prefix and the backslashes are doubled. If I read it as a string instead and then try to decode it, I get an error along the lines of 'str' object has no attribute 'decode'. I know that a str can't be decoded, but when I use bytes the contents seem to be "doubled". Here is my code:

def readfile(filename):
    f = open(filename, 'r')
    s = f.read()
    f.close()
    return s

def readfile_b(filename):
    f = open(filename, 'rb')
    s = f.read()
    f.close()
    return s

def writefile(filename, writeobject):
    f = open(filename, 'w')
    f.write(writeobject)
    f.close()

def encode(filename):
    s = readfile(filename)
    s_enc = bytes(s, 'utf-32')
    writefile(filename, str(s_enc))

def decode(filename):
    s_enc = readfile_b(filename)
    print(s_enc)
    s = str(s_enc, 'utf-32')
    writefile(filename, s)

encode("Example.txt")
decode("Example.txt")

Output of decode() (encode() ran without errors):

b"b'\\xff\\xfe\\x00\\x00H\\x00\\x00\\x00e\\x00\\x00\\x00l\\x00\\x00\\x00l\\x00\\x00\\x00o\\x00\\x00\\x00'"
Traceback (most recent call last):
  File "C:/bla/bla/bla/bla/Example.py", line 29, in <module>
    decode("MamaAccount.txt")
  File "C:/bla/bla/bla/bla/Example.py", line 26, in decode
    s = str(s_enc, 'utf-32')
UnicodeDecodeError: 'utf-32-le' codec can't decode bytes in position 0-3: code point not in range(0x110000)

Any help is greatly appreciated.

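Try writing the file in binary mode. Currently encode() passes str(s_enc) to writefile, so what ends up in the file is the text representation of the bytes object (the characters b'\xff\xfe...' themselves), not the encoded bytes. When you read that back in binary mode, you get the bytes of that text, which is why the b prefix and the backslashes appear twice.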
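
To make the difference concrete, here is a minimal sketch comparing str() of a bytes object with the bytes themselves. The variable names are only for illustration and are not part of the original code; no file is involved, the re-encoding step simply stands in for writing the text and reading it back in binary mode.

```python
# Illustrative names only; this is a sketch, not the asker's code.
text = "Hello"
encoded = text.encode("utf-32")      # real bytes: a BOM followed by 4 bytes per character

as_text = str(encoded)               # the textual repr of the bytes object
print(type(as_text).__name__)        # str
print(as_text.startswith("b'"))      # True: the b prefix is now ordinary text

# Writing as_text to a file stores the characters b, ', \, x, f, f, ...
# Reading that file back in binary mode yields the bytes of that text:
reread = as_text.encode("utf-8")
print(reread[:4])                    # b"b'\\x"

# Those first four bytes do not form a valid UTF-32 code point,
# so decoding fails just as in the traceback.
try:
    reread.decode("utf-32")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
print(decode_failed)                 # True

# Keeping the data as bytes end to end round-trips cleanly.
roundtrip = encoded.decode("utf-32")
print(roundtrip == text)             # True
```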

This works for me:

def readfile(filename):
    # Read the file in text mode and return a str.
    with open(filename, 'r') as f:
        return f.read()


def readfile_b(filename):
    # Read the file in binary mode and return bytes.
    with open(filename, 'rb') as f:
        return f.read()

def writefile(filename, writeobject):
    # Write a str in text mode.
    with open(filename, 'w') as f:
        f.write(writeobject)

def writefile_b(filename, writeobject):
    # Write bytes in binary mode, without converting them to str first.
    with open(filename, 'wb') as f:
        f.write(writeobject)

def encode(filename):
    s = readfile(filename)
    s_enc = bytes(s, 'utf-32')
    # Write the raw bytes, not str(s_enc).
    writefile_b("bin_"+filename, s_enc)

def decode(filename):
    s_enc = readfile_b(filename)
    #print(s_enc)
    s = str(s_enc, 'utf-32')
    print(s)
    writefile("dec_"+filename, s)

encode("Example.txt")
decode("bin_Example.txt")
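
If the goal is only to store the text in a particular encoding, a shorter route may be to let open() do the conversion through its encoding parameter. This is a sketch of an alternative, not what the code above does. The helper names write_text_as and read_text_as, the file name, and the temporary directory are hypothetical, chosen so the example runs without touching Example.txt.

```python
import os
import tempfile

def write_text_as(path, text, encoding="utf-32"):
    # Text mode with an explicit encoding: Python encodes on write.
    with open(path, "w", encoding=encoding) as f:
        f.write(text)

def read_text_as(path, encoding="utf-32"):
    # Text mode with the same encoding: Python decodes on read.
    with open(path, "r", encoding=encoding) as f:
        return f.read()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_utf32.txt")  # hypothetical file name
    write_text_as(path, "Hello")
    with open(path, "rb") as f:
        raw = f.read()             # what is actually stored on disk
    restored = read_text_as(path)

print(len(raw))    # 24: a 4-byte BOM plus 5 characters * 4 bytes
print(restored)    # Hello
```

With this approach no separate encode or decode pass over the file is needed; the bytes on disk are already UTF-32, and reading with the same encoding gives back the original str.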
