
Read and write unicode from file in Python

I have a Unicode string ® in a file. I want to read it with Python, convert it to bits, then convert it back to Unicode and write it to a new file. This works if I define a variable test_unicode = "®" and work with that. But if I read ® from a file, I get garbage output (I tried errors='replace', 'backslashreplace', and 'ignore'). Here is my script:

def frombits(bits):
    chars = []
    for b in range(int(len(bits) / 8)):
        byte = bits[b*8:(b+1)*8]
        chars.append(chr(int(''.join([str(bit) for bit in byte]), 2)))
    return ''.join(chars)

bit_list = ''
with open('uni.txt', "r", encoding='utf-8', errors='replace') as f:
    byte = f.read(1)
    while(byte):
        bit_list+='{0:08b}'.format(ord(byte))
        byte=f.read(1)

test_unicode = '®'
test_unicode_bit_list = '{0:08b}'.format(ord(test_unicode))

print(bit_list)
print(test_unicode_bit_list)

test_unicode = ''.join(frombits(test_unicode_bit_list))
read_unicode = ''.join(frombits(bit_list))

print(test_unicode.encode("utf-8"))
print(read_unicode.encode("utf-8"))

f = open("uni_test.txt", 'wb')
f.write(test_unicode.encode("utf-8"))
f = open("uni_read.txt", 'wb')
f.write(read_unicode.encode("utf-8"))

If I create a file uni.txt containing ® and run this script, I end up with two files (the first made from the variable test_unicode, the second from the value read from uni.txt):

uni_test.txt ---> ®

uni_read.txt ---> ÿý

How do I do this "read - convert to bits - convert to unicode - write" procedure correctly? Thank you!

Open the file with open(filename, 'rb') to read bytes, then save using the appropriate encoding.
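A minimal sketch of that approach follows. The helper names to_bits, from_bits, and roundtrip are hypothetical and not part of the original script. The output ÿý is what the question's code produces when the character read is U+FFFD, the replacement character: its code point 65533 formats to 16 bits, which frombits then splits into 0xFF and 0xFD. That suggests, though it is only an assumption, that uni.txt was not saved as UTF-8. Working on raw bytes avoids the question entirely, because every byte fits in exactly 8 bits regardless of the file's encoding.

```python
def to_bits(data: bytes) -> str:
    """Render each byte as eight binary digits."""
    return ''.join(format(b, '08b') for b in data)


def from_bits(bits: str) -> bytes:
    """Rebuild the bytes from a string of '0' and '1' characters."""
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def roundtrip(src_path: str, dst_path: str) -> str:
    """Read src_path as bytes, convert to bits and back, write dst_path."""
    # 'rb' returns raw bytes, so no decoding (and no replacement) happens.
    with open(src_path, 'rb') as f:
        raw = f.read()
    bits = to_bits(raw)
    # 'wb' writes the bytes back unchanged, preserving the source encoding.
    with open(dst_path, 'wb') as f:
        f.write(from_bits(bits))
    return bits
```

Calling roundtrip('uni.txt', 'uni_read.txt') should leave uni_read.txt byte-for-byte identical to uni.txt. If the text itself is needed in between, decode the bytes with the encoding the file was actually saved in, for example from_bits(bits).decode('utf-8') for a UTF-8 file.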
