Read and write unicode from file in Python
I have a unicode string ® in a file. I want to read it with Python, convert it to bits, then convert it back to unicode and write it into a new file. It works if I define a variable
test_unicode = "®"
and work with that. But if I read this ® from a file, I get garbage (I tried errors='replace', 'backslashreplace' and 'ignore'). Here is my script:
def frombits(bits):
    chars = []
    for b in range(int(len(bits) / 8)):
        byte = bits[b*8:(b+1)*8]
        chars.append(chr(int(''.join([str(bit) for bit in byte]), 2)))
    return ''.join(chars)

bit_list = ''
with open('uni.txt', "r", encoding='utf-8', errors='replace') as f:
    byte = f.read(1)
    while(byte):
        bit_list+='{0:08b}'.format(ord(byte))
        byte=f.read(1)

test_unicode = '®'
test_unicode_bit_list = '{0:08b}'.format(ord(test_unicode))
print(bit_list)
print(test_unicode_bit_list)
test_unicode = ''.join(frombits(test_unicode_bit_list))
read_unicode = ''.join(frombits(bit_list))
print(test_unicode.encode("utf-8"))
print(read_unicode.encode("utf-8"))
f = open("uni_test.txt", 'wb')
f.write(test_unicode.encode("utf-8"))
f = open("uni_read.txt", 'wb')
f.write(read_unicode.encode("utf-8"))
If I create a file uni.txt containing ® and run this script, I end up with 2 files (the first written from the variable test_unicode, the second from the value read from uni.txt):
uni_test.txt ---> ®
uni_read.txt ---> ÿý
How do I do this "read - convert to bits - convert to unicode - write" procedure correctly? Thank you!
Use open(filename, 'rb') to open the file for reading bytes, then save with the appropriate encoding.
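A minimal sketch of that suggestion, not a definitive implementation. It reads the raw bytes, turns each byte into exactly 8 bits, packs the bits back into bytes, and only decodes at the end. The helper names to_bits, from_bits and copy_via_bits are made up for this illustration, and the demo assumes the input really is UTF-8 (it writes its own uni.txt into a temporary directory to guarantee that). One plausible reading of the ÿý output in the question: with errors='replace' an undecodable byte becomes U+FFFD, and '{0:08b}'.format(ord('\ufffd')) gives the 16 bits 1111111111111101, which the 8-bit chunking splits into ÿ (0xFF) and ý (0xFD). That would fit if uni.txt had not actually been saved as UTF-8, which is an assumption.

```python
import os
import tempfile


def to_bits(data: bytes) -> str:
    """Return the bits of `data` as a string of '0'/'1', 8 per byte."""
    return ''.join('{0:08b}'.format(byte) for byte in data)


def from_bits(bits: str) -> bytes:
    """Inverse of to_bits: pack each group of 8 bits back into one byte."""
    if len(bits) % 8:
        raise ValueError('bit string length must be a multiple of 8')
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def copy_via_bits(src: str, dst: str) -> str:
    """Read src as raw bytes, round-trip through a bit string, write dst."""
    with open(src, 'rb') as f:
        raw = f.read()
    bits = to_bits(raw)
    with open(dst, 'wb') as f:
        f.write(from_bits(bits))
    return bits


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, 'uni.txt')
        dst = os.path.join(tmp, 'uni_read.txt')
        # Create a UTF-8 input file so the example is self-contained.
        with open(src, 'w', encoding='utf-8') as f:
            f.write('®')
        # '®' is two bytes in UTF-8 (0xC2 0xAE), so this prints 16 bits.
        print(copy_via_bits(src, dst))
        # Decode only when text is needed, using the file's real encoding.
        with open(dst, encoding='utf-8') as f:
            print(f.read())
```

Because the bits come from bytes rather than from ord() of decoded characters, every unit is guaranteed to fit in 8 bits, whatever the encoding of the file. If the file is in another encoding such as cp1252, the byte copy is still exact and only the final decode step needs that encoding name.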