Encoding and decoding a string in Python
I want to write a string to a file using Python. I know how to do that, so that's not a problem. I also want to encode that string once it has been written. The encoding doesn't really matter, so let's say UTF-32. After writing the string, I read it from the file again, encode it into bytes, and rewrite the same file. I can do the encoding part, but my problem arises with the decoding. I want to read the file as bytes so that I can convert it back to a str. I follow the same principle: read from the file, decode, and write to the same file. What I get from reading the encoded string looks like
b'\\xff\\xfe\\x00\\x001\\x00\\x00\\x004\\x00\\x00\\x002\\x00\\x00\\x00'
When I read this as bytes, it doubles the b and the backslashes. If I read it as a string instead and then try to decode it, I keep getting an error along the lines of 'str' object has no attribute 'decode'. I know that I can't decode a string, but if I try with bytes it seems to be "doubling" the bytes. Here is my code:
def readfile(filename):
    f = open(filename, 'r')
    s = f.read()
    f.close()
    return s

def readfile_b(filename):
    f = open(filename, 'rb')
    s = f.read()
    f.close()
    return s

def writefile(filename, writeobject):
    f = open(filename, 'w')
    f.write(writeobject)
    f.close()

def encode(filename):
    s = readfile(filename)
    s_enc = bytes(s, 'utf-32')
    writefile(filename, str(s_enc))

def decode(filename):
    s_enc = readfile_b(filename)
    print(s_enc)
    s = str(s_enc, 'utf-32')
    writefile(filename, s)

encode("Example.txt")
decode("Example.txt")
Output (from decode(); encode() ran without errors):
b"b'\\xff\\xfe\\x00\\x00H\\x00\\x00\\x00e\\x00\\x00\\x00l\\x00\\x00\\x00l\\x00\\x00\\x00o\\x00\\x00\\x00'"
Traceback (most recent call last):
  File "C:/bla/bla/bla/bla/Example.py", line 29, in <module>
    decode("MamaAccount.txt")
  File "C:/bla/bla/bla/bla/Example.py", line 26, in decode
    s = str(s_enc, 'utf-32')
UnicodeDecodeError: 'utf-32-le' codec can't decode bytes in position 0-3: code point not in range(0x110000)
Any help is greatly appreciated.
Try writing the file in binary mode. Currently you are writing the bytes object cast to a string with str(), so the file contains the text representation b'...' rather than the encoded bytes themselves. When you read that back in binary mode, you get an extra b and doubled backslashes.
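A minimal sketch of what goes wrong, assuming Python 3; the variable names are illustrative and not taken from the question. Calling str() on a bytes object without an encoding argument returns its printable representation, and that text is what ends up in the file:

```python
text = "Hello"
encoded = text.encode("utf-32")  # real bytes: a 4-byte BOM plus 4 bytes per character

# str() with no encoding argument does NOT decode; it returns the repr text,
# i.e. a str that begins with the characters b and '
as_repr = str(encoded)
print(as_repr[:2])

# Writing as_repr to a file stores those literal characters, so the file
# content is no longer valid UTF-32 data
print(as_repr.encode("ascii") == encoded)  # prints False

# Decoding the real bytes gives back the original text
roundtrip = encoded.decode("utf-32")
print(roundtrip)  # prints Hello
```

So the encoded bytes have to be written as bytes (file opened with 'wb'), not converted with str() first.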
This works for me:
def readfile(filename):
    f = open(filename, 'r')
    s = f.read()
    f.close()
    return s

def readfile_b(filename):
    f = open(filename, 'rb')
    s = f.read()
    f.close()
    return s

def writefile(filename, writeobject):
    f = open(filename, 'w')
    f.write(writeobject)
    f.close()

def writefile_b(filename, writeobject):
    # Binary mode: writes the bytes object unchanged
    f = open(filename, 'wb')
    f.write(writeobject)
    f.close()

def encode(filename):
    s = readfile(filename)
    s_enc = bytes(s, 'utf-32')
    writefile_b("bin_"+filename, s_enc)

def decode(filename):
    s_enc = readfile_b(filename)
    #print(s_enc)
    s = str(s_enc, 'utf-32')
    print(s)
    writefile("dec_"+filename, s)

encode("Example.txt")
decode("bin_Example.txt")
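As a possible alternative, and not part of the answer above, the built-in open() accepts an encoding argument, so the file object can do the encoding and decoding itself. The sketch below assumes Python 3; write_text, read_text, and demo_path are hypothetical names, and a temporary directory is used so that no existing file is touched:

```python
import os
import tempfile

def write_text(path, text, encoding="utf-32"):
    # Text mode with an explicit encoding: the file object encodes on write
    with open(path, "w", encoding=encoding) as f:
        f.write(text)

def read_text(path, encoding="utf-32"):
    # Text mode with the same encoding: the file object decodes on read
    with open(path, "r", encoding=encoding) as f:
        return f.read()

# Hypothetical demo file inside a fresh temporary directory
demo_path = os.path.join(tempfile.mkdtemp(), "Example.txt")
write_text(demo_path, "Hello")
result = read_text(demo_path)
print(result)  # prints Hello
```

The with statement closes the file even if an error occurs, which replaces the explicit close() calls.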