How to print a string containing UTF-8 code read from a file
I have a file which contains UTF-8 encoded text:
b'\xd8\xa3\xd9\x8a \xd8\xb9\xd9\x84\xd9\x85 \xd9\x87\xd8\xb0\xd8\xa7 \xd8\xa7\xd9\x84\xd8\xb0\xd9\x8a \xd9\x84\xd9\x85 \xd9\x8a\xd8\xb3\xd8\xaa\xd8\xb7\xd8\xb9 \xd8\xad\xd8\xaa\xd9\x89 \xd8\xa7\xd9\x84\xd8\xa2\xd9\x86 \xd8\xa3\xd9\x86 \xd9\x8a\xd8\xb6\xd8\xb9 \xd8\xa3\xd8\xb5\xd9\x88\xd8\xa7\xd8\xaa \xd9\x85\xd9\x86 \xd9\x86\xd8\xad\xd8\xa8 \xd9\x81\xd9\x8a \xd8\xa3\xd9\x82\xd8\xb1\xd8\xa7\xd8\xb5 \xd8\x8c \xd8\xa3\xd9\x88 \xd8\xb2\xd8\xac\xd8\xa7\xd8\xac\xd8\xa9 \xd8\xaf\xd9\x88\xd8\xa7\xd8\xa1 \xd9\x86\xd8\xaa\xd9\x86\xd8\xa7\xd9\x88\xd9\x84\xd9\x87\xd8\xa7 \xd8\xb3\xd8\xb1\xd9\x91\xd9\x8b\xd8\xa7 \xd8\x8c \xd8\xb9\xd9\x86\xd8\xaf\xd9\x85\xd8\xa7 \xd9\x86\xd8\xb5\xd8\xa7\xd8\xa8 \xd8\xa8\xd9\x88\xd8\xb9\xd9\x83\xd8\xa9 \xd8\xb9\xd8\xa7\xd8\xb7\xd9\x81\xd9\x8a\xd8\xa9 \xd8\xa8\xd8\xaf\xd9\x88\xd9\x86 \xd8\xa3\xd9\x86 \xd9\x8a\xd8\xaf\xd8\xb1\xd9\x8a \xd8\xb5\xd8\xa7\xd8\xad\xd8\xa8\xd9\x87\xd8\xa7 \xd9\x83\xd9\x85 \xd9\x86\xd8\xad\xd9\x86 \xd9\x86\xd8\xad\xd8\xaa\xd8\xa7\xd8\xac\xd9\x87 - \xd8\xa3\xd8\xad\xd9\x84\xd8\xa7\xd9\x85 \xd9\x85\xd8\xb3\xd8\xaa\xd8\xba\xd8\xa7\xd9\x86\xd9\x85\xd9\x8a, \xd8\xb9\xd8\xa7\xd8\xa8\xd8\xb1 \xd8\xb3\xd8\xb1\xd9\x8a\xd8\xb1'
I tried to decode it and print it correctly, but I did not succeed when:
reading the file in text mode ('r') and decoding with bytes(text, 'utf8').decode('utf8')
reading the file in binary mode ('rb') and decoding with binary.decode('utf8')
I tried to convert the content in many ways (splitting the text into a list, cutting out the b'...', ...) but did not manage to print it legibly.
What am I missing? Is the file correctly encoded?
Here is my code, in Python 3.7.3:
with open('/home/pi/Desktop/unicode_a_decoder.txt', 'r') as f:
    text = f.read()
print(type(text), text)
#seq = text.decode
#seq = bytes(text,"utf8")
#print('seq',seq)
#seq = text
seq = text.split(" ")
#print(seq, seq[0],bytes(seq[0]))
print('seq', seq)
s0 = seq[0]
print(s0, type(s0))
s02byte = bytes(s0, 'utf8')
print(s02byte, type(s02byte))
#print(seq.decode("utf8"))
For me, it worked when I simply used .decode().
This is what I did:
text = b'\xd8\xa3\xd9\x8a \xd8\xb9\xd9\x84\xd9\x85 \xd9\x87\xd8\xb0\xd8\xa7 \xd8\xa7\xd9\x84\xd8\xb0\xd9\x8a \xd9\x84\xd9\x85 \xd9\x8a\xd8\xb3\xd8\xaa\xd8\xb7\xd8\xb9 \xd8\xad\xd8\xaa\xd9\x89 \xd8\xa7\xd9\x84\xd8\xa2\xd9\x86 \xd8\xa3\xd9\x86 \xd9\x8a\xd8\xb6\xd8\xb9 \xd8\xa3\xd8\xb5\xd9\x88\xd8\xa7\xd8\xaa \xd9\x85\xd9\x86 \xd9\x86\xd8\xad\xd8\xa8 \xd9\x81\xd9\x8a \xd8\xa3\xd9\x82\xd8\xb1\xd8\xa7\xd8\xb5 \xd8\x8c \xd8\xa3\xd9\x88 \xd8\xb2\xd8\xac\xd8\xa7\xd8\xac\xd8\xa9 \xd8\xaf\xd9\x88\xd8\xa7\xd8\xa1 \xd9\x86\xd8\xaa\xd9\x86\xd8\xa7\xd9\x88\xd9\x84\xd9\x87\xd8\xa7 \xd8\xb3\xd8\xb1\xd9\x91\xd9\x8b\xd8\xa7 \xd8\x8c \xd8\xb9\xd9\x86\xd8\xaf\xd9\x85\xd8\xa7 \xd9\x86\xd8\xb5\xd8\xa7\xd8\xa8 \xd8\xa8\xd9\x88\xd8\xb9\xd9\x83\xd8\xa9 \xd8\xb9\xd8\xa7\xd8\xb7\xd9\x81\xd9\x8a\xd8\xa9 \xd8\xa8\xd8\xaf\xd9\x88\xd9\x86 \xd8\xa3\xd9\x86 \xd9\x8a\xd8\xaf\xd8\xb1\xd9\x8a \xd8\xb5\xd8\xa7\xd8\xad\xd8\xa8\xd9\x87\xd8\xa7 \xd9\x83\xd9\x85 \xd9\x86\xd8\xad\xd9\x86 \xd9\x86\xd8\xad\xd8\xaa\xd8\xa7\xd8\xac\xd9\x87 - \xd8\xa3\xd8\xad\xd9\x84\xd8\xa7\xd9\x85 \xd9\x85\xd8\xb3\xd8\xaa\xd8\xba\xd8\xa7\xd9\x86\xd9\x85\xd9\x8a, \xd8\xb9\xd8\xa7\xd8\xa8\xd8\xb1 \xd8\xb3\xd8\xb1\xd9\x8a\xd8\xb1'
print(text.decode())
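The snippet above starts from a bytes literal typed into the source code. The question mentions cutting out the b'...', which suggests (this is an assumption, not something the question confirms) that the file itself holds the literal characters b, ', \, x, d, 8 and so on, rather than raw UTF-8 bytes. In that case f.read() returns a str with real backslashes in it, and bytes(text, 'utf8').decode('utf8') just round-trips the same text. A minimal sketch of one way to handle that case uses ast.literal_eval to turn the text back into a bytes object before decoding. The helper name decode_bytes_literal and the shortened sample string are made up for illustration.

```python
import ast


def decode_bytes_literal(text: str) -> str:
    """Parse text that looks like a Python bytes literal and decode it as UTF-8.

    Hypothetical helper: assumes `text` is something like "b'\\xd8\\xa3'",
    i.e. the characters of a bytes literal, not the bytes themselves.
    """
    # literal_eval only evaluates Python literals, so it is safer than eval().
    value = ast.literal_eval(text.strip())
    if not isinstance(value, bytes):
        raise ValueError("expected a bytes literal such as b'...'")
    return value.decode("utf-8")


# Stand-in for what f.read() would return in text mode under the assumption
# above (only the first two words of the sample, plus a trailing newline).
raw_text = r"b'\xd8\xa3\xd9\x8a \xd8\xb9\xd9\x84\xd9\x85'" + "\n"

print(decode_bytes_literal(raw_text))
```

If the file instead contains the real UTF-8 bytes, none of this is needed: opening it with open(path, 'r', encoding='utf-8') and printing the result of f.read() should be enough, provided the terminal and font can display Arabic.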