How to print a string containing UTF-8 code read from a file

I have a file which contains UTF-8 encoded text:

b'\xd8\xa3\xd9\x8a \xd8\xb9\xd9\x84\xd9\x85 \xd9\x87\xd8\xb0\xd8\xa7 \xd8\xa7\xd9\x84\xd8\xb0\xd9\x8a \xd9\x84\xd9\x85 \xd9\x8a\xd8\xb3\xd8\xaa\xd8\xb7\xd8\xb9 \xd8\xad\xd8\xaa\xd9\x89 \xd8\xa7\xd9\x84\xd8\xa2\xd9\x86 \xd8\xa3\xd9\x86 \xd9\x8a\xd8\xb6\xd8\xb9 \xd8\xa3\xd8\xb5\xd9\x88\xd8\xa7\xd8\xaa \xd9\x85\xd9\x86 \xd9\x86\xd8\xad\xd8\xa8 \xd9\x81\xd9\x8a \xd8\xa3\xd9\x82\xd8\xb1\xd8\xa7\xd8\xb5 \xd8\x8c \xd8\xa3\xd9\x88 \xd8\xb2\xd8\xac\xd8\xa7\xd8\xac\xd8\xa9 \xd8\xaf\xd9\x88\xd8\xa7\xd8\xa1 \xd9\x86\xd8\xaa\xd9\x86\xd8\xa7\xd9\x88\xd9\x84\xd9\x87\xd8\xa7 \xd8\xb3\xd8\xb1\xd9\x91\xd9\x8b\xd8\xa7 \xd8\x8c \xd8\xb9\xd9\x86\xd8\xaf\xd9\x85\xd8\xa7 \xd9\x86\xd8\xb5\xd8\xa7\xd8\xa8 \xd8\xa8\xd9\x88\xd8\xb9\xd9\x83\xd8\xa9 \xd8\xb9\xd8\xa7\xd8\xb7\xd9\x81\xd9\x8a\xd8\xa9 \xd8\xa8\xd8\xaf\xd9\x88\xd9\x86 \xd8\xa3\xd9\x86 \xd9\x8a\xd8\xaf\xd8\xb1\xd9\x8a \xd8\xb5\xd8\xa7\xd8\xad\xd8\xa8\xd9\x87\xd8\xa7 \xd9\x83\xd9\x85 \xd9\x86\xd8\xad\xd9\x86 \xd9\x86\xd8\xad\xd8\xaa\xd8\xa7\xd8\xac\xd9\x87 - \xd8\xa3\xd8\xad\xd9\x84\xd8\xa7\xd9\x85 \xd9\x85\xd8\xb3\xd8\xaa\xd8\xba\xd8\xa7\xd9\x86\xd9\x85\xd9\x8a, \xd8\xb9\xd8\xa7\xd8\xa8\xd8\xb1 \xd8\xb3\xd8\xb1\xd9\x8a\xd8\xb1'

I tried to decode it and print it correctly, but neither of these approaches worked:

  1. Reading the file as text (mode 'r'), then decoding with bytes(text, 'utf8').decode('utf8')

  2. Reading the file as binary (mode 'rb'), then decoding with binary.decode('utf8')

I tried converting the content in many ways (splitting the text into a list, cutting off the b'...' wrapper, ...) but could not get it to print legibly.

What am I missing? Is the file correctly encoded?

Here is my code, in Python 3.7.3:

with open('/home/pi/Desktop/unicode_a_decoder.txt', 'r') as f:
    text = f.read()
print(type(text),text)
#seq = text.decode
#seq = bytes(text,"utf8")
#print('seq',seq)
#seq = text
seq = text.split(" ")
#print(seq, seq[0],bytes(seq[0]))
print('seq',seq)
s0 = seq[0]
print(s0,type(s0))
s02byte = bytes(s0, 'utf8')
print(s02byte, type(s02byte))
#print(seq.decode("utf8"))

For me, it worked when I simply used .decode().

This is what I did:

text = b'\xd8\xa3\xd9\x8a \xd8\xb9\xd9\x84\xd9\x85 \xd9\x87\xd8\xb0\xd8\xa7 \xd8\xa7\xd9\x84\xd8\xb0\xd9\x8a \xd9\x84\xd9\x85 \xd9\x8a\xd8\xb3\xd8\xaa\xd8\xb7\xd8\xb9 \xd8\xad\xd8\xaa\xd9\x89 \xd8\xa7\xd9\x84\xd8\xa2\xd9\x86 \xd8\xa3\xd9\x86 \xd9\x8a\xd8\xb6\xd8\xb9 \xd8\xa3\xd8\xb5\xd9\x88\xd8\xa7\xd8\xaa \xd9\x85\xd9\x86 \xd9\x86\xd8\xad\xd8\xa8 \xd9\x81\xd9\x8a \xd8\xa3\xd9\x82\xd8\xb1\xd8\xa7\xd8\xb5 \xd8\x8c \xd8\xa3\xd9\x88 \xd8\xb2\xd8\xac\xd8\xa7\xd8\xac\xd8\xa9 \xd8\xaf\xd9\x88\xd8\xa7\xd8\xa1 \xd9\x86\xd8\xaa\xd9\x86\xd8\xa7\xd9\x88\xd9\x84\xd9\x87\xd8\xa7 \xd8\xb3\xd8\xb1\xd9\x91\xd9\x8b\xd8\xa7 \xd8\x8c \xd8\xb9\xd9\x86\xd8\xaf\xd9\x85\xd8\xa7 \xd9\x86\xd8\xb5\xd8\xa7\xd8\xa8 \xd8\xa8\xd9\x88\xd8\xb9\xd9\x83\xd8\xa9 \xd8\xb9\xd8\xa7\xd8\xb7\xd9\x81\xd9\x8a\xd8\xa9 \xd8\xa8\xd8\xaf\xd9\x88\xd9\x86 \xd8\xa3\xd9\x86 \xd9\x8a\xd8\xaf\xd8\xb1\xd9\x8a \xd8\xb5\xd8\xa7\xd8\xad\xd8\xa8\xd9\x87\xd8\xa7 \xd9\x83\xd9\x85 \xd9\x86\xd8\xad\xd9\x86 \xd9\x86\xd8\xad\xd8\xaa\xd8\xa7\xd8\xac\xd9\x87 - \xd8\xa3\xd8\xad\xd9\x84\xd8\xa7\xd9\x85 \xd9\x85\xd8\xb3\xd8\xaa\xd8\xba\xd8\xa7\xd9\x86\xd9\x85\xd9\x8a, \xd8\xb9\xd8\xa7\xd8\xa8\xd8\xb1 \xd8\xb3\xd8\xb1\xd9\x8a\xd8\xb1'
print(text.decode())
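
The snippet above decodes a bytes literal written directly in the source code. If the file instead holds the literal's text, that is, the characters b, ', \, x, d, 8 and so on (an assumption, though the attempt to "cut out the b'...'" suggests it), then f.read() returns a str that merely looks like a bytes object, and both bytes(text, 'utf8').decode('utf8') and binary.decode('utf8') hand back that same text unchanged. A minimal sketch of one way to handle that case is to parse the literal with ast.literal_eval and decode the result. The helper name decode_bytes_literal and the shortened sample string are hypothetical and only serve the illustration:

```python
import ast


def decode_bytes_literal(text: str) -> str:
    """Parse the text of a Python bytes literal (b'...') and decode it as UTF-8."""
    # literal_eval only evaluates literals, so it is safer than eval() here.
    value = ast.literal_eval(text.strip())
    if not isinstance(value, bytes):
        raise ValueError("expected a bytes literal such as b'\\xd8\\xa3'")
    return value.decode("utf-8")


# Hypothetical sample: what f.read() in text mode would return for a file
# that holds only the first two words of the literal from the question.
sample = r"b'\xd8\xa3\xd9\x8a \xd8\xb9\xd9\x84\xd9\x85'" + "\n"

decoded = decode_bytes_literal(sample)
print(decoded)
```

With the real file, the same helper would be applied to the result of f.read() after opening the file in 'r' mode. If the file turns out to contain the raw UTF-8 bytes rather than the literal's text, open(path, encoding='utf-8') is enough and no extra decoding step is needed.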
