Converting Unicode Chinese characters stored as binary back to Unicode using Python 3

Question

I'm working from a .csv file produced by OpenOffice that contains mixed Roman and Chinese characters. This is an example of one row:

b'\xe5\xbc\x80\xe5\xbf\x83'b'K\xc4\x81i x\xc4\xabn'b'Open heart 'b'Happy '

This section contains two Chinese characters stored as bytes, which I would like displayed as Chinese characters on the command line from a very basic Python 3 program (see bottom). How do I do this?

b'\xe5\xbc\x80\xe5\xbf\x83'b'K\xc4\x81i x\xc4\xabn'

When I open the .csv in OpenOffice I need to select "Chinese Simplified (EUC-CN)" as the character set, if that helps. I have searched extensively, but I do not understand Unicode and the pages I found do not make sense to me.

import csv
f = open('Chinese.csv', encoding="utf-8") 
file = csv.reader(f)

for line in file:
    for word in line:
        print(word.encode('utf-8'), end='')
    print("\n")

Thank you in advance for any suggestions.

Answer 1

Thanks to a suggestion by @eryksun I solved my issue by re-encoding the source file from ASCII to UTF-8. The question is different, but the solution is here:

http://www.stackoverflow.com/a/542899/792015
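For anyone who would rather do the re-encoding from Python than from an editor, here is a minimal sketch. The helper name `reencode_to_utf8` and its parameters are made up for illustration and are not taken from the linked answer. It assumes you already know the file's current encoding and that the file is small enough to read into memory.

```python
from pathlib import Path


def reencode_to_utf8(path, source_encoding):
    """Rewrite the file at `path` in place as UTF-8.

    Hypothetical helper: `source_encoding` must be the file's real current
    encoding (for example "gb2312" or "latin-1"); guessing it wrong will
    either raise UnicodeDecodeError or silently produce garbage.
    """
    p = Path(path)
    # Decode the raw bytes with the old encoding, then write them back as UTF-8.
    text = p.read_bytes().decode(source_encoding)
    p.write_bytes(text.encode("utf-8"))
```

Calling `reencode_to_utf8("myscript.py", "latin-1")` would rewrite `myscript.py` in place, so keep a backup copy first.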

Alternatively, if you are using Eclipse, you can paste a non-Roman character (such as the Chinese character 大) into your source code and save the file. If the source is not already UTF-8, Eclipse will offer to change it for you.

Thank you for all your suggestions, and my apologies for answering my own question.

Footnote: If anyone knows why changing the source file type affects the compiled program, I would love to know. According to https://docs.python.org/3/tutorial/interpreter.html the interpreter treats source files as UTF-8 by default.
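For reference, the byte strings shown in the question are valid UTF-8, and they appear as `b'...'` only because the program calls `word.encode('utf-8')` before printing. A minimal sketch of the round trip follows. It uses an in-memory sample row instead of the real `Chinese.csv`, and the comma delimiter is an assumption, since the question does not show the raw file.

```python
import csv
import io

# The first field from the question, as raw bytes.
raw = b'\xe5\xbc\x80\xe5\xbf\x83'

# bytes -> str: decoding turns the UTF-8 bytes back into text.
text = raw.decode('utf-8')
print(text)  # 开心

# Assumed sample row (comma-separated) standing in for Chinese.csv.
sample = '开心,Kāi xīn,Open heart ,Happy \n'.encode('utf-8')

# csv.reader already yields str values, so the words can be printed
# directly without calling .encode() on them.
rows = list(csv.reader(io.StringIO(sample.decode('utf-8'), newline='')))
for row in rows:
    for word in row:
        print(word, end=' ')
    print()
```

Whether the characters actually render depends on the terminal's own encoding and font support, which this sketch does not address.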

Answer 1
0 2014-05-15 04:51:53
