Converting binary stored Unicode Chinese Characters back to Unicode using Python 3

I'm working from a .csv file produced by OpenOffice that mixes Roman and Chinese characters. This is an example of one row:

b'\xe5\xbc\x80\xe5\xbf\x83'b'K\xc4\x81i x\xc4\xabn'b'Open heart 'b'Happy '

The following section of the row contains two Chinese characters stored as bytes, which I would like to display as Chinese characters on the command line from a very basic Python 3 program (see bottom). How do I do this?

b'\xe5\xbc\x80\xe5\xbf\x83'b'K\xc4\x81i x\xc4\xabn'
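For reference, the byte strings shown above are valid UTF-8, so they can be turned back into text with bytes.decode. The following is a minimal sketch that assumes the bytes are exactly the ones printed in the example row; the variable names are only illustrative.

```python
# The two byte strings copied from the example row above.
raw_hanzi = b'\xe5\xbc\x80\xe5\xbf\x83'
raw_pinyin = b'K\xc4\x81i x\xc4\xabn'

# decode() interprets the bytes as UTF-8 and returns a str (Unicode text).
hanzi = raw_hanzi.decode('utf-8')
pinyin = raw_pinyin.decode('utf-8')

# Printing a str shows the characters themselves, not the b'...' form,
# provided the terminal can display them.
print(hanzi, pinyin)
```

Each Chinese character here takes three bytes in UTF-8, which is why six bytes decode to two characters.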

When I open the .csv in OpenOffice I need to select "Chinese Simplified (EUC-CN)" as the character set, if that helps. I have searched extensively, but I do not understand Unicode and the pages I found do not make sense to me.

import csv
f = open('Chinese.csv', encoding="utf-8") 
file = csv.reader(f)

for line in file:
    for word in line:
        print(word.encode('utf-8'), end='')
    print("\n")
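One likely reason the output shows b'...' is that the loop above calls word.encode('utf-8') before printing: csv.reader already yields str values, and encoding them turns them back into bytes. The sketch below prints the strings directly. It assumes the file really is UTF-8 and has four fields per row as in the example; the helper name read_rows and the temporary sample file are made up for illustration.

```python
import csv
import os
import tempfile

def read_rows(path, encoding='utf-8'):
    """Return every row of a CSV file as a list of str values."""
    # newline='' is what the csv module documentation recommends for files
    # passed to csv.reader.
    with open(path, encoding=encoding, newline='') as f:
        return list(csv.reader(f))

# Build a small stand-in for Chinese.csv (assumed layout, one row).
sample = '开心,Kāi xīn,Open heart ,Happy \n'
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'Chinese.csv')
    with open(path, 'w', encoding='utf-8', newline='') as f:
        f.write(sample)
    rows = read_rows(path)

for row in rows:
    # Each cell is already text, so print it without calling encode().
    print(' '.join(row))
```

If the terminal itself cannot display Chinese characters, print may still raise an error or show placeholders; that depends on the console's encoding rather than on the CSV file.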

Thank you in advance for any suggestions.

Thanks to a suggestion by @eryksun, I solved my issue by re-encoding the source file from ASCII to UTF-8. The question is different, but the solution is here:

http://www.stackoverflow.com/a/542899/792015

Alternatively, if you are using Eclipse, you can paste a non-Roman character (such as a Chinese character) into your source code and save the file. If the source is not already UTF-8, Eclipse will offer to change it for you.

Thank you for all your suggestions, and my apologies for answering my own question.

Footnote: If anyone knows why changing the source file's encoding affects the compiled program, I would love to know. According to https://docs.python.org/3/tutorial/interpreter.html, the interpreter treats source files as UTF-8 by default.
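As a rough way to check what encoding Python would assume for a given source file, the standard library's tokenize.detect_encoding can be used. This is only a sketch: the two byte strings stand in for the contents of hypothetical source files, and the helper name source_encoding is invented here. It does not by itself explain the behaviour described in the footnote.

```python
import io
import tokenize

def source_encoding(source_bytes):
    """Return the encoding Python would use to read this source code."""
    # detect_encoding looks for a BOM or a "coding:" comment in the first
    # two lines and falls back to UTF-8 when neither is present.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding

# A source file with no encoding declaration.
plain = b"print('hello')\n"
# A source file that declares a different encoding in its first line.
declared = b"# -*- coding: gbk -*-\nprint('hello')\n"

print(source_encoding(plain))
print(source_encoding(declared))
```

The same check can be run on a real file by passing open(path, 'rb').readline to tokenize.detect_encoding.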
