
Converting binary-stored Unicode Chinese characters back to Unicode using Python 3

I'm working from an OpenOffice-produced .csv file with mixed Roman and Chinese characters. This is an example of one row:

b'\xe5\xbc\x80\xe5\xbf\x83'b'K\xc4\x81i x\xc4\xabn'b'Open heart 'b'Happy '

The following part of that row contains two Chinese characters stored as bytes. I would like them displayed as Chinese characters on the command line from a very basic Python 3 program (shown below). How do I do this?

b'\xe5\xbc\x80\xe5\xbf\x83'b'K\xc4\x81i x\xc4\xabn'
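Both byte strings are plain UTF-8, so `bytes.decode('utf-8')` turns them back into readable text. A minimal sketch using the two values from the row above:

```python
# The two byte strings copied from the example row above.
hanzi_bytes = b'\xe5\xbc\x80\xe5\xbf\x83'
pinyin_bytes = b'K\xc4\x81i x\xc4\xabn'

# bytes.decode() turns UTF-8 encoded bytes back into a str.
hanzi = hanzi_bytes.decode('utf-8')
pinyin = pinyin_bytes.decode('utf-8')

print(hanzi, pinyin)  # prints: 开心 Kāi xīn
```

Each of the two Chinese characters takes three bytes in UTF-8, which is why six bytes come back as a two-character string.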

When I open the .csv in OpenOffice, I need to select "Chinese Simplified EUC-CN" as the character set, if that helps. I have searched extensively, but I do not understand Unicode and the pages I found do not make sense to me.
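On the EUC-CN point: EUC-CN is the usual byte encoding of the GB2312 character set, and Python implements it as the `gb2312` codec (with `euc-cn` as an alias). The bytes shown in the question decode cleanly as UTF-8 but not as GB2312, which suggests `encoding="utf-8"` is the right choice when reading this data in Python. A small sketch for checking which candidate codecs accept a byte string; the `try_decodings` name and the candidate list are only illustrative:

```python
def try_decodings(raw, encodings=('utf-8', 'gb2312')):
    """Return {encoding: decoded text, or None if it fails} (hypothetical helper).

    'gb2312' is Python's codec for EUC-CN; extend the tuple as needed.
    """
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding.
            results[enc] = None
    return results


raw = b'\xe5\xbc\x80\xe5\xbf\x83'  # bytes from the question
for enc, text in try_decodings(raw).items():
    print(enc, '->', text if text is not None else 'cannot decode')
```

A clean decode is only evidence, not proof, of a file's real encoding; it is a quick sanity check rather than a detector.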

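import csv

# Open the CSV as UTF-8 text; the csv module expects newline=''.
with open('Chinese.csv', encoding="utf-8", newline='') as f:
    file = csv.reader(f)
    for line in file:
        for word in line:
            # word is already a str; encode() turns it into bytes,
            # which print() shows as a b'...' literal
            print(word.encode('utf-8'), end='')
        print("\n")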
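Printing the `str` values that `csv.reader` already yields, instead of their `.encode('utf-8')` bytes, avoids the `b'...'` output. A minimal sketch follows; the helper names are made up, and the `reconfigure()` call (available on standard text streams since Python 3.7) is an assumption about the environment: it only matters when the console's default encoding cannot represent Chinese, and the terminal itself must still be able to display UTF-8.

```python
import csv
import sys


def parse_rows(lines):
    """Turn an iterable of CSV text lines into lists of str fields."""
    # csv.reader yields str objects, which print() shows as characters.
    return [row for row in csv.reader(lines)]


def print_csv(path):
    """Print every row of a UTF-8 CSV file (hypothetical helper name)."""
    # Switch the console stream to UTF-8 so Chinese characters can be
    # written; TextIOWrapper.reconfigure() exists from Python 3.7 on.
    if hasattr(sys.stdout, 'reconfigure'):
        sys.stdout.reconfigure(encoding='utf-8')
    with open(path, encoding='utf-8', newline='') as f:
        for row in parse_rows(f):
            # Print the str fields directly, separated by spaces.
            print(*row)
```

Usage would be `print_csv('Chinese.csv')`. Separating fields with spaces is a choice made here for readability; the original loop ran them together with `end=''`.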

Thank you in advance for any suggestions.

Thanks to a suggestion by @eryksun, I solved my issue by re-encoding the source file from ASCII to UTF-8. The question there is different, but the solution is here:

http://www.stackoverflow.com/a/542899/792015
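Because ASCII is a subset of UTF-8, a file containing only ASCII bytes is already valid UTF-8; re-encoding only changes bytes when the file holds non-ASCII text in some other encoding. If you want to do such a conversion from Python rather than from an editor, here is a minimal sketch, assuming you know the file's current encoding. The `convert_bytes` and `reencode` names are made up for illustration and are not taken from the linked answer.

```python
from pathlib import Path


def convert_bytes(raw, src_encoding, dst_encoding='utf-8'):
    """Decode raw bytes from src_encoding and re-encode them (hypothetical helper)."""
    # Raises UnicodeDecodeError if src_encoding does not match the bytes.
    return raw.decode(src_encoding).encode(dst_encoding)


def reencode(path, src_encoding, dst_encoding='utf-8'):
    """Rewrite a text file in place in dst_encoding (hypothetical helper)."""
    p = Path(path)
    # Work on bytes so line endings are left exactly as they were.
    p.write_bytes(convert_bytes(p.read_bytes(), src_encoding, dst_encoding))
```

For example, `reencode('myscript.py', 'cp1252')` would convert a hypothetical Windows-1252 file to UTF-8.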

Alternatively, if you are using Eclipse, you can paste a non-Roman character (such as a Chinese character) into your source code and save the file. If the source file is not already UTF-8, Eclipse will offer to change it for you.

Thank you for all your suggestions and my apologies for answering my own question.

Footnote: If anyone knows why changing the source file's encoding affects the compiled program, I would love to know. According to https://docs.python.org/3/tutorial/interpreter.html, the interpreter treats source files as UTF-8 by default.
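The source-file encoding only tells Python how to decode the `.py` file itself (its string literals and comments). It does not change how `open(..., encoding="utf-8")` reads `Chinese.csv`, nor how `print()` encodes its output. One plausible explanation, which the thread does not confirm, is that the IDE rather than the interpreter changed: Eclipse launch configurations can inherit their console encoding from the resource's encoding, so switching the file to UTF-8 may also have switched the console to UTF-8. A sketch for seeing which encodings the running program actually uses (the `encoding_report` name is made up):

```python
import locale
import os
import sys


def encoding_report():
    """Collect the encodings that affect reading files and printing."""
    return {
        # Always 'utf-8' in Python 3; the default for str.encode()
        # and bytes.decode() when no encoding is given.
        'default': sys.getdefaultencoding(),
        # What print() uses to turn str into bytes for the console.
        'stdout': getattr(sys.stdout, 'encoding', None),
        # Roughly what open() uses when no encoding= argument is passed.
        'preferred': locale.getpreferredencoding(False),
        # Environment override for stdin/stdout/stderr, if set.
        'PYTHONIOENCODING': os.environ.get('PYTHONIOENCODING'),
    }


for name, value in encoding_report().items():
    print(f'{name}: {value}')
```

Running it from Eclipse and from a plain terminal, before and after changing the file encoding, should show whether `stdout` is the value that changed.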
