I'm working from a .csv file produced by OpenOffice that mixes Roman and Chinese characters. This is an example of one row as my program prints it:
b'\xe5\xbc\x80\xe5\xbf\x83'b'K\xc4\x81i x\xc4\xabn'b'Open heart 'b'Happy '
The following section contains two Chinese characters stored as bytes, which I would like displayed as Chinese characters on the command line from a very basic Python 3 program (see bottom). How do I do this?
b'\xe5\xbc\x80\xe5\xbf\x83'b'K\xc4\x81i x\xc4\xabn'
When I open the .csv in OpenOffice I need to select "Chinese Simplified (EUC-CN)" as the character set, if that helps. I have searched extensively, but I do not understand Unicode and the pages I found do not make sense to me.
import csv
f = open('Chinese.csv', encoding="utf-8")
file = csv.reader(f)
for line in file:
    for word in line:
        print(word.encode('utf-8'), end='')
    print("\n")
Thank you in advance for any suggestions.
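For reference, here is a minimal sketch of what the bytes in the example row decode to, and of printing the fields without re-encoding them. It assumes the file really is UTF-8, as the open() call above declares. The io.StringIO object is only a stand-in for the real Chinese.csv, and read_rows is a hypothetical helper name, not something from the original program.

```python
import csv
import io

# The first two fields of the example row, exactly as printed above.
raw_fields = [b'\xe5\xbc\x80\xe5\xbf\x83', b'K\xc4\x81i x\xc4\xabn']

# Decoding the UTF-8 bytes gives ordinary str objects.
decoded = [field.decode('utf-8') for field in raw_fields]


def read_rows(text_stream):
    """Return each CSV row as a list of str, with no re-encoding."""
    return [row for row in csv.reader(text_stream)]


# Stand-in for open('Chinese.csv', encoding='utf-8', newline='').
sample = io.StringIO(','.join(decoded + ['Open heart ', 'Happy ']) + '\n')
rows = read_rows(sample)

if __name__ == '__main__':
    for row in rows:
        # csv.reader already yields str, so print it directly.
        # Calling .encode('utf-8') here is what produces the b'...' output.
        print(' '.join(row))
```

Whether the characters then render correctly still depends on the terminal's own encoding and font, which this sketch does not address.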
Thanks to a suggestion by @eryksun, I solved my issue by re-encoding the source file from ASCII to UTF-8. The question is different, but the solution is here:
http://www.stackoverflow.com/a/542899/792015
Alternatively, if you are using Eclipse, you can paste a non-Roman character (such as the Chinese character 大) into your source code and save the file. If the source is not already UTF-8, Eclipse will offer to change it for you.
Thank you for all your suggestions and my apologies for answering my own question.
Footnote: If anyone knows why changing the source file's encoding affects the compiled program, I would love to know. According to https://docs.python.org/3/tutorial/interpreter.html the interpreter treats source files as UTF-8 by default.
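As a general illustration of why the encoding a file is saved in can matter (the post does not establish whether this is what happened here), the sketch below shows that the same character becomes different bytes under different encodings, and that decoding those bytes with the wrong codec fails. The helper try_decode is a hypothetical name introduced only for this example. 'gb2312' is Python's codec name for the EUC-CN style encoding.

```python
text = '\u5927'  # the character 大 mentioned above

utf8_bytes = text.encode('utf-8')
gb_bytes = text.encode('gb2312')


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid for that codec."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


if __name__ == '__main__':
    print(utf8_bytes, gb_bytes)             # two different byte sequences
    print(try_decode(utf8_bytes, 'ascii'))  # None: not valid ASCII
    print(try_decode(gb_bytes, 'utf-8'))    # None: not valid UTF-8
```

The same rule applies to a Python source file containing non-ASCII literals: the interpreter has to decode the file's bytes before it can run it, so the bytes on disk need to match the encoding it expects.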