简体   繁体   中英

Make utf8 readable in a file

I have dictionary of dictionary which has utf8 encoded keys. I am dumping this dictionary to a file using json module.
In the file keys are printed in utf8 format. Keys are actually letters of Bengali language.

I want actual letters to get written in the file. How to do this ??

If I print those keys(one of them is u'\ং') to console actual letter(ং) is shown but in my file \ং .is written. What does print do to show the actual letter?

You are writing JSON; the JSON standard allows for \\uxxxx escape sequences to encode non-ASCII characters. The Python json module uses this by default.

Switch off the feature by using the ensure_ascii=False switch when dumping the data:

json.dump(obj, yourfileobject, ensure_ascii=False)

This does mean that the output is no longer encoded to UTF-8 bytes as well; you'll need to use a codecs.open() managed file for this:

import json
import codecs

with codecs.open('/path/to/file', 'w', encoding='utf8') as output:
    json.dump(obj, output, ensure_ascii=False)

Now your unicode characters will be written to the file as UTF-8 encoded bytes instead. When opening the file with another program that decodes UTF-8 again, your codepoints should be displayed again as the same characters.

use ensure_ascii parameter.

>>> import json
>>> print json.dumps(u'\u0982')
"\u0982"
>>> print json.dumps(u'\u0982', ensure_ascii=False)
"ং"

http://docs.python.org/2/library/json.html#json.dump

If ensure_ascii is True (the default), all non-ASCII characters in the output are escaped with \\uXXXX sequences, and the result is a str instance consisting of ASCII characters only. If ensure_ascii is False, some chunks written to fp may be unicode instances. ...

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM