
How to use json.tool from the shell to validate and pretty-print language files without removing the unicode?

Ubuntu 16.04
Bash 4.4
python 3.5

I received a bunch of language files from translators on Upwork and noticed that no two files had the same line count. Since they are in .json format, I decided to validate and pretty-print them so I could see which lines were missing from each file. I wrote a simple script to do that:

#!/bin/sh

for file in *.json; do
   # Replace the original only if json.tool accepted the file
   if python -m json.tool "${file}" > "${file}.tmp"; then
      mv "${file}.tmp" "${file}"
   else
      rm -f "${file}.tmp"
   fi
done

Now my Russian language file looks like:

"manualdirections": "\u041c\u0430\u0440\u0448\u0440\u0443\u0442",
"moreinformation": "\u0414\u0435\u0442\u0430\u043b\u0438",
"no": "\u041d\u0435\u0442",

I would very much like to keep the content of the files untouched.

This is not possible with json.tool:

https://github.com/python/cpython/blob/3.5/Lib/json/tool.py#L45

The call to json.dump there is hard-coded, so there is no way to pass the keyword argument ensure_ascii=False, which would solve your issue here.
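To illustrate what that keyword changes, here is a small sketch using one of the values from the question. The variable names are only for illustration.

```python
import json

# One entry from the Russian language file in the question.
entry = {"no": "\u041d\u0435\u0442"}  # the same string as "Нет"

# Default behaviour (ensure_ascii=True): non-ASCII characters are escaped.
escaped = json.dumps(entry)

# With ensure_ascii=False the characters are written as-is.
readable = json.dumps(entry, ensure_ascii=False)

print(escaped)
print(readable)

# Both forms decode to the same data, so neither loses information.
assert json.loads(escaped) == json.loads(readable) == entry
```

The escaped form is still valid JSON with identical content. It is just harder for a human to read and diff.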

You will have to write your own json.tool, monkeypatch it, or use third-party code.
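The "write your own" option could look roughly like the sketch below. It is not the json.tool source; pretty_json and main are hypothetical names, and it assumes the files are UTF-8 encoded and should be rewritten in place.

```python
import collections
import json
import sys


def pretty_json(text, indent=4):
    """Parse JSON text and return it re-indented with non-ASCII kept as-is."""
    # OrderedDict keeps the key order on Python 3.5, where plain dicts do not.
    obj = json.loads(text, object_pairs_hook=collections.OrderedDict)
    return json.dumps(obj, indent=indent, ensure_ascii=False) + "\n"


def main(paths):
    """Rewrite each file in place; return the number of files that failed."""
    failures = 0
    for path in paths:
        try:
            with open(path, encoding="utf-8") as f:
                formatted = pretty_json(f.read())
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            print("{}: {}".format(path, exc), file=sys.stderr)
            failures += 1
            continue  # leave an invalid file untouched
        with open(path, "w", encoding="utf-8") as f:
            f.write(formatted)
    return failures


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved under a name of your choice (say, pretty_json.py), it could be run from the shell as python3 pretty_json.py *.json. A file that fails to parse is reported on stderr and not overwritten.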

edit: I've proposed PR 9765 to add this feature to json.tool in Python 3.8.

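#!/usr/bin/python3

import json
import os

json_dir = '/path/to/json_files'

for filename in os.listdir(json_dir):
    if filename.endswith('.json'):
        # os.listdir() returns bare names, so join them with the directory
        with open(os.path.join(json_dir, filename), encoding='utf-8') as f:
            data = json.load(f)  # parse (and thereby validate) the file
        # ensure_ascii=False keeps non-ASCII characters instead of \uXXXX escapes
        print(json.dumps(data, indent=4, ensure_ascii=False))

Note the encoding passed to open(). Together with ensure_ascii=False, this should read the files as UTF-8 and print them with the original characters intact.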

You can use the following equivalent Python script instead, which uses a subclass of json.JSONEncoder to override the ensure_ascii option:

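import json
import os
import glob

class allow_nonascii(json.JSONEncoder):
    # Always encode with ensure_ascii=False, whatever the caller passes
    def __init__(self, *args, ensure_ascii=False, **kwargs):
        super().__init__(*args, ensure_ascii=False, **kwargs)

for file in glob.iglob('*.json'):
    # Read and write as UTF-8 explicitly rather than relying on the locale
    with open(file, 'r', encoding='utf-8') as fin, open(file + '.tmp', 'w', encoding='utf-8') as fout:
        fout.write(json.dumps(json.load(fin), cls=allow_nonascii, indent=4))
    # Swap the files only after both handles are closed
    os.replace(file + '.tmp', file)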
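For the original goal of seeing which entries are missing from each file, comparing the key sets may be more reliable than comparing line counts. A minimal sketch, assuming each file holds a single flat JSON object; missing_keys is a hypothetical helper name, not part of any library:

```python
import json


def missing_keys(paths):
    """Map each path to the sorted keys that other files have but it lacks."""
    keys_by_file = {}
    for path in paths:
        with open(path, encoding="utf-8") as f:
            # Iterating a dict yields its keys, so set() collects the key names.
            keys_by_file[path] = set(json.load(f))
    # Union of every key seen in any file.
    all_keys = set().union(*keys_by_file.values())
    return {path: sorted(all_keys - keys) for path, keys in keys_by_file.items()}
```

Calling missing_keys(glob.glob('*.json')) (after import glob) returns a dict such as {'ru.json': ['moreinformation'], 'en.json': []}, where an empty list means the file has every key found in the others.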
