Recursively transform dict leaves in Python

I'm having trouble applying a function to all leaves of a dict (loaded from a JSON file) in Python. The text has been badly encoded and I want to use the ftfy module to fix it.

Here is my function:

def recursive_decode_dict(e):
    if type(e) is dict:
        print('Dict: %s' % e)
        return {k: recursive_decode_dict(v) for k, v in e.items()}

    elif type(e) is list:
        print('List: %s' % e)
        return list(map(recursive_decode_dict, e))

    elif type(e) is str:
        print('Str: %s' % e)
        print('Transformed str: %s' % e.encode('sloppy-windows-1252').decode('utf-8'))
        return e.encode('sloppy-windows-1252').decode('utf-8')

    else:
        return e

Which I call this way:

with open('test.json', 'r', encoding='utf-8') as f1:
    json_content = json.load(f1)
    recursive_decode_dict(json_content)


with open('out.json', 'w', encoding='utf-8') as f2:
    json.dump(json_content, f2, indent=2)

Console output is fine:

> python fix_encoding.py
List: [{'fields': {'field1': 'the European-style cafÃ© into a '}}]
Dict: {'fields': {'field1': 'the European-style cafÃ© into a '}}
Dict: {'field1': 'the European-style cafÃ© into a '}
Str: the European-style cafÃ© into a 
Transformed str: the European-style café into a 

But my output file is not fixed:

[
  {
    "fields": {
      "field1": "the European-style caf\u00c3\u00a9 into a "
    }
  }
]

If it's JSON data you're massaging, you can instead hook into the JSON decoder and fix strings as you encounter them.

This does require using the slower Python-based JSON parser, but that's likely not an issue for a one-off conversion.

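import json
import ftfy  # importing ftfy makes the "sloppy-windows-1252" codec available


decoder = json.JSONDecoder()


def ftfy_parse_string(*args, **kwargs):
    # Let the stock scanner parse the JSON string, then repair the mojibake.
    string, length = json.decoder.scanstring(*args, **kwargs)
    string = string.encode("sloppy-windows-1252").decode("utf-8")
    return (string, length)


# Swap in the custom string parser and rebuild the (pure-Python) scanner
# so that it picks up the replacement.
decoder.parse_string = ftfy_parse_string
decoder.scan_once = json.scanner.py_make_scanner(decoder)

print(decoder.decode(r"""[
  {
    "fields": {
      "field1": "the European-style caf\u00c3\u00a9 into a "
    }
  }
]"""))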

outputs

[{'fields': {'field1': 'the European-style café into a '}}]
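
As a side note on the original symptom: the recursive function in the question builds and returns new dicts and lists rather than modifying its argument, so the result only reaches the output file if the return value is kept. The following is a minimal sketch of that idea, not the asker's exact code. The names `transform_leaves` and `fix_mojibake` are made up for illustration, and the strict standard-library `cp1252` codec is used as a stand-in for ftfy's `sloppy-windows-1252` so the snippet runs without ftfy installed (it will leave alone any string containing characters that strict `cp1252` cannot encode).

```python
import json


def transform_leaves(node, fix):
    """Return a copy of `node` with `fix` applied to every string leaf."""
    if isinstance(node, dict):
        return {key: transform_leaves(value, fix) for key, value in node.items()}
    if isinstance(node, list):
        return [transform_leaves(item, fix) for item in node]
    if isinstance(node, str):
        return fix(node)
    return node  # numbers, booleans and None pass through unchanged


def fix_mojibake(text):
    # Assumption: strict cp1252 stands in for ftfy's "sloppy-windows-1252".
    # Strings that do not round-trip are returned unchanged.
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Same sample data as in the question's output file.
broken = json.loads(
    r'[{"fields": {"field1": "the European-style caf\u00c3\u00a9 into a "}}]'
)

# Keep the returned copy; `broken` itself is left untouched.
fixed = transform_leaves(broken, fix_mojibake)

# ensure_ascii=False writes non-ASCII characters literally instead of
# as \uXXXX escapes.
print(json.dumps(fixed, indent=2, ensure_ascii=False))
```

In the question's script, the equivalent change would be to assign the result of `recursive_decode_dict(json_content)` to a variable and pass that variable to `json.dump`.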
