Recursively transform dict leaves in Python
I'm having trouble applying a function to all leaves of a dict (loaded from a JSON file) in Python. The text has been badly encoded and I want to use the ftfy module to fix it.
Here is my function:
import json
import ftfy  # importing ftfy registers the 'sloppy-windows-1252' codec

def recursive_decode_dict(e):
    # Builds and returns a fixed copy of e; the input object is not modified.
    if type(e) is dict:
        print('Dict: %s' % e)
        return {k: recursive_decode_dict(v) for k, v in e.items()}
    elif type(e) is list:
        print('List: %s' % e)
        return list(map(recursive_decode_dict, e))
    elif type(e) is str:
        print('Str: %s' % e)
        print('Transformed str: %s' % e.encode('sloppy-windows-1252').decode('utf-8'))
        return e.encode('sloppy-windows-1252').decode('utf-8')
    else:
        return e
Which I call this way:
with open('test.json', 'r', encoding='utf-8') as f1:
    json_content = json.load(f1)
    recursive_decode_dict(json_content)
with open('out.json', 'w', encoding='utf-8') as f2:
    json.dump(json_content, f2, indent=2)
Console output is fine:
> python fix_encoding.py
List: [{'fields': {'field1': 'the European-style cafÃ© into a '}}]
Dict: {'fields': {'field1': 'the European-style cafÃ© into a '}}
Dict: {'field1': 'the European-style cafÃ© into a '}
Str: the European-style cafÃ© into a
Transformed str: the European-style café into a
But my output file is not fixed:
[
  {
    "fields": {
      "field1": "the European-style caf\u00c3\u00a9 into a "
    }
  }
]
If it's JSON data you're massaging, you can instead hook into the JSON decoder and fix strings as you encounter them.
This does require using the slower Python-based JSON parser, but that's likely not an issue for a one-off conversion...
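import json
import ftfy  # importing ftfy registers the 'sloppy-windows-1252' codec

decoder = json.JSONDecoder()

def ftfy_parse_string(*args, **kwargs):
    # Parse the JSON string as usual, then undo the mojibake.
    string, length = json.decoder.scanstring(*args, **kwargs)
    string = string.encode("sloppy-windows-1252").decode("utf-8")
    return (string, length)

# Swap in the fixing string parser and rebuild the pure-Python scanner around it.
decoder.parse_string = ftfy_parse_string
decoder.scan_once = json.scanner.py_make_scanner(decoder)

# The input below is deliberately mis-encoded ("cafÃ©" instead of "café").
print(decoder.decode(r"""[
  {
    "fields": {
      "field1": "the European-style cafÃ© into a "
    }
  }
]"""))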
outputs
[{'fields': {'field1': 'the European-style café into a '}}]
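As for why the recursive function in the question seems to work on the console but not in the file: it builds and returns a new structure, and the calling code throws that return value away before dumping the untouched json_content. A minimal sketch of the fix follows. The names fix_mojibake and transform_leaves are made up for illustration, and plain cp1252 is used as a stdlib-only stand-in for ftfy's 'sloppy-windows-1252' codec, so it is an assumption that your data contains no bytes that strict cp1252 rejects.

```python
import json

def fix_mojibake(text):
    # Assumption: strict cp1252 stands in for ftfy's 'sloppy-windows-1252'.
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # not mojibake (or not fixable this way): leave as-is

def transform_leaves(node, fn):
    # Return a new structure with fn applied to every string leaf.
    if isinstance(node, dict):
        return {k: transform_leaves(v, fn) for k, v in node.items()}
    if isinstance(node, list):
        return [transform_leaves(v, fn) for v in node]
    if isinstance(node, str):
        return fn(node)
    return node

raw = r'[{"fields": {"field1": "the European-style caf\u00c3\u00a9 into a "}}]'
json_content = json.loads(raw)

# The key change: keep the returned copy instead of discarding it.
fixed = transform_leaves(json_content, fix_mojibake)

# ensure_ascii=False writes non-ASCII characters literally, not as \uXXXX escapes.
out = json.dumps(fixed, indent=2, ensure_ascii=False)
print(out)
```

In the question's script the equivalent change would be to assign the result (json_content = recursive_decode_dict(json_content)) before calling json.dump. Passing ensure_ascii=False is optional; without it the fixed character is still correct but is written as an escape sequence.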