Python 3 Deep merge nested dict by value

I have two types of data structures:

data = {'name':class_1_name, 'type':'directory', 'children': [{'name':class_2_name, 'type':'directory', 'children': [{'name':class_3_name, 'type':'directory', 'children': []}]}]}

data = {'name':class_1_name, 'type':'directory', 'children': [{'name':class_2_name, 'type':'directory', 'children': []}]}

Now my problem arises when merging multiple versions of these dicts in a loop. Because the children are always different, all my attempts return a dict with only one level merged. For example:

{
"name": "class_1_1",
"type": "directory",
"children": [
    {
        "name": "class_2_1",
        "type": "directory",
        "children": []
    },
    {
        "name": "class_2_2",
        "type": "directory",
        "children": [
            {
                "name": "class_3_1",
                "type": "directory",
                "children": []
            }
        ]
    },
    {
        "name": "class_2_2",
        "type": "directory",
        "children": [
            {
                "name": "class_3_2",
                "type": "directory",
                "children": []
            }
        ]
    }
]
}

where the result should be:

{
"name": "class_1_1",
"type": "directory",
"children": [
    {
        "name": "class_2_1",
        "type": "directory",
        "children": []
    },
    {
        "name": "class_2_2",
        "type": "directory",
        "children": [
            {
                "name": "class_3_1",
                "type": "directory",
                "children": []
            },
            {
                "name": "class_3_2",
                "type": "directory",
                "children": []
            }
        ]
    }
]
}

I'm currently using jsonmerge by avian2 (https://github.com/avian2/jsonmerge) because I really don't know where to start to deep merge two dicts by value.

Every time I try to work this out I run into logical errors. I really don't know how to approach this. Any help or tips to point me in the right direction would be greatly appreciated.

Cheers.

Edit code:

import os
import io
import json
import bs4 as bs
from jsonmerge import Merger

list = [ '' ]
g_dict = {}

def getJsonInfo( eggs ):
    if (eggs == 3):
        data = {'name':class_1_name, 'type':'directory', 'children': [{'name':class_2_name, 'type':'directory', 'children': [{'name':class_3_name, 'type':'directory', 'children': []}]}]}
    else:
        data = {'name':class_1_name, 'type':'directory', 'children': [{'name':class_2_name, 'type':'directory', 'children': []}]}

    schema = {
        "properties": {
            "children": {
                "type": "array",
                "mergeStrategy": "append"
            }
        }
    }

    global g_dict
    merger = Merger(schema)
    g_dict = merger.merge(data, g_dict)

with open('catalogue.html') as html_file:
    tree = bs.BeautifulSoup( html_file,'lxml' )

for class_1 in tree.find_all('div',class_="class_1"):
    class_1_name = class_1['name']
    for class_2 in class_1.find_all('div',class_="class_2"):
        class_2_name = class_2['name']
        class_3 = class_2.find_all('div',class_="class_3")
        if len(class_3) != 0:
            for class_3 in class_2.find_all('div',class_="class_3"):
                class_3_name = class_3['name']
                print(class_1['name'] + ' -> ' + class_2['name'] + ' -> ' + class_3['name'])
                getJsonInfo(3)
        else:
            print(class_1['name'] + ' -> ' + class_2['name'] )
            getJsonInfo(2)

print('Creating JSON Tree')

with io.open('database.json', 'w', encoding='utf-8') as file:
    file.write(json.dumps(g_dict, ensure_ascii=False, indent=4))

print('Done!')

catalogue.html:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="ja">
    <body>
        <div class="class_1" name="A">
            <div class="class_2" name="A2">
                <div class="class_3" name="a31"></div>
                <div class="class_3" name="a32"></div>
            </div>
        </div>
        <div class="class_1" name="B">
            <div class="class_2" name="b1"></div>
        </div>
    </body>
</html>

You can use a dict, seen, to keep track of the first child dict of each distinct name, extend its children with those of any later child dict of the same name, and recurse into the children of children:

def deep_merge(d):
    # Map each distinct child name to the first child dict seen with that name.
    seen = {}
    for c in d['children']:
        if c['name'] in seen:
            # Same name as an earlier sibling: fold its children into that sibling.
            seen[c['name']]['children'] += c['children']
        else:
            seen[c['name']] = c
    # Keep one child per name; without this the duplicate siblings stay in the list.
    d['children'] = list(seen.values())
    # Recurse only after merging, so children gathered from several siblings get merged too.
    for c in d['children']:
        deep_merge(c)
deep_merge(d)

d would become:

{'children': [{'children': [],
               'name': 'class_2_1',
               'type': 'directory'},
              {'children': [{'children': [],
                             'name': 'class_3_1',
                             'type': 'directory'},
                            {'children': [],
                             'name': 'class_3_2',
                             'type': 'directory'}],
               'name': 'class_2_2',
               'type': 'directory'}],
 'name': 'class_1_1',
 'type': 'directory'}
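
Since the loop in the question produces one single-path dict per iteration, another option is to merge each branch into the accumulated tree as it arrives, so that duplicate names never end up side by side. The following is only a minimal sketch, assuming children are identified by 'name' alone and that each incoming branch has no duplicate names within itself. The helper names merge_branch and make_branch and the synthetic 'root' node are hypothetical and not part of the question's code; the sample names come from catalogue.html.

```python
def merge_branch(tree, branch):
    """Merge `branch` into `tree` in place, matching children by 'name'."""
    # Index the children already in the tree by name for direct lookup.
    by_name = {child['name']: child for child in tree['children']}
    for child in branch['children']:
        existing = by_name.get(child['name'])
        if existing is None:
            # First time this name appears at this level: adopt the subtree.
            tree['children'].append(child)
            by_name[child['name']] = child
        else:
            # Name already present: merge the two subtrees recursively.
            merge_branch(existing, child)
    return tree


def make_branch(*names):
    """Build a single-path dict shaped like the question's `data`."""
    node = None
    for name in reversed(names):
        node = {'name': name, 'type': 'directory',
                'children': [node] if node else []}
    return node


# Hypothetical top-level node so that several class_1 entries can coexist.
root = {'name': 'root', 'type': 'directory', 'children': []}
for path in [('A', 'A2', 'a31'), ('A', 'A2', 'a32'), ('B', 'b1')]:
    merge_branch(root, {'children': [make_branch(*path)]})
```

After the loop, root holds 'A' and 'B' at the top level, with 'A2' appearing once under 'A' and containing both 'a31' and 'a32'. Note that the tree shares node objects with the branches passed in, so the branches should not be reused afterwards.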
