Sorting a 10k data set into a dict on a child basis

I have a large data collection of 10k objects. I want to sort it into a dict in the following way:

{
    'code': obj.code, 'childs': [{
        'code': obj.code, 'childs': [
            {'code': obj.code}, {'code': obj.code}  # leaves: no 'childs' here
        ]
    }]
}

obj.code is an 8-character string of digits, for example:

'01000000',
'01100000',
'01200000',
'21000000',
'21121200',

A code whose first two characters are followed by six zeros is a 'root' parent, so '01000000' and '21000000' are root parents. Then '01100000' and '01400000' are first-level children of the '01' parent. Every parent can have at most 9 children. So the tree looks like this:

01000000
    01100000
        01110000
            01111000
                01111100
                    01111110
                        01111111
                        01111112
                        01111113
                        01111114
                        01111115
                    01111120
                        01111121
                        01111122
                        01111123
                        01111124
                    01111130
                        01111131
                        01111132
                        01111133
                        01111134
                    01111140
                        01111141
                        01111142
                        01111143
                        01111144
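The parent rule described above can be written as two small helpers: a code's depth is the number of digits after the two-character root prefix once the trailing zero padding is removed, and its parent is the same code with its last non-zero digit set to '0'. This is a minimal sketch, assuming every code is exactly 8 characters and each level adds one digit from 1 to 9 (so no zero appears before the padding starts); `level` and `parent_code` are hypothetical names, not part of the question.

```python
def level(code):
    """Depth below the root: 0 for 'XX000000', 6 for a full code like '01111112'."""
    # Digits after the two-character root prefix, minus the trailing zero padding.
    return len(code[2:].rstrip('0'))

def parent_code(code):
    """Return the parent code, or None if `code` is a root (assumed 8-digit codes)."""
    if level(code) == 0:
        return None
    i = len(code.rstrip('0')) - 1  # index of the last non-zero digit
    # Zero out that digit and keep the string 8 characters long.
    return code[:i] + '0' * (len(code) - i)

print(parent_code('01111112'))  # 01111110
print(parent_code('01100000'))  # 01000000
print(parent_code('21000000'))  # None
print(level('01111112'))        # 6
```

Using `code[2:]` for the depth keeps roots such as '10000000' at level 0, which a plain `code.rstrip('0')` on the whole string would get wrong.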

I'm not sure where to start, so any hint is much appreciated. Root parents can be found this way:

def mySort(myQuerySet):
    # Roots are two significant digits followed by six zeros, so match the suffix.
    root_parents = myQuerySet.objects(code__endswith='000000')
    return root_parents
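If the codes are already loaded into Python rather than filtered by a query, the root test is a plain suffix check, and because every descendant shares its root's first two characters, the codes can also be bucketed by root in a single pass. A minimal sketch, assuming `codes` is a list of 8-character strings; `find_roots` and `group_by_root` are hypothetical names.

```python
from collections import defaultdict

def find_roots(codes):
    """Return the root codes: two significant digits followed by six zeros."""
    return sorted(c for c in codes if c.endswith('000000'))

def group_by_root(codes):
    """Bucket codes by their two-digit root prefix (e.g. '01', '21')."""
    groups = defaultdict(list)
    for code in sorted(codes):
        groups[code[:2]].append(code)
    return dict(groups)

sample = ['01000000', '01100000', '01200000', '21000000', '21121200']
print(find_roots(sample))     # ['01000000', '21000000']
print(group_by_root(sample))  # {'01': ['01000000', '01100000', '01200000'], '21': ['21000000', '21121200']}
```

Grouping by prefix splits the 10k codes into independent subtrees, each of which can then be nested on its own.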

Here's one possible solution. It iterates through the codes once to build a simple tree, then converts that tree into the form you requested.

import re
from pprint import pprint
from collections import defaultdict

def build_tree(codes):
    """Build the tree from a list of codes (strings)"""

    # tree is a dictionary that maps each code to a list of codes of children.
    tree = defaultdict(list)
    roots = []
    for code in codes:
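        if code.endswith('000000'):
            # Root: two significant digits followed by six zeros.
            # The defaultdict creates tree[code] on demand, so children
            # listed before their root in the input are kept.
            roots.append(code)
        else:
            # Parent = same code with its last non-zero digit set to '0',
            # e.g. '01111112' -> '01111110'.
            nonzero = re.search(r'[1-9]0*$', code).start()
            parent = code[:nonzero] + '0' + code[1 + nonzero:]
            tree[parent].append(code)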

    # sort children (optional)
    for v in tree.values():
        v.sort()

    # convert original dictionary to one in the desired form.
    def convert(old_parent):
        result = {}
        result['code'] = old_parent
        if len(tree[old_parent]) > 0:
            result['children'] = [convert(c) for c in tree[old_parent]]
        return result

    return [convert(root) for root in roots]

codes = ["01000000", "01100000", "01110000", "01111000", "01111100", "01111110",
         "01111111", "01111112", "01111113", "01111114", "01111115", "01111120",
         "01111121", "01111122", "01111123", "01111124", "01111130", "01111131",
         "01111132", "01111133", "01111134", "01111140", "01111141", "01111142",
         "01111143", "01111144"]

pprint(build_tree(codes))

Here is the output (excuse the formatting):

[{'children': [{'children': [{'children': [{'children': [{'children': [{'children': [{'code': '01111111'},
                                                                                     {'code': '01111112'},
                                                                                     {'code': '01111113'},
                                                                                     {'code': '01111114'},
                                                                                     {'code': '01111115'}],
                                                                        'code': '01111110'},
                                                                       {'children': [{'code': '01111121'},
                                                                                     {'code': '01111122'},
                                                                                     {'code': '01111123'},
                                                                                     {'code': '01111124'}],
                                                                        'code': '01111120'},
                                                                       {'children': [{'code': '01111131'},
                                                                                     {'code': '01111132'},
                                                                                     {'code': '01111133'},
                                                                                     {'code': '01111134'}],
                                                                        'code': '01111130'},
                                                                       {'children': [{'code': '01111141'},
                                                                                     {'code': '01111142'},
                                                                                     {'code': '01111143'},
                                                                                     {'code': '01111144'}],
                                                                        'code': '01111140'}],
                                                          'code': '01111100'}],
                                            'code': '01111000'}],
                              'code': '01110000'}],
                'code': '01100000'}],
  'code': '01000000'}]
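The answer works on plain code strings and names the child list 'children', while the question starts from 10k objects with a `.code` attribute and asks for 'childs'. Below is a sketch of a one-pass variant under those assumptions: it builds one dict per object, then attaches each node to its parent's list, so input order does not matter and no recursion is needed. `Obj` is a hypothetical stand-in for the asker's model class (only `.code` is read), and `parent_of` and `build_nested` are illustrative names.

```python
from collections import namedtuple

# Hypothetical stand-in for the asker's model objects; only `.code` is used.
Obj = namedtuple('Obj', 'code')

def parent_of(code):
    """Return the parent code, or None for a root ('XX000000')."""
    if code.endswith('000000'):
        return None
    i = len(code.rstrip('0')) - 1           # index of the last non-zero digit
    return code[:i] + '0' * (len(code) - i)

def build_nested(objects):
    """Nest objects as {'code': ..., 'childs': [...]} dicts in one pass."""
    # One node per distinct code; leaves never get a 'childs' key.
    nodes = {obj.code: {'code': obj.code} for obj in objects}
    roots = []
    for code in sorted(nodes):              # sorted, so every list is in code order
        parent = parent_of(code)
        if parent is None:
            roots.append(nodes[code])
        elif parent in nodes:
            nodes[parent].setdefault('childs', []).append(nodes[code])
        # Codes whose parent is missing from the input are skipped.
    return roots

data = [Obj(c) for c in ['01200000', '01100000', '21000000', '01000000', '01110000']]
print(build_nested(data))
```

Sorting the codes before attaching them keeps each 'childs' list in code order, like the optional sort in the answer; as with `build_tree`, a code whose parent is absent from the data set is not reached. For 10k objects the cost is dominated by the O(n log n) sort.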
