python itertools groupby return tuple

I need to parse a flat structure and build a nested structure from it, using a provided list of keys. I have solved the problem, but I would like to learn what I can improve in my code. Could somebody review it and suggest a refactoring?

src_data = [
  {
    "key1": "XX",
    "key2": "X111",
    "key3": "1aa",
    "key4": 1
  },
  {
    "key1": "YY",
    "key2": "Y111",
    "key3": "1bb",
    "key4": 11
  },
  {
    "key1": "ZZ",
    "key2": "Z111",
    "key3": "1cc",
    "key4": 2.4
  },
  {
    "key1": "AA",
    "key2": "A111",
    "key3": "1cc",
    "key4": 33333.2122
  },
  {
    "key1": "BB",
    "key2": "B111",
    "key3": "1bb",
    "key4": 2
  },
]

This is the code I have developed so far to produce the final result:

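import operator
from functools import reduce
from itertools import groupby
from operator import getitem, itemgetter


def plant_tree(ll):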
    master_tree = {}

    for i in ll:
        tree = master_tree
        for n in i:
            if n not in tree:
                tree[n] = {}
            tree = tree[n]
    return master_tree



def make_nested_object(tt, var):
    elo = lambda l: reduce(lambda x, y: {y: x}, l[::-1], var)
    return {'n_path': tt, 'n_structure': elo(tt)}



def getFromDict(dataDict, mapList):
    return reduce(operator.getitem, mapList, dataDict)


def set_nested_item(dataDict, mapList, val):
    """Set item in nested dictionary"""
    reduce(getitem, mapList[:-1], dataDict)[mapList[-1]] = val
    return dataDict



def update_tree(data_tree):
    # Build the nested objects (note: this reads the module-level res_out)
    out = (make_nested_object(k, v) for k, v in res_out.items())

    for dd in out:
        leaf_data = dd['n_structure']
        leaf_path = dd['n_path']
        data_tree = set_nested_item(data_tree, leaf_path, getFromDict(leaf_data, leaf_path))
    return data_tree

This is the customized itemgetter function from this question:

def customed_itemgetter(*args):
    # itemgetter returns a bare value for a single key; wrap that case in a tuple
    f = itemgetter(*args)
    if len(args) > 1:  # with two or more keys, itemgetter already returns a tuple
        return f
    return lambda obj: (f(obj),)

Define the nesting levels:

nesting_keys = ['key1', 'key3', 'key2']

grouper = customed_itemgetter(*nesting_keys)
ii = groupby(sorted(src_data, key=grouper), grouper)

res_out = {
    key: [{k: v for k, v in i.items() if k not in nesting_keys} for i in group]
    for key, group in ii
}
ll = ([dd[x] for x in nesting_keys] for dd in src_data)
data_tree = plant_tree(ll)

Get the result:

result = update_tree(data_tree)

How can I improve my code?

If itemgetter [Python-doc] is given a single key, it returns the bare value for that key and does not wrap it in a singleton tuple.
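A quick way to check this behaviour, as a minimal sketch (the `row` dictionary is made up for illustration):

```python
from operator import itemgetter

row = {"key1": "XX", "key2": "X111"}  # hypothetical sample record

single = itemgetter("key1")(row)        # one key: the bare value
pair = itemgetter("key1", "key2")(row)  # several keys: a tuple

print(single)  # XX
print(pair)    # ('XX', 'X111')
```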

We can however construct a function for that, like:

from operator import itemgetter

def itemgetter2(*args):
    f = itemgetter(*args)
    if len(args) > 1:  # with two or more keys, itemgetter already returns a tuple
        return f
    return lambda obj: (f(obj),)

We can then use the new itemgetter2, like:

# ll is the list of grouping keys (nesting_keys in the question)
grouper = itemgetter2(*ll)
ii = groupby(sorted(src_data, key=grouper), grouper)
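To see what such a flat grouping yields, here is a self-contained sketch on a shortened, made-up data set. The names `tuple_getter` and `records` are illustrative and not from the question; the helper assumes that a tuple is wanted for any number of keys.

```python
from itertools import groupby
from operator import itemgetter


def tuple_getter(*keys):
    """Like itemgetter, but always returns a tuple (illustrative helper)."""
    f = itemgetter(*keys)
    return f if len(keys) > 1 else (lambda obj: (f(obj),))


records = [  # made-up sample, not the question's data
    {"key1": "XX", "key3": "1aa", "key4": 1},
    {"key1": "YY", "key3": "1bb", "key4": 11},
    {"key1": "XX", "key3": "1aa", "key4": 5},
]

grouper = tuple_getter("key1", "key3")
# groupby only merges adjacent items, so sort by the same key first
flat = {
    key: [r["key4"] for r in group]
    for key, group in groupby(sorted(records, key=grouper), grouper)
}
print(flat)  # {('XX', '1aa'): [1, 5], ('YY', '1bb'): [11]}
```

Each dictionary key here is a tuple of the grouping values, so the result is still flat; nesting needs one grouping step per level.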

EDIT: Based on your question, however, you want to perform multilevel grouping. We can write a function for that, like:

from itertools import groupby
from operator import itemgetter

def multigroup(groups, iterable, index=0):
    if len(groups) <= index:
        return list(iterable)
    else:
        f = itemgetter(groups[index])
        i1 = index + 1
        return {
            k: multigroup(groups, vs, index=i1)
            for k, vs in groupby(sorted(iterable, key=f), f)
        }

For the src_data originally given in the question, this generates:

>>> multigroup(['a', 'b'], src_data)
{1: {2: [{'a': 1, 'b': 2, 'z': 3}]}, 2: {3: [{'a': 2, 'b': 3, 'e': 2}]}, 4: {3: [{'a': 4, 'x': 3, 'b': 3}]}}

You can, however, post-process the values in the list(..) call. For example, we can generate dictionaries without the grouping keys:

def multigroup(groups, iterable):
    group_set = set(groups)
    fs = [itemgetter(group) for group in groups]
    def mg(iterable, index=0):
        if len(groups) <= index:
            return [
                {k: v for k, v in item.items() if k not in group_set}
                for item in iterable
            ]
        else:
            i1 = index + 1
            return {
                k: mg(vs, index=i1)
                for k, vs in groupby(sorted(iterable, key=fs[index]), fs[index])
            }
    return mg(iterable)

For the given sample input, we get:

>>> multigroup(['a', 'b'], src_data)
{1: {2: [{'z': 3}]}, 2: {3: [{'e': 2}]}, 4: {3: [{'x': 3}]}}

or for the new sample data:

>>> pprint(multigroup(['key1', 'key3', 'key2'], src_data))
{'AA': {'1cc': {'A111': [{'key4': 33333.2122}]}},
 'BB': {'1bb': {'B111': [{'key4': 2}]}},
 'XX': {'1aa': {'X111': [{'key4': 1}]}},
 'YY': {'1bb': {'Y111': [{'key4': 11}]}},
 'ZZ': {'1cc': {'Z111': [{'key4': 2.4}]}}}
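If sorting is undesirable (for instance when the grouping values are not mutually comparable), the same shape can probably be built in a single pass with dict.setdefault. This is a different technique from the groupby approach above, sketched under the assumptions that at least one key is given and that every record contains all grouping keys; `nest_records` and `sample` are hypothetical names.

```python
def nest_records(keys, records):
    """Nest records by keys in one pass, without sorting (illustrative)."""
    key_set = set(keys)
    tree = {}
    for record in records:
        node = tree
        # walk (or create) the intermediate dictionaries for all but the last key
        for key in keys[:-1]:
            node = node.setdefault(record[key], {})
        # the last key maps to a list holding the remaining fields
        leaf = node.setdefault(record[keys[-1]], [])
        leaf.append({k: v for k, v in record.items() if k not in key_set})
    return tree


sample = [  # two records taken from the question's src_data
    {"key1": "XX", "key2": "X111", "key3": "1aa", "key4": 1},
    {"key1": "BB", "key2": "B111", "key3": "1bb", "key4": 2},
]
nested = nest_records(["key1", "key3", "key2"], sample)
print(nested)
# {'XX': {'1aa': {'X111': [{'key4': 1}]}}, 'BB': {'1bb': {'B111': [{'key4': 2}]}}}
```

Unlike the groupby version, the keys appear in first-seen order rather than sorted order.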

