I need to parse the flatten structure and create nested structure using the list of keys provided. I have solved the problem but I am looking for an improvement and I would like to learn what I can change in my code. Can somebody review it and refactor using better knowledge?
src_data = [
{
"key1": "XX",
"key2": "X111",
"key3": "1aa",
"key4": 1
},
{
"key1": "YY",
"key2": "Y111",
"key3": "1bb",
"key4": 11
},
{
"key1": "ZZ",
"key2": "Z111",
"key3": "1cc",
"key4": 2.4
},
{
"key1": "AA",
"key2": "A111",
"key3": "1cc",
"key4": 33333.2122
},
{
"key1": "BB",
"key2": "B111",
"key3": "1bb",
"key4": 2
},
]
this is my code I developed so far creating the final result.
def plant_tree(ll):
master_tree = {}
for i in ll:
tree = master_tree
for n in i:
if n not in tree:
tree[n] = {}
tree = tree[n]
return master_tree
def make_nested_object(tt, var):
elo = lambda l: reduce(lambda x, y: {y: x}, l[::-1], var)
return {'n_path': tt, 'n_structure': elo(tt)}
def getFromDict(dataDict, mapList):
return reduce(operator.getitem, mapList, dataDict)
def set_nested_item(dataDict, mapList, val):
"""Set item in nested dictionary"""
reduce(getitem, mapList[:-1], dataDict)[mapList[-1]] = val
return dataDict
def update_tree(data_tree):
# MAKE NESTED OBJECT
out = (make_nested_object(k, v) for k,v, in res_out.items())
for dd in out:
leaf_data = dd['n_structure']
leaf_path = dd['n_path']
data_tree = set_nested_item(data_tree, leaf_path, getFromDict(leaf_data, leaf_path))
return data_tree
this is the customed itemgeter function from this question
def customed_itemgetter(*args):
# this handles the case when one key is provided
f = itemgetter(*args)
if len(args) > 2:
return f
return lambda obj: (f(obj),)
define the nesting level
nesting_keys = ['key1', 'key3', 'key2']
grouper = customed_itemgetter(*nesting_keys)
ii = groupby(sorted(src_data, key=grouper), grouper)
res_out = {key: [{k:v for k,v in i.items() if k not in nesting_keys} for i in group] for key,group in ii}
#
ll = ([dd[x] for x in nesting_keys] for dd in src_data)
data_tree = plant_tree(ll)
get results
result = update_tree(data_tree)
How can I improve my code?
If the itemgetter
[Python-doc] is given a single element, it returns that single element, and does not wrap it in a singleton-tuple.
We can however construct a function for that, like:
from operator import itemgetter
def itemgetter2(*args):
f = itemgetter(*args)
if len(args) > 2:
return f
return lambda obj: (f(obj),)
then we can thus use the new itemgetter2
, like:
grouper = *ll
ii = groupby(sorted(src_data, key=grouper), grouper)
EDIT : Based on your question however, you want to perform multilevel grouping, we can make a function for that, like:
def multigroup(groups, iterable, index=0):
if len(groups) <= index:
return list(iterable)
else:
f = itemgetter(groups[index])
i1 = index + 1
return {
k: multigroup(groups, vs, index=i1)
for k, vs in groupby(sorted(iterable, key=f), f)
}
For the data_src
in the question, this then generates:
>>> multigroup(['a', 'b'], src_data)
{1: {2: [{'a': 1, 'b': 2, 'z': 3}]}, 2: {3: [{'a': 2, 'b': 3, 'e': 2}]}, 4: {3: [{'a': 4, 'x': 3, 'b': 3}]}}
You can post-process the values in the list(..)
call however. We can for example generate dictionaries without the elements in the grouping columns:
def multigroup(groups, iterable):
group_set = set(groups)
fs = [itemgetter(group) for group in groups]
def mg(iterable, index=0):
if len(groups) <= index:
return [
{k: v for k, v in item.items() if k not in group_set}
for item in iterable
]
else:
i1 = index + 1
return {
k: mg(vs, index=i1)
for k, vs in groupby(sorted(iterable, key=fs[index]), fs[index])
}
return mg(iterable)
For the given sample input, we get:
>>> multigroup(['a', 'b'], src_data)
{1: {2: [{'z': 3}]}, 2: {3: [{'e': 2}]}, 4: {3: [{'x': 3}]}}
or for the new sample data:
>>> pprint(multigroup(['key1', 'key3', 'key2'], src_data))
{'AA': {'1cc': {'A111': [{'key4': 33333.2122}]}},
'BB': {'1bb': {'B111': [{'key4': 2}]}},
'XX': {'1aa': {'X111': [{'key4': 1}]}},
'YY': {'1bb': {'Y111': [{'key4': 11}]}},
'ZZ': {'1cc': {'Z111': [{'key4': 2.4}]}}}
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.