Python itertools.groupby() in a recursive function

I am trying to iterate over groups from itertools.groupby in a recursive function to construct a nested dictionary from nested lists.

Input

example = [['a', [], 'b', (), 1, None],
           ['a', [], 'c', (), 0, None],
           ['a', [], 2, None, None, None],
           ['a', [], 3, None, None, None],
           ['a', [], 3, None, None, None],
           ]

Expected output

output = {'a': [{'b': (1, None)},
                {'c': (0, None)},
                2, None, None, None,
                3, None, None, None,
                3, None, None, None
                ]
          }

The code I am trying

from itertools import chain, groupby

def group_key(lst, level=0):
    return lst[level]

def build_dict(data=None, grouper=None):
    if grouper is None:
        gen = groupby(data, key=group_key)
    else:
        if any(isinstance(i, list) for i in grouper):
            level_down = [l[1:] for l in grouper]
            gen = groupby(level_down, key=group_key)
        else:
            return grouper

    for char, group in gen:
        group_lst = list(group)

        if isinstance(char, str):
            value = {char: build_dict(grouper=group_lst)}
        elif char == ():
            value = tuple(build_dict(grouper=group_lst))
        elif char == []:
            value = [build_dict(grouper=group_lst)]
        else:
            value = chain.from_iterable(group_lst)
        
        return value

When I run the code, I get only the first group from the for char, group in gen: loop. Somehow the function does not continue with the other groups. I am not great with recursive functions, so perhaps I am missing something there. This is what the code produces:

In: build_dict(example)
Out: {'a': [{'b': (1, None)}]}
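
In the code above, return value is indented inside the for loop, so the function exits during the first iteration and the remaining groups are never visited. The following is a minimal sketch of that pitfall on simplified data; rows, first_group_only and all_groups are hypothetical names used only for illustration, not part of the original code.

```python
from itertools import groupby

# Simplified stand-in data: [key, value] pairs, already sorted by key.
rows = [["a", 1], ["a", 2], ["b", 3]]

def first_group_only(data):
    for key, group in groupby(data, key=lambda row: row[0]):
        # Returning here ends the loop on its first iteration.
        return {key: [row[1] for row in group]}

def all_groups(data):
    result = {}
    for key, group in groupby(data, key=lambda row: row[0]):
        # Accumulate every group, then return once the loop is done.
        result[key] = [row[1] for row in group]
    return result

print(first_group_only(rows))  # {'a': [1, 2]}
print(all_groups(rows))        # {'a': [1, 2], 'b': [3]}
```

Collecting each group's value and returning after the loop is one way to keep the later groups.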

The structure is a bit inconsistent: it presents dictionary content as a list of [key, collection, values...] rows at the top level, but specifies sub-dictionaries without the enclosing list of lists. Despite this inconsistency, the data structure can be built recursively.

def buildData(content, asValues=False):
    if not asValues:
        result = dict()  # assumes a list of [key, model, values...] rows
        for k, model, *values in content:
            # copy the model so the caller's [] is not extended in place
            result.setdefault(k, type(model)(model))
            result[k] += type(model)(buildData(values, True))
        return result
    if (len(content) > 2
            and isinstance(content[0], str)
            and isinstance(content[1], (tuple, list))):
        return [buildData([content])]  # wrap to match the top-level structure
    if content:  # everything else produces a list of data items
        return content[:1] + buildData(content[1:], True)
    return []  # until data is exhausted

output:

example = [['a', [], 'b', (), 1, None],
           ['a', [], 'c', (), 0, None],
           ['a', [], 2, None, None, None],
           ['a', [], 3, None, None, None],
           ['a', [], 3, None, None, None],
           ]
d = buildData(example)

print(d)
            
{'a': [{'b': (1, None)}, 
       {'c': (0, None)}, 
       2, None, None, None, 3, None, None, None, 3, None, None, None]}
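
buildData relies on two idioms: extended unpacking (k, model, *values) to split a row, and type(model)(...) to rebuild values in the same container type as the empty marker. A small sketch of both in isolation, using one sub-row from the example; the variable names here are illustrative only.

```python
row = ['b', (), 1, None]

# Extended unpacking: first two items by position, the rest collected in a list.
key, model, *values = row

# type(model) is tuple here, so calling it rebuilds the values as a tuple.
container = type(model)(values)

print(key, container)  # b (1, None)
```

With model set to [] instead, the same call would produce a list.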

restructure

This is not a problem for itertools.groupby. The logic you are using to "group" elements is unique, and I would not expect to find a built-in function that meets your exact needs. Below I begin with restructure, which takes each element of example and produces output similar to what you already have -

def restructure(t):
  def loop(t, r):
    if not t:
      return r[0]
    if t[-1] == ():
      return loop(t[0:-1], tuple(r))
    elif t[-1] == []:
      return loop(t[0:-1], list(r))
    elif isinstance(t[-1], str):
      return loop(t[0:-1], ({t[-1]: r},))
    else:
      return loop(t[0:-1], (t[-1], *r))
  return loop(t[0:-1], (t[-1],))

for e in example:
  print(restructure(e))

{'a': [{'b': (1, None)}]}
{'a': [{'c': (0, None)}]}
{'a': [2, None, None, None]}
{'a': [3, None, None, None]}
{'a': [3, None, None, None]}
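
restructure walks each row from right to left, carrying an accumulator that is wrapped or extended depending on the item it meets. The same walk can be sketched as a plain loop, which may be easier to follow than the tail-recursive helper. restructure_iter is a hypothetical name, and this is an equivalent rewrite under the same input assumptions, not the answer's own code.

```python
def restructure_iter(row):
    # Start with the last item wrapped in a tuple, as restructure does.
    acc = (row[-1],)
    # Visit the remaining items from right to left.
    for item in reversed(row[:-1]):
        if isinstance(item, str):
            acc = ({item: acc},)   # a key wraps everything to its right
        elif item == ():
            acc = tuple(acc)       # () marker: values become a tuple
        elif item == []:
            acc = list(acc)        # [] marker: values become a list
        else:
            acc = (item, *acc)     # plain value: prepend to the accumulator
    return acc[0]

print(restructure_iter(['a', [], 'b', (), 1, None]))
# {'a': [{'b': (1, None)}]}
```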

merge

With each element restructured, we now define a way to merge restructured elements -

def merge(r, t):
  if isinstance(r, dict) and isinstance(t, dict):
    for (k,v) in t.items():
      r[k] = merge(r[k], v)
    return r
  elif isinstance(r, tuple) and isinstance(t, tuple):
    return r + t
  elif isinstance(r, list) and isinstance(t, list):
    return r + t
  else:
    return t

a = restructure(example[0])
b = restructure(example[1])
print(merge(a, b))

{'a': [{'b': (1, None)}, {'c': (0, None)}]}
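
As written, merge assumes every key in t already exists in r (r[k] raises KeyError otherwise), and it updates the dict r in place. That is fine for example, where every row starts with 'a'. If rows with different top-level keys are possible, a variant along these lines might help. merge_safe is a hypothetical name and a sketch only, not part of the answer.

```python
def merge_safe(r, t):
    if isinstance(r, dict) and isinstance(t, dict):
        out = dict(r)  # shallow copy so the caller's dict is left untouched
        for k, v in t.items():
            # Merge when the key exists on both sides, otherwise take t's value.
            out[k] = merge_safe(out[k], v) if k in out else v
        return out
    elif isinstance(r, tuple) and isinstance(t, tuple):
        return r + t
    elif isinstance(r, list) and isinstance(t, list):
        return r + t
    else:
        return t

left = {'a': [1]}
print(merge_safe(left, {'b': (2,)}))  # {'a': [1], 'b': (2,)}
print(merge_safe(left, {'a': [2]}))   # {'a': [1, 2]}
print(left)                           # {'a': [1]}
```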

build

Lastly, build is responsible for tying everything together -

def build(t):
  if not t:
    return None
  elif len(t) == 1:
    return restructure(t[0])
  else:
    return merge(restructure(t[0]), build(t[1:]))

example = \
  [ ['a', [], 'b', (), 1, None]
  , ['a', [], 'c', (), 0, None]
  , ['a', [], 2, None, None, None]
  , ['a', [], 3, None, None, None]
  , ['a', [], 3, None, None, None]
  ]

print(build(example))

{'a': [{'b': (1, None)}, {'c': (0, None)}, 2, None, None, None, 3, None, None, None, 3, None, None, None]}

Above, build is effectively the same as functools.reduce and map -

from functools import reduce

def build(t):
  if not t:
    return None
  else:
    return reduce(merge, map(restructure, t))

print(build(example))

{'a': [{'b': (1, None)}, {'c': (0, None)}, 2, None, None, None, 3, None, None, None, 3, None, None, None]}
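
Because merge falls through to return t when its arguments are not two dicts, two tuples or two lists, passing None as the initial value to reduce covers the empty case without a separate branch. A sketch of that idea follows; merge is copied from above so the block runs on its own, while build_with_initial and the pre-restructured parts list are hypothetical stand-ins.

```python
from functools import reduce

def merge(r, t):  # copied unchanged from the answer
    if isinstance(r, dict) and isinstance(t, dict):
        for (k, v) in t.items():
            r[k] = merge(r[k], v)
        return r
    elif isinstance(r, tuple) and isinstance(t, tuple):
        return r + t
    elif isinstance(r, list) and isinstance(t, list):
        return r + t
    else:
        return t

def build_with_initial(parts):
    # merge(None, first) returns first, so None works as a neutral start value.
    return reduce(merge, parts, None)

print(build_with_initial([]))                            # None
print(build_with_initial([{'a': [1]}, {'a': [2, 3]}]))   # {'a': [1, 2, 3]}
```

In the answer's setting, parts would be map(restructure, t).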

caveat

This answer does nothing to protect against invalid inputs. You are responsible for verifying inputs are valid -

restructure([])                     # IndexError
restructure([[], "a"])              # a
restructure(["a", (), [], "b", ()]) # {'a': ({'b': ((),)},)}
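
If you want to reject inputs like these up front, one option is a small check run on each row before restructure. The rules below are assumptions about what "valid" means, not something the question specifies: a row is non-empty, starts with a string key, every string key is immediately followed by an empty () or [] marker, and every marker immediately follows a string key. is_marker and validate_row are hypothetical helpers.

```python
def is_marker(x):
    # An empty tuple or empty list is treated as a container marker.
    return isinstance(x, (list, tuple)) and len(x) == 0

def validate_row(row):
    if not row:
        raise ValueError("row is empty")
    if not isinstance(row[0], str):
        raise ValueError("row must start with a string key")
    for i, item in enumerate(row):
        if isinstance(item, str):
            # Assumed rule: a key must be followed by () or [].
            if i + 1 >= len(row) or not is_marker(row[i + 1]):
                raise ValueError(f"key {item!r} at index {i} has no marker after it")
        elif is_marker(item):
            # Assumed rule: a marker must directly follow a key.
            if i == 0 or not isinstance(row[i - 1], str):
                raise ValueError(f"marker at index {i} does not follow a key")
    return row

validate_row(['a', [], 'b', (), 1, None])  # passes and returns the row
```

Under these assumed rules, all three inputs listed above raise ValueError instead of producing a surprising result.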
