Convert a 5-level dictionary (with pd.Series as values) into a pandas DataFrame

Original question: I am using Python 3. I have some 4-level and 5-level dictionaries. I want to convert such a multilevel dictionary into a pandas DataFrame with a recursive function.

To simplify my question and test my function, I generated the 3-level dictionary shown below and tried my recursive function on it. I understand that for this 3-level nested dictionary there are many other ways to solve the problem. But I feel that only a recursive function can easily be applied to dictionaries with 4, 5, or more levels.
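To illustrate why recursion handles any depth, here is a minimal sketch, not the asker's function. It assumes every leaf is a pd.Series sitting at the same depth. The names `iter_leaves`, `nested_series_to_frame`, and the `demo` data are hypothetical. Key paths are carried as tuples, so keys never have to be joined into a string and split again.

```python
from collections.abc import Mapping

import pandas as pd


def iter_leaves(node, path=()):
    """Yield (key_path_tuple, leaf) for every non-mapping value in a nested mapping."""
    for key, value in node.items():
        if isinstance(value, Mapping):
            # Recurse one level deeper, extending the path with the current key.
            yield from iter_leaves(value, path + (key,))
        else:
            yield path + (key,), value


def nested_series_to_frame(tree, level_names, value_name='Value'):
    """Hypothetical helper: one column per level, one for each Series' own index, one for values."""
    paths, series = zip(*iter_leaves(tree))
    if any(len(p) != len(level_names) for p in paths):
        raise ValueError('every leaf must sit at depth len(level_names)')
    # keys=paths turns each key path into the outer levels of a MultiIndex.
    stacked = pd.concat(list(series), keys=list(paths),
                        names=list(level_names) + ['Old index'])
    return stacked.rename(value_name).reset_index()


# Made-up 2-level data; deeper trees only need more names in level_names.
demo = {'a': {'x': pd.Series([1, 2]), 'y': pd.Series([3])},
        'b': {'x': pd.Series([4, 5, 6])}}
frame = nested_series_to_frame(demo, ['L1', 'L2'])
print(frame)
```

The same call should work for a 5-level tree by passing five level names. Whether it is fast enough on a large dataset is untested here.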

To create a simplified 3-level dictionary:


from collections import defaultdict
def ddict():
    return defaultdict(ddict)

tree = ddict()
tree['level1_1']['level2_1']['level3_1'] = <pd.Series1>
tree['level1_1']['level2_1']['level3_2'] = <pd.Series2>
tree['level1_1']['level2_2']['level3_1'] = <pd.Series3>
tree['level1_1']['level2_2']['level3_2'] = <pd.Series4>
tree['level1_2']['level2_1']['level3_1'] = <pd.Series5>
tree['level1_2']['level2_1']['level3_2'] = <pd.Series6>
tree['level1_2']['level2_2']['level3_1'] = <pd.Series7>
tree['level1_2']['level2_2']['level3_2'] = <pd.Series8>
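The `<pd.SeriesN>` placeholders above are not valid Python. For anyone who wants to run the examples, a stand-in tree could be built like this. The name `sample_tree` and the integer values inside each Series are made up for illustration; the real data is not shown in the question.

```python
from collections import defaultdict

import pandas as pd


def ddict():
    # Missing keys create another ddict, so any depth can be assigned directly.
    return defaultdict(ddict)


# Hypothetical stand-in data: each leaf is a short two-element pd.Series.
sample_tree = ddict()
counter = 0
for l1 in ('level1_1', 'level1_2'):
    for l2 in ('level2_1', 'level2_2'):
        for l3 in ('level3_1', 'level3_2'):
            sample_tree[l1][l2][l3] = pd.Series([counter, counter + 1])
            counter += 2

print(sample_tree['level1_2']['level2_2']['level3_2'].tolist())
```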

Inspired by Bart Cubrich's answer below, I revised xx's code and am putting my solution here:

import collections.abc

import pandas as pd

def tree2df(d, colname):
    """
    Inputs:
        1. d               (a nested dict, or a tree, all values are pd.Series)
        2. colname         (a list with one name per level)

    Return:
        1. a pd.DataFrame (None if colname does not match the depth)
    """
    def flatten(d, parent_key='', sep='-'):
        # Join nested keys into a single 'k1-k2-...' string; keys must not contain sep.
        items = []
        for k, v in d.items():
            new_key = str(parent_key) + str(sep) + str(k) if parent_key else str(k)
            if isinstance(v, collections.abc.MutableMapping):
                items.extend(flatten(v, new_key, sep=sep).items())
            else:
                items.append((new_key, v))
        return dict(items)
    flat_dict = flatten(d)
    # Split each joined key back into a tuple of levels, paired with its Series.
    levels, vals = zip(*[(tuple(level.split('-')), val) for level, val in flat_dict.items()])
    max_level = max(len(l) for l in levels)
    if len(colname) != max_level:
        print("The number of column names must equal the maximum level: %s.\nNothing will be returned. Please revise the colname!" % max_level)
    else:
        # Build a new list so the caller's colname is not modified.
        names = list(colname) + ['Old index']
        s = pd.concat(list(vals), keys=list(levels), names=names)
        s = pd.DataFrame(s)
        s.reset_index(inplace=True)
        s.rename(columns={0: 'Value'}, inplace=True)
        return s

#Example
BlockEvent_TS_df = tree2df (BlockEvent_TS_tree, ['ID','Session','Trial type','Block', 'Event name'])

The 5-level nested dictionary follows the same idea as the 3-level one:

tree['level1_1']['level2_1']['level3_1']['level4_1']['level5_1'] = <pd.Series1>
                ...
tree['level1_2']['level2_2']['level3_2']['level4_2']['level5_2'] = <pd.Series32>

Because I have a large dataset, it would be too complicated to show the whole nested dictionary here, but the idea is as above. In the end, I want 6 columns: 5 columns to store the levels and one column for the values.

I've tried the code above and it works well for me. The speed is also quite decent.

Thanks for all your help!

You need:

format_ = {(level1_key, level2_key, level3_key): values
 for level1_key, level2_dict in tree.items()
 for level2_key, level3_dict in level2_dict.items()
 for level3_key, values      in level3_dict.items()}
df = pd.DataFrame(format_, index=['Value']).T.reset_index()

Output:

     level_0    level_1      level_2    Value
0   level1_1    level2_1    level3_1    1
1   level1_1    level2_1    level3_2    2
2   level1_1    level2_2    level3_1    3
3   level1_1    level2_2    level3_2    4
4   level1_2    level2_1    level3_1    5
5   level1_2    level2_1    level3_2    6
6   level1_2    level2_2    level3_1    7
7   level1_2    level2_2    level3_2    8

My solution is to traverse the tree, looking at all the keys and building each element's path as a list, then create the DataFrame from records. I split these steps into their own functions.

There may be a more efficient approach, but this should get the job done. Hope this helps.

from collections import defaultdict

import pandas as pd


def ddict():
    return defaultdict(ddict)


def traverse_tree(d, prefix='', results=None):
    # Use None instead of a mutable default so rows do not leak between calls.
    if results is None:
        results = []
    if not isinstance(d, dict):
        # Leaf: split the accumulated path back into keys and append the value.
        record = str(prefix).split(',')
        record.append(d)
        results.append(record)
        return results
    for key in d.keys():
        temp = prefix + ',' if prefix != '' else ''
        results = traverse_tree(d[key], temp + str(key), results)
    return results


def dict_to_df(d):
    res = traverse_tree(d)
    # One label per key level, plus a final column for the leaf value.
    labels = ['L' + str(i + 1) for i in range(len(res[0]) - 1)]
    labels.append('Value')
    return pd.DataFrame.from_records(res, columns=labels)


if __name__ == '__main__':
    tree = ddict()
    tree['level1_1']['level2_1']['level3_1'] = 1
    tree['level1_1']['level2_1']['level3_2'] = 2
    tree['level1_1']['level2_2']['level3_1'] = 3
    tree['level1_1']['level2_2']['level3_2'] = 4
    tree['level1_2']['level2_1']['level3_1'] = 5
    tree['level1_2']['level2_1']['level3_2'] = 6
    tree['level1_2']['level2_2']['level3_1'] = 7
    tree['level1_2']['level2_2']['level3_2'] = 8
    df = dict_to_df(tree)
    print(df)

This version works when leaves sit at different depths, though it looks messy.

import pandas as pd

from collections import defaultdict
def ddict():
    return defaultdict(ddict)

tree = ddict()
tree['level1_1']['level2_1']['level3_1'] = 1
tree['level1_1']['level2_1']['level3_2'] = 2
tree['level1_1']['level2_2']['level3_1'] = 3
tree['level1_1']['level2_2']['level3_2'] = 4
tree['level1_2']['level2_1']['level3_1'] = 5
tree['level1_2']['level2_1']['level3_2'] = 6
tree['level1_2']['level2_2']['level3_1'] = 7
tree['level1_2']['level2_2']['level3_2']['Level4_1'] = 8

import collections.abc

def flatten(d, parent_key='', sep='-'):
    # Recursively join nested keys with sep; non-mapping values become leaves.
    items = []
    for k, v in d.items():
        new_key = parent_key + sep + k if parent_key else k
        if isinstance(v, collections.abc.MutableMapping):
            items.extend(flatten(v, new_key, sep=sep).items())
        else:
            items.append((new_key, v))
    return dict(items)

flat_dict=flatten(tree)

# Split each flattened key back into its levels, keeping the value alongside.
levels = []
vals = []
for key, val in flat_dict.items():
    levels.append(key.split('-'))
    vals.append(val)

max_level = max(len(level) for level in levels)

# One column per level plus a last column for the value.
df = pd.DataFrame(columns=range(max_level + 1))

for index, (level, val) in enumerate(zip(levels, vals)):
    # Shorter paths leave their remaining level columns as NaN.
    for i, name in enumerate(level):
        df.loc[index, i] = name
    df.loc[index, max_level] = val

df

Out:

          0         1         2         3  4
0  level1_1  level2_1  level3_1       NaN  1
1  level1_1  level2_1  level3_2       NaN  2
2  level1_1  level2_2  level3_1       NaN  3
3  level1_1  level2_2  level3_2       NaN  4
4  level1_2  level2_1  level3_1       NaN  5
5  level1_2  level2_1  level3_2       NaN  6
6  level1_2  level2_2  level3_1       NaN  7
7  level1_2  level2_2  level3_2  Level4_1  8

I got the flatten idea from here.
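The padding step could also be sketched without joining keys into strings, which avoids problems when a key itself contains the separator. This is only an illustration under assumptions: `leaf_rows`, the `ragged` sample data, and the `L1..Ln` column names are hypothetical, and leaves are taken to be plain scalars.

```python
from collections.abc import Mapping

import pandas as pd


def leaf_rows(node, path=()):
    """Yield (key_path_tuple, leaf_value) for every non-mapping value."""
    for key, value in node.items():
        if isinstance(value, Mapping):
            yield from leaf_rows(value, path + (key,))
        else:
            yield path + (key,), value


# Made-up tree whose leaves sit at different depths.
ragged = {'level1_1': {'level2_1': 1},
          'level1_2': {'level2_1': {'level3_1': 2}}}

pairs = list(leaf_rows(ragged))
depth = max(len(path) for path, _ in pairs)
# Pad short paths with None so every row has `depth` key columns plus one value column.
rows = [list(path) + [None] * (depth - len(path)) + [value] for path, value in pairs]
columns = ['L%d' % (i + 1) for i in range(depth)] + ['Value']
ragged_df = pd.DataFrame(rows, columns=columns)
print(ragged_df)
```

Building all rows first and constructing the DataFrame once also avoids growing it cell by cell with df.loc.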
