Unfolding a nested dictionary with lists into a pandas DataFrame

Question

I have a nested dictionary in which the sub-dictionaries hold lists:

nested_dict = {'string1': {69: [1231, 232], 67: [682, 12], 65: [1, 1]},
    'string2': {28672: [82, 23], 22736: [82, 93, 1102, 102], 19423: [64, 23]}, ... }

Each list in the sub-dictionaries has at least two elements, but there could be more.

I would like to "unfold" this dictionary into a pandas DataFrame, with one column for the first-level dictionary keys (e.g. 'string1', 'string2', ...), one column for the sub-dictionary keys, one column for the first item in the list, one column for the next item, and so on.

Here is what the output should look like:

col1       col2    col3     col4    col5    col6
string1    69      1231     232
string1    67      682      12
string1    65      1        1
string2    28672   82       23
string2    22736   82       93      1102    102
string2    19423   64       23

Naturally, I tried to use pd.DataFrame.from_dict:

new_df = pd.DataFrame.from_dict({(i,j): nested_dict[i][j] 
                           for i in nested_dict.keys() 
                           for j in nested_dict[i].keys()
                           ...

Now I'm stuck, and there are several open problems:

How do I parse the lists (i.e. the nested_dict[i].values()) such that each element becomes a new pandas DataFrame column?
The above will not actually create a column for each field.
The above will not fill up the columns with elements, e.g. string1 should appear in each row for its sub-dictionary key-value pairs. (For col5 and col6, I can fill the NA with zeros.)
I'm not sure how to name these columns correctly.

Answer 1

Here's a method which uses a recursive generator to unroll the nested dictionaries. It doesn't assume that you have exactly two levels, but continues unrolling each dict until it hits a list.

import pandas as pd

nested_dict = {
    'string1': {69: [1231, 232], 67: [682, 12], 65: [1, 1]},
    'string2': {28672: [82, 23], 22736: [82, 93, 1102, 102], 19423: [64, 23]},
    'string3': [101, 102]}

def unroll(data):
    if isinstance(data, dict):
        for key, value in data.items():
            # Recursively unroll the next level and prepend the key to each row.
            for row in unroll(value):
                yield [key] + row
    elif isinstance(data, list):
        # This is the bottom of the structure (defines exactly one row).
        yield data

df = pd.DataFrame(list(unroll(nested_dict)))

Because unroll produces a list of lists rather than dicts, the columns will be named numerically (from 0 to 5 in this case). So you need to use rename to get the column labels you want:

df.rename(columns=lambda i: 'col{}'.format(i+1))

This returns a new DataFrame with the following contents (note that the additional string3 entry is also unrolled).

      col1   col2  col3   col4    col5   col6
0  string1     69  1231  232.0     NaN    NaN
1  string1     67   682   12.0     NaN    NaN
2  string1     65     1    1.0     NaN    NaN
3  string2  28672    82   23.0     NaN    NaN
4  string2  22736    82   93.0  1102.0  102.0
5  string2  19423    64   23.0     NaN    NaN
6  string3    101   102    NaN     NaN    NaN
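The question mentions that the missing col5 and col6 values may be filled with zeros. A minimal sketch of that step, assuming a few hand-written rows shaped like the ones unroll yields (the rows list and the value_cols name are illustrative and not part of the original answer):

```python
import pandas as pd

# Hypothetical rows, shaped like the output of unroll() for the sample data.
rows = [
    ['string1', 69, 1231, 232],
    ['string2', 22736, 82, 93, 1102, 102],
    ['string2', 19423, 64, 23],
]
df = pd.DataFrame(rows).rename(columns=lambda i: 'col{}'.format(i + 1))

# col1 and col2 hold the keys; every later column holds list items.
value_cols = list(df.columns[2:])

# Fill the gaps left by shorter lists with 0, then cast back to integers
# (the NaN padding had turned some of these columns into floats).
df = df.fillna({c: 0 for c in value_cols}).astype({c: int for c in value_cols})
```

Filling only the value columns leaves the key columns untouched, and the cast to int is safe only because no NaN remains after the fill.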

Answer 2

This should give you the result you are looking for, although it's probably not the most elegant solution. There is probably a better (more pandas-like) way to do it.

I parsed your nested dict and built a list of dictionaries (one for each row).

import pandas as pd

# some sample input
nested_dict = {
    'string1': {69: [1231, 232], 67: [682, 12], 65: [1, 1]},
    'string2': {28672: [82, 23], 22736: [82, 93, 1102, 102], 19423: [64, 23]},
    'string3': {28673: [83, 24], 22737: [83, 94, 1103, 103], 19424: [65, 24]}
}

# new_list is what we will use to hold each row
new_list = []
for k1 in nested_dict:
    curr_dict = nested_dict[k1]
    for k2 in curr_dict:
        # the two keys become col1 and col2
        new_dict = {'col1': k1, 'col2': k2}
        # the list items become col3, col4, ...
        new_dict.update({'col%d' % (i + 3): curr_dict[k2][i] for i in range(len(curr_dict[k2]))})
        new_list.append(new_dict)

# create a DataFrame from new_list
df = pd.DataFrame(new_list)

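The output (the rows follow the dictionary's iteration order, which on older Python versions is not the insertion order):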

      col1   col2  col3  col4    col5   col6
0  string2  28672    82    23     NaN    NaN
1  string2  22736    82    93  1102.0  102.0
2  string2  19423    64    23     NaN    NaN
3  string3  19424    65    24     NaN    NaN
4  string3  28673    83    24     NaN    NaN
5  string3  22737    83    94  1103.0  103.0
6  string1     65     1     1     NaN    NaN
7  string1     67   682    12     NaN    NaN
8  string1     69  1231   232     NaN    NaN

This assumes that the input will always contain enough data to create a col1 and a col2.

I loop through nested_dict. It is assumed that each element of nested_dict is also a dictionary. We loop through that dictionary as well (curr_dict). The keys k1 and k2 are used to populate col1 and col2. For the remaining columns, we iterate through the list contents and add a column for each element.
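For the two-level case, the same loop can be condensed into a single comprehension that builds list rows. This is only a sketch, assuming every value of nested_dict is a dictionary of lists; the names rows, inner and values are illustrative and do not come from either answer:

```python
import pandas as pd

nested_dict = {
    'string1': {69: [1231, 232], 67: [682, 12], 65: [1, 1]},
    'string2': {28672: [82, 23], 22736: [82, 93, 1102, 102], 19423: [64, 23]},
}

# One row per (outer key, inner key) pair: both keys followed by the list items.
rows = [[k1, k2, *values]
        for k1, inner in nested_dict.items()
        for k2, values in inner.items()]

# Shorter rows are padded with NaN by the DataFrame constructor.
df = pd.DataFrame(rows)

# Name the columns col1, col2, ... instead of 0, 1, ...
df.columns = ['col{}'.format(i + 1) for i in range(df.shape[1])]
```

Unlike the recursive generator in Answer 1, this version fails if a top-level value is a list rather than a dictionary, so it fits only when the nesting depth is known to be exactly two.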

Answer 1: score 3, posted 2017-12-15 22:26:18

Answer 2: score 1, accepted, posted 2017-12-15 21:53:03
