
Convert pandas df to nested dictionary

I need to convert a DataFrame that is in the following format:

import pandas as pd

d = {
    'A': ['a1', 'a1', 'a1', 'a1', 'a1', 'a1', 'a1', 'a2', 'a2', 'a2', 'a2', 'a2', 'a2', 'a2', 'a2'],
    'B': ['b1', 'b1', 'b1', 'b1', 'b2', 'b2', 'b2', 'b3', 'b3', 'b3', 'b3', 'b3', 'b3', 'b4', 'b4', ],
    'C': ['c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7', 'c8', 'c9', 'c10', 'c11', 'c12', 'c13', 'c14', 'c15', ],
    'D': ['d1', 'd2', 'd3', 'd4', 'd5', 'd6', 'd7', 'd8', 'd9', 'd10', 'd11', 'd12', 'd13', 'd14', 'd15', ],
    'E': ['e1', 'e2', 'e3', 'e4', 'e5', 'e6', 'e7', 'e8', 'e9', 'e10', 'e11', 'e12', 'e13', 'e14', 'e15', ],
}

df = pd.DataFrame(d)
df

A   B   C   D   E
a1  b1  c1  d1  e1
a1  b1  c2  d2  e2
a1  b1  c3  d3  e3
a1  b1  c4  d4  e4
a1  b2  c5  d5  e5
a1  b2  c6  d6  e6
a1  b2  c7  d7  e7
a2  b3  c8  d8  e8
a2  b3  c9  d9  e9
a2  b3  c10 d10 e10
a2  b3  c11 d11 e11
a2  b3  c12 d12 e12
a2  b3  c13 d13 e13
a2  b4  c14 d14 e14
a2  b4  c15 d15 e15

to a dictionary in the following format:

outDict = {
    'a1': {
        'b1': {
            'c': ['c1', 'c2', 'c3', 'c4'],
            'd': ['d1', 'd2', 'd3', 'd4'],
            'e': ['e1', 'e2', 'e3', 'e4'],
        },
        'b2': {
            'c': ['c5', 'c6', 'c7'],
            'd': ['d5', 'd6', 'd7'],
            'e': ['e5', 'e6', 'e7'],
        },
    },
    'a2': {
        'b3': {
            'c': ['c8', 'c9', 'c10', 'c11', 'c12', 'c13'],
            'd': ['d8', 'd9', 'd10', 'd11', 'd12', 'd13'],
            'e': ['e8', 'e9', 'e10', 'e11', 'e12', 'e13'],
        },
        'b4': {
            'c': ['c14', 'c15'],
            'd': ['d14', 'd15'],
            'e': ['e14', 'e15'],
        }
    }
}

That is, the values in column A become the first-level keys, the values in column B become the second-level keys, and the values in columns C, D and E become lists.

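First move A and B into the index, group by those index levels, and convert each group's remaining columns to lists with to_dict(orient='list') inside a lambda function. This gives a Series with a MultiIndex, which the dictionary comprehension then converts to a nested dictionary:

# One dict of lists per (A, B) pair; kept in a new name so df stays a DataFrame.
s = (df.set_index(['A', 'B'])
       .groupby(['A', 'B'])
       .apply(lambda x: x.to_dict(orient='list')))

# For each first-level key, select its rows and turn the B-indexed Series into a dict.
d = {level: s.xs(level).to_dict() for level in s.index.levels[0]}

print(d)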

{
    'a1': {
        'b1': {
            'C': ['c1', 'c2', 'c3', 'c4'],
            'D': ['d1', 'd2', 'd3', 'd4'],
            'E': ['e1', 'e2', 'e3', 'e4']
        },
        'b2': {
            'C': ['c5', 'c6', 'c7'],
            'D': ['d5', 'd6', 'd7'],
            'E': ['e5', 'e6', 'e7']
        }
    },
    'a2': {
        'b3': {
            'C': ['c8', 'c9', 'c10', 'c11', 'c12', 'c13'],
            'D': ['d8', 'd9', 'd10', 'd11', 'd12', 'd13'],
            'E': ['e8', 'e9', 'e10', 'e11', 'e12', 'e13']
        },
        'b4': {
            'C': ['c14', 'c15'],
            'D': ['d14', 'd15'],
            'E': ['e14', 'e15']
        }
    }
}
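The steps of this approach can be traced one at a time. The following is a minimal sketch on a smaller, made-up frame with the same column layout; the names s, inner and nested are chosen here for illustration only.

```python
import pandas as pd

# Small hypothetical frame with the same column layout as the question.
df = pd.DataFrame({
    'A': ['a1', 'a1', 'a2'],
    'B': ['b1', 'b1', 'b3'],
    'C': ['c1', 'c2', 'c8'],
    'D': ['d1', 'd2', 'd8'],
    'E': ['e1', 'e2', 'e8'],
})

# Step 1: one dict of lists per (A, B) pair, held in a Series with a MultiIndex.
s = (df.set_index(['A', 'B'])
       .groupby(['A', 'B'])
       .apply(lambda x: x.to_dict(orient='list')))

# Step 2: xs('a1') selects the entries under 'a1' and drops that index level,
# leaving a Series indexed by B that to_dict() turns into a plain dict.
inner = s.xs('a1').to_dict()

# Step 3: repeat step 2 for every first-level key.
nested = {level: s.xs(level).to_dict() for level in s.index.levels[0]}
```

Here inner holds only the 'b1' entry for 'a1', and nested has one such dictionary per value of A.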

If the nested keys must be lowercase, rename the columns of the original DataFrame first:

# Rename on the fly so the original df keeps its uppercase column names.
s = (df.rename(columns={'C': 'c', 'D': 'd', 'E': 'e'})
       .set_index(['A', 'B'])
       .groupby(['A', 'B'])
       .apply(lambda x: x.to_dict(orient='list')))

d = {level: s.xs(level).to_dict() for level in s.index.levels[0]}

print(d)

{
    'a1': {
        'b1': {
            'c': ['c1', 'c2', 'c3', 'c4'],
            'd': ['d1', 'd2', 'd3', 'd4'],
            'e': ['e1', 'e2', 'e3', 'e4']
        },
        'b2': {
            'c': ['c5', 'c6', 'c7'],
            'd': ['d5', 'd6', 'd7'],
            'e': ['e5', 'e6', 'e7']
        }
    },
    'a2': {
        'b3': {
            'c': ['c8', 'c9', 'c10', 'c11', 'c12', 'c13'],
            'd': ['d8', 'd9', 'd10', 'd11', 'd12', 'd13'],
            'e': ['e8', 'e9', 'e10', 'e11', 'e12', 'e13']
        },
        'b4': {
            'c': ['c14', 'c15'],
            'd': ['d14', 'd15'],
            'e': ['e14', 'e15']
        }
    }
}

If the purpose is just to look values up by the keys in A and B, I think the output of a groupby with tuple keys is sufficient:

d = df.groupby(['A', 'B']).agg({'C': list, 'D': list, 'E': list}).to_dict(orient='index')

d
>>
{('a1', 'b1'): {'C': ['c1', 'c2', 'c3', 'c4'],
                'D': ['d1', 'd2', 'd3', 'd4'],
                'E': ['e1', 'e2', 'e3', 'e4']},
 ('a1', 'b2'): {'C': ['c5', 'c6', 'c7'],
                'D': ['d5', 'd6', 'd7'],
                'E': ['e5', 'e6', 'e7']},
 ('a2', 'b3'): {'C': ['c8', 'c9', 'c10', 'c11', 'c12', 'c13'],
                'D': ['d8', 'd9', 'd10', 'd11', 'd12', 'd13'],
                'E': ['e8', 'e9', 'e10', 'e11', 'e12', 'e13']},
 ('a2', 'b4'): {'C': ['c14', 'c15'], 'D': ['d14', 'd15'], 'E': ['e14', 'e15']}}

I prefer this way because it is less nested. However, you can easily rebuild the dict to fit your requirements:

new_dict = {}

for (a, b), v in d.items():
    # Create the first-level dict on demand, then add the inner dict with lowercase keys.
    new_dict.setdefault(a, {})[b] = {col.lower(): vals for col, vals in v.items()}

new_dict
>>
{'a1': {'b1': {'c': ['c1', 'c2', 'c3', 'c4'],
               'd': ['d1', 'd2', 'd3', 'd4'],
               'e': ['e1', 'e2', 'e3', 'e4']},
        'b2': {'c': ['c5', 'c6', 'c7'],
               'd': ['d5', 'd6', 'd7'],
               'e': ['e5', 'e6', 'e7']}},
 'a2': {'b3': {'c': ['c8', 'c9', 'c10', 'c11', 'c12', 'c13'],
               'd': ['d8', 'd9', 'd10', 'd11', 'd12', 'd13'],
               'e': ['e8', 'e9', 'e10', 'e11', 'e12', 'e13']},
        'b4': {'c': ['c14', 'c15'], 'd': ['d14', 'd15'], 'e': ['e14', 'e15']}}}

In most cases it is recommended to keep your data structure as flat as possible.
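To illustrate why the flat form is often convenient, here is a small sketch of common lookups on a tuple-keyed dictionary. The dictionary flat is a made-up, shortened stand-in for the groupby output, not the full data from the question.

```python
# Hypothetical flat dictionary keyed by (A, B) tuples, shaped like the groupby output.
flat = {
    ('a1', 'b1'): {'C': ['c1', 'c2'], 'D': ['d1', 'd2']},
    ('a1', 'b2'): {'C': ['c5'], 'D': ['d5']},
    ('a2', 'b3'): {'C': ['c8'], 'D': ['d8']},
}

# A direct lookup needs one key instead of two chained lookups.
c_values = flat[('a1', 'b1')]['C']

# Everything under one first-level key is a simple filter on the tuple.
under_a1 = {b: v for (a, b), v in flat.items() if a == 'a1'}

# A missing (A, B) pair can be handled with a single .get() call.
missing = flat.get(('a2', 'b9'), {})
```

With a nested dictionary, the missing-pair case would need a check at each level instead of one.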

Use the following code:

outDict = {
    key: {
        key2: grp2.to_dict(orient='list')
        for key2, grp2 in grp.groupby(level=1)
    }
    for key, grp in df.set_index(['A', 'B']).groupby(level=0)
}

The result, after some "pretty printing" reformatting is:

{
  'a1': {
    'b1': {
      'C': ['c1', 'c2', 'c3', 'c4'],
      'D': ['d1', 'd2', 'd3', 'd4'],
      'E': ['e1', 'e2', 'e3', 'e4']
    },
    'b2': {
      'C': ['c5', 'c6', 'c7'],
      'D': ['d5', 'd6', 'd7'],
      'E': ['e5', 'e6', 'e7']
    }
  },
  'a2': {
    'b3': {
      'C': ['c8', 'c9', 'c10', 'c11', 'c12', 'c13'],
      'D': ['d8', 'd9', 'd10', 'd11', 'd12', 'd13'],
      'E': ['e8', 'e9', 'e10', 'e11', 'e12', 'e13']
    },
    'b4': {
      'C': ['c14', 'c15'],
      'D': ['d14', 'd15'],
      'E': ['e14', 'e15']
    }
  }
}
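The same idea can be extended to any number of key columns with a small recursive helper. This is only a sketch: nest_by is a hypothetical name introduced here, not a pandas function, and the frame below is a shortened, made-up version of the question's data.

```python
import pandas as pd

def nest_by(frame, keys):
    """Nest `frame` by each column in `keys`; the remaining columns become lists.

    Hypothetical helper for illustration, not part of pandas.
    """
    if not keys:
        # No key columns left: the remaining columns become lists.
        return frame.to_dict(orient='list')
    first, rest = keys[0], keys[1:]
    return {
        # Drop the key column so it does not appear again in the leaves.
        key: nest_by(group.drop(columns=first), rest)
        for key, group in frame.groupby(first)
    }

df = pd.DataFrame({
    'A': ['a1', 'a1', 'a2'],
    'B': ['b1', 'b2', 'b3'],
    'C': ['c1', 'c5', 'c8'],
    'D': ['d1', 'd5', 'd8'],
})

result = nest_by(df, ['A', 'B'])
```

Calling nest_by(df, ['A']) instead would stop after one level and keep B as an ordinary list column.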
