I have a requirement to convert a df that is in the following format:
import pandas as pd

d = {
'A': ['a1', 'a1', 'a1', 'a1', 'a1', 'a1', 'a1', 'a2', 'a2', 'a2', 'a2', 'a2', 'a2', 'a2', 'a2'],
'B': ['b1', 'b1', 'b1', 'b1', 'b2', 'b2', 'b2', 'b3', 'b3', 'b3', 'b3', 'b3', 'b3', 'b4', 'b4', ],
'C': ['c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7', 'c8', 'c9', 'c10', 'c11', 'c12', 'c13', 'c14', 'c15', ],
'D': ['d1', 'd2', 'd3', 'd4', 'd5', 'd6', 'd7', 'd8', 'd9', 'd10', 'd11', 'd12', 'd13', 'd14', 'd15', ],
'E': ['e1', 'e2', 'e3', 'e4', 'e5', 'e6', 'e7', 'e8', 'e9', 'e10', 'e11', 'e12', 'e13', 'e14', 'e15', ],
}
df = pd.DataFrame(d)
df
A B C D E
a1 b1 c1 d1 e1
a1 b1 c2 d2 e2
a1 b1 c3 d3 e3
a1 b1 c4 d4 e4
a1 b2 c5 d5 e5
a1 b2 c6 d6 e6
a1 b2 c7 d7 e7
a2 b3 c8 d8 e8
a2 b3 c9 d9 e9
a2 b3 c10 d10 e10
a2 b3 c11 d11 e11
a2 b3 c12 d12 e12
a2 b3 c13 d13 e13
a2 b4 c14 d14 e14
a2 b4 c15 d15 e15
to a dictionary in the following format:
outDict = {
'a1': {
'b1': {
'c': ['c1', 'c2', 'c3', 'c4'],
'd': ['d1', 'd2', 'd3', 'd4'],
'e': ['e1', 'e2', 'e3', 'e4'],
},
'b2': {
'c': ['c5', 'c6', 'c7'],
'd': ['d5', 'd6', 'd7'],
'e': ['e5', 'e6', 'e7'],
},
},
'a2': {
'b3': {
'c': ['c8', 'c9', 'c10', 'c11', 'c12', 'c13'],
'd': ['d8', 'd9', 'd10', 'd11', 'd12', 'd13'],
'e': ['e8', 'e9', 'e10', 'e11', 'e12', 'e13'],
},
'b4': {
'c': ['c14', 'c15'],
'd': ['d14', 'd15'],
'e': ['e14', 'e15'],
}
}
}
i.e., convert the values in column A to first-level keys, the values in column B to second-level keys, and the values in columns C, D and E to lists.
First, set A and B as the index, group by those index levels, and convert all remaining columns to lists with to_dict(orient='list') in a lambda function. Finally, convert the resulting Series with a MultiIndex to a nested dictionary:
df = (df.set_index(['A', 'B'])
.groupby(['A', 'B'])
.apply(lambda x: x.to_dict(orient='list')))
d = {level: df.xs(level).to_dict() for level in df.index.levels[0]}
print(d)
{
'a1': {
'b1': {
'C': ['c1', 'c2', 'c3', 'c4'],
'D': ['d1', 'd2', 'd3', 'd4'],
'E': ['e1', 'e2', 'e3', 'e4']
},
'b2': {
'C': ['c5', 'c6', 'c7'],
'D': ['d5', 'd6', 'd7'],
'E': ['e5', 'e6', 'e7']
}
},
'a2': {
'b3': {
'C': ['c8', 'c9', 'c10', 'c11', 'c12', 'c13'],
'D': ['d8', 'd9', 'd10', 'd11', 'd12', 'd13'],
'E': ['e8', 'e9', 'e10', 'e11', 'e12', 'e13']
},
'b4': {
'C': ['c14', 'c15'],
'D': ['d14', 'd15'],
'E': ['e14', 'e15']
}
}
}
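The last step relies on xs: on a Series with a two-level MultiIndex, xs(value) selects one value of the first level and drops that level, leaving a Series keyed by B whose to_dict() is one inner dictionary. A minimal sketch, assuming a small hand-built Series of dicts in place of the groupby/apply result (values shortened for readability):

```python
import pandas as pd

# Hand-built stand-in for the groupby/apply result: a Series of dicts
# indexed by a two-level MultiIndex (A, B).
s = pd.Series(
    [{'C': ['c1', 'c2']}, {'C': ['c5']}, {'C': ['c8']}],
    index=pd.MultiIndex.from_tuples(
        [('a1', 'b1'), ('a1', 'b2'), ('a2', 'b3')], names=['A', 'B']
    ),
)

# xs('a1') keeps only the rows where A == 'a1' and drops the A level,
# so the remaining index is B.
print(s.xs('a1').to_dict())  # {'b1': {'C': ['c1', 'c2']}, 'b2': {'C': ['c5']}}

# unique(level=0) lists only the A values actually present, unlike
# index.levels[0], which can keep unused values after filtering.
nested = {level: s.xs(level).to_dict() for level in s.index.unique(level=0)}
print(nested)
```

Using index.unique(level=0) instead of index.levels[0] only matters if the frame was filtered before this step; on a freshly built frame both give the same keys.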
If you need the nested keys in lowercase, start again from the original DataFrame and just rename the columns first:
df = df.rename(columns={'C':'c', 'D':'d', 'E':'e'})
df = (df.set_index(['A', 'B'])
.groupby(['A', 'B'])
.apply(lambda x: x.to_dict(orient='list')))
d = {level: df.xs(level).to_dict() for level in df.index.levels[0]}
print(d)
{
'a1': {
'b1': {
'c': ['c1', 'c2', 'c3', 'c4'],
'd': ['d1', 'd2', 'd3', 'd4'],
'e': ['e1', 'e2', 'e3', 'e4']
},
'b2': {
'c': ['c5', 'c6', 'c7'],
'd': ['d5', 'd6', 'd7'],
'e': ['e5', 'e6', 'e7']
}
},
'a2': {
'b3': {
'c': ['c8', 'c9', 'c10', 'c11', 'c12', 'c13'],
'd': ['d8', 'd9', 'd10', 'd11', 'd12', 'd13'],
'e': ['e8', 'e9', 'e10', 'e11', 'e12', 'e13']
},
'b4': {
'c': ['c14', 'c15'],
'd': ['d14', 'd15'],
'e': ['e14', 'e15']
}
}
}
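With many value columns, writing the rename mapping by hand gets tedious. A minimal sketch, assuming every column other than the key columns A and B should be lowercased (the two-row frame is illustrative data only). Note that rename(columns=str.lower) would also lowercase A and B, which would break set_index(['A', 'B']):

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['a1', 'a1'],
    'B': ['b1', 'b2'],
    'C': ['c1', 'c5'],
    'D': ['d1', 'd5'],
    'E': ['e1', 'e5'],
})

# Lowercase every column except the grouping keys A and B.
key_cols = ['A', 'B']
df = df.rename(columns={col: col.lower() for col in df.columns if col not in key_cols})
print(list(df.columns))  # ['A', 'B', 'c', 'd', 'e']
```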
If the goal is just to look up values by the keys in A and B, I actually think the output of a groupby with tuple keys is sufficient.
d = df.groupby(['A', 'B']).agg({'C': list, 'D': list, 'E': list}).to_dict(orient='index')
d
>>
{('a1', 'b1'): {'C': ['c1', 'c2', 'c3', 'c4'],
'D': ['d1', 'd2', 'd3', 'd4'],
'E': ['e1', 'e2', 'e3', 'e4']},
('a1', 'b2'): {'C': ['c5', 'c6', 'c7'],
'D': ['d5', 'd6', 'd7'],
'E': ['e5', 'e6', 'e7']},
('a2', 'b3'): {'C': ['c8', 'c9', 'c10', 'c11', 'c12', 'c13'],
'D': ['d8', 'd9', 'd10', 'd11', 'd12', 'd13'],
'E': ['e8', 'e9', 'e10', 'e11', 'e12', 'e13']},
('a2', 'b4'): {'C': ['c14', 'c15'], 'D': ['d14', 'd15'], 'E': ['e14', 'e15']}}
I prefer this way because it is less nested; however, you can easily rebuild the dict to fit your requirements:
new_dict = {}
for k, v in d.items():
    new_dict.setdefault(k[0], {})
    new_dict[k[0]].update({k[1]: {_k.lower(): _v for _k, _v in v.items()}})
new_dict
>>
{'a1': {'b1': {'c': ['c1', 'c2', 'c3', 'c4'],
'd': ['d1', 'd2', 'd3', 'd4'],
'e': ['e1', 'e2', 'e3', 'e4']},
'b2': {'c': ['c5', 'c6', 'c7'],
'd': ['d5', 'd6', 'd7'],
'e': ['e5', 'e6', 'e7']}},
'a2': {'b3': {'c': ['c8', 'c9', 'c10', 'c11', 'c12', 'c13'],
'd': ['d8', 'd9', 'd10', 'd11', 'd12', 'd13'],
'e': ['e8', 'e9', 'e10', 'e11', 'e12', 'e13']},
'b4': {'c': ['c14', 'c15'], 'd': ['d14', 'd15'], 'e': ['e14', 'e15']}}}
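The same rebuild can be written with collections.defaultdict, which removes the setdefault call. A sketch, assuming a shortened, hand-written flat dict as a stand-in for the groupby output above:

```python
from collections import defaultdict

# Shortened stand-in for the tuple-keyed dict from groupby(...).to_dict(orient='index').
flat = {
    ('a1', 'b1'): {'C': ['c1', 'c2'], 'D': ['d1', 'd2']},
    ('a1', 'b2'): {'C': ['c5'], 'D': ['d5']},
    ('a2', 'b3'): {'C': ['c8'], 'D': ['d8']},
}

# defaultdict(dict) creates the inner dict for a new A key on first access.
nested = defaultdict(dict)
for (a_key, b_key), cols in flat.items():
    # Unpack the (A, B) tuple key and lowercase the column names.
    nested[a_key][b_key] = {name.lower(): values for name, values in cols.items()}

nested = dict(nested)  # back to a plain dict for printing or serialization
print(nested)
```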
In most cases it is recommended to keep your data structure as flat as possible.
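With tuple keys, a lookup is a single subscript, and selecting everything for one A value is a plain comprehension. A minimal sketch on a three-row frame (illustrative data only), using the same groupby/agg/to_dict chain as the answer above:

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['a1', 'a1', 'a2'],
    'B': ['b1', 'b1', 'b3'],
    'C': ['c1', 'c2', 'c8'],
    'D': ['d1', 'd2', 'd8'],
})

flat = df.groupby(['A', 'B']).agg({'C': list, 'D': list}).to_dict(orient='index')

# One subscript with an (A, B) tuple reaches the lists directly.
print(flat['a1', 'b1']['C'])  # ['c1', 'c2']

# All entries for one A value, without an extra nesting level.
a1_entries = {b: cols for (a, b), cols in flat.items() if a == 'a1'}
print(a1_entries)
```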
Use the following code:
outDict = {key: {key2: grp2.to_dict(orient='list')
                 for key2, grp2 in grp.groupby(level=1)}
           for key, grp in df.set_index(['A', 'B']).groupby(level=0)}
The result, after some "pretty printing" reformatting, is:
{
'a1': {
'b1': {
'C': ['c1', 'c2', 'c3', 'c4'],
'D': ['d1', 'd2', 'd3', 'd4'],
'E': ['e1', 'e2', 'e3', 'e4']
},
'b2': {
'C': ['c5', 'c6', 'c7'],
'D': ['d5', 'd6', 'd7'],
'E': ['e5', 'e6', 'e7']
}
},
'a2': {
'b3': {
'C': ['c8', 'c9', 'c10', 'c11', 'c12', 'c13'],
'D': ['d8', 'd9', 'd10', 'd11', 'd12', 'd13'],
'E': ['e8', 'e9', 'e10', 'e11', 'e12', 'e13']
},
'b4': {
'C': ['c14', 'c15'],
'D': ['d14', 'd15'],
'E': ['e14', 'e15']
}
}
}
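The comprehension above hard-codes two levels. If the number of key columns varies, the same idea can be applied recursively. A sketch with a hypothetical helper named nest (not a pandas function) that groups by one key column at a time and calls to_dict(orient='list') on the remaining columns at the bottom level:

```python
import pandas as pd


def nest(frame, keys):
    """Hypothetical helper: nest `frame` by the columns in `keys`, outermost first."""
    if not keys:
        # No keys left: turn each remaining column into a list.
        return frame.to_dict(orient='list')
    first, rest = keys[0], keys[1:]
    return {
        # Drop the column we grouped on so it does not reappear in the leaves.
        value: nest(group.drop(columns=first), rest)
        for value, group in frame.groupby(first)
    }


df = pd.DataFrame({
    'A': ['a1', 'a1', 'a2'],
    'B': ['b1', 'b2', 'b3'],
    'C': ['c1', 'c5', 'c8'],
    'D': ['d1', 'd5', 'd8'],
})
print(nest(df, ['A', 'B']))
```

Grouping by a single column name at each step keeps the keys scalar and avoids MultiIndex handling; groupby sorts the keys by default, which does not affect dict equality.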