I have some data containing nested dictionaries, like below:
mylist = [{"a": 1, "b": {"c": 2, "d":3}}, {"a": 3, "b": {"c": 4, "d":3}}]
If we convert it to a pandas DataFrame,
import pandas as pd
result_dataframe = pd.DataFrame(mylist)
print(result_dataframe)
It will output:
   a                 b
0  1  {'c': 2, 'd': 3}
1  3  {'c': 4, 'd': 3}
I want to convert the list of dictionaries while dropping the key of the nested dictionary, so that its entries become top-level columns. My code is below:
new_dataframe = result_dataframe.drop(columns=["b"])
b_dict_list = [document["b"] for document in mylist]
b_df = pd.DataFrame(b_dict_list)
frames = [new_dataframe, b_df]
total_frame = pd.concat(frames, axis=1)
The total_frame is what I want:
   a  c  d
0  1  2  3
1  3  4  3
But I think my code is a little complicated. Is there a simpler way to deal with this problem? Thank you.
Use a list comprehension with pop to extract the value of b and merge the dictionaries (note that pop removes b from the dictionaries in mylist):
a = [{**x, **x.pop('b')} for x in mylist]
print (a)
[{'a': 1, 'c': 2, 'd': 3}, {'a': 3, 'c': 4, 'd': 3}]
result_dataframe = pd.DataFrame(a)
print(result_dataframe)
   a  c  d
0  1  2  3
1  3  4  3
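Because pop changes the dictionaries inside mylist, running the comprehension a second time raises a KeyError. If the original data has to stay intact, here is a minimal sketch of a non-mutating variant. The helper name flatten_key is hypothetical (it is not part of pandas), and the sketch assumes the nested value is a dictionary whenever the key is present:

```python
mylist = [{"a": 1, "b": {"c": 2, "d": 3}}, {"a": 3, "b": {"c": 4, "d": 3}}]

def flatten_key(record, key):
    """Return a new dict with record[key] merged into the top level.

    Hypothetical helper for illustration; the input record is not modified.
    """
    # Copy every entry except the nested one.
    rest = {k: v for k, v in record.items() if k != key}
    # Merge the nested dict (or nothing, if the key is absent).
    return {**rest, **record.get(key, {})}

flat = [flatten_key(x, "b") for x in mylist]
```

The resulting flat list can be passed to pd.DataFrame(flat) in the same way as above, and mylist still contains its b entries afterwards.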
Another solution, thanks @Sandeep Kadapa :
a = [{'a': x['a'], **x['b']} for x in mylist]
# alternative; the {} default avoids a TypeError when 'b' is missing
a = [{'a': x['a'], **x.get('b', {})} for x in mylist]
Or by applying pd.Series() to your method:
mylist = [{"a": 1, "b": {"c": 2, "d":3}}, {"a": 3, "b": {"c": 4, "d":3}}]
result_dataframe = pd.DataFrame(mylist)
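# drop(columns=...) replaces the positional axis argument, which pandas 2.0 no longer accepts
result_dataframe.drop(columns='b').join(result_dataframe['b'].apply(pd.Series))
   a  c  d
0  1  2  3
1  3  4  3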
I prefer to write a function that accepts your mylist, flattens it one nested layer down, and returns a dictionary. This has the added advantage of not requiring you to 'manually' know which key, like b, to convert. So this function works for all nested keys one layer down.
mylist = [{"a": 1, "b": {"c": 2, "d":3}}, {"a": 3, "b": {"c": 4, "d":3}}]
import pandas as pd
def dropnested(alist):
    outputdict = {}
    for dic in alist:
        for key, value in dic.items():
            if isinstance(value, dict):
                # nested dict: lift its items to the top level
                for k2, v2 in value.items():
                    outputdict[k2] = outputdict.get(k2, []) + [v2]
            else:
                outputdict[key] = outputdict.get(key, []) + [value]
    return outputdict
df = pd.DataFrame.from_dict(dropnested(mylist))
print (df)
# a c d
#0 1 2 3
#1 3 4 3
If you try:
mylist = [{"a": 1, "b": {"c": 2, "d":3}, "g": {"e": 2, "f":3}},
{"a": 3, "z": {"c": 4, "d":3}, "e": {"e": 2, "f":3}}]
df = pd.DataFrame.from_dict(dropnested(mylist))
print (df)
# a c d e f
#0 1 2 3 2 3
#1 3 4 3 2 3
We can see here that it converts keys b, g, z, e without issue, as opposed to having to define each and every nested key name to convert.
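One caveat with collecting values column by column: if a key is missing from some records, the lists end up with different lengths and the DataFrame constructor raises a ValueError. A possible row-wise variant, sketched here under the assumption that nesting is at most one level deep, flattens each record separately. The name flatten_one_level and the sample records are hypothetical:

```python
def flatten_one_level(record):
    """Return a new dict with every nested dict lifted to the top level.

    Hypothetical helper for illustration; assumes one level of nesting.
    """
    flat = {}
    for key, value in record.items():
        if isinstance(value, dict):
            flat.update(value)   # lift nested items, dropping the parent key
        else:
            flat[key] = value
    return flat

# Sample data for illustration: the second record has no "d".
records = [{"a": 1, "b": {"c": 2, "d": 3}}, {"a": 3, "b": {"c": 4}}]
rows = [flatten_one_level(r) for r in records]
```

Passing rows to pd.DataFrame(rows) then gives one row per record, with NaN where a record lacks a key.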
I had a similar problem to this one. I used pd.json_normalize(x) and it worked. The only difference is that the column names of the data frame will look a little different: each nested key is prefixed with its parent key and a dot.
  | a | b.c | b.d
---|---|---|---
0 | 1 | 2 | 3
1 | 3 | 4 | 3
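If the prefixed names are not wanted, the columns can be renamed after normalizing. A minimal sketch, assuming the default "." separator of pd.json_normalize and that no two nested keys share the same name (otherwise the renamed columns would collide):

```python
import pandas as pd

mylist = [{"a": 1, "b": {"c": 2, "d": 3}}, {"a": 3, "b": {"c": 4, "d": 3}}]

# Nested keys become "parent.child" columns: a, b.c, b.d
df = pd.json_normalize(mylist)

# Keep only the part after the last "." so that "b.c" becomes "c".
df.columns = [col.split(".")[-1] for col in df.columns]
```

Unlike the comprehension-based answers, this does not require naming the nested key, and json_normalize also handles deeper nesting.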