This is my annoying nested dictionary:
"data": [
    {
        'type': 'a',
        'id': '3',
        'attributes': {'name': 'Alexander',
                       'address': 'Ree 25',
                       'postalCode': '3019 VM',
                       'place': 'Amsterdam',
                       'company': 'Pizza BV',
                       'phoneNumbers': [{'description': 'general', 'phoneNumber': '+31104136911'}],
                       'locationCode': 'DURTM',
                       'website': 'http://www.pizzabv.nl',
                       'primaryEmail': 'info@pizzabv.nl',
                       'secondaryEmail': '',
                       'geoLocation': {'type': 'Point',
                                       'coordinates': [16.309702884655767, 31.879930329139634]
                                       }
                       },
        'relationships': [],
        'links': {'self': 'www.homepage.nl'
                  }
    },
    {
        'type': 'b',
        'id': '7',
        'attributes': {'name': 'Sam',
                       'address': 'Zee 15',
                       'postalCode': '2019 AM',
                       'place': 'Groningen',
                       'company': 'Salami BV',
                       'phoneNumbers': [{'description': 'specific', 'phoneNumber': '+31404136121'}],
                       'locationCode': 'SWSTM',
                       'website': 'http://www.salamibv.nl',
                       'primaryEmail': 'info@salamibv.nl',
                       'secondaryEmail': '',
                       'geoLocation': {'type': 'Point',
                                       'coordinates': [18.309702884655767, 34.879930329139634]
                                       }
                       },
        'relationships': [],
        'links': {'self': 'www.homepage.nl'
                  }
    }
]
This is how I would like to have my dataframe:
type | id | name | address | postalCode | ... | type | coordinates | relationships | links
... ... ... ... ... ... ... ... ... ...
So the different underlying dictionaries have to be moved up a layer. First, attributes has to be deleted and its underlying values have to be moved up one layer.
Also, description and phoneNumber must be moved up a layer, and then phoneNumbers can be removed.
Furthermore, all information about a type and id should be placed in one row.
I don't get how to do this. I tried several methods like these:
terminals = pd.DataFrame.from_dict(data, orient='columns')
terminals.reset_index(level=0, inplace=True)
terminals.head()
But this gives me complete dictionaries in one cell of a Pandas Dataframe.
I hope somebody can help me out a bit.
Define a function that parses each dictionary into the flattened structure, then apply it to every record before passing the result to the DataFrame constructor:
import pandas as pd

def parse(dict_):
    di = dict_.copy()  # shallow copy, so the original dicts are not modified
    # bring attributes up a level
    di.update(di['attributes'])
    del di['attributes']
    # etc...
    return di

df = pd.DataFrame(list(map(parse, data)))
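The "# etc..." step could be filled in along the following lines. This is only a sketch: the name parse_full, the choice to keep just the first phoneNumbers entry, and the geoType key (used so that geoLocation's type does not overwrite the record's own type) are assumptions for illustration, not something the question specifies.

```python
def parse_full(record):
    """Flatten one record into a single-level dict (illustrative sketch)."""
    flat = dict(record)  # shallow copy; the caller's dict keeps its keys
    # Move everything under 'attributes' up one level.
    flat.update(flat.pop("attributes", {}))
    # Assumption: only the first entry of 'phoneNumbers' is kept.
    phones = flat.pop("phoneNumbers", [])
    if phones:
        flat.update(phones[0])
    # Assumption: geoLocation's 'type' is stored as 'geoType' so that it
    # does not overwrite the record's own 'type'.
    geo = flat.pop("geoLocation", {})
    if geo:
        flat["geoType"] = geo.get("type")
        flat["coordinates"] = geo.get("coordinates")
    return flat


# Shortened copy of the first record from the question.
record = {
    "type": "a",
    "id": "3",
    "attributes": {
        "name": "Alexander",
        "phoneNumbers": [{"description": "general", "phoneNumber": "+31104136911"}],
        "geoLocation": {"type": "Point",
                        "coordinates": [16.309702884655767, 31.879930329139634]},
    },
    "relationships": [],
    "links": {"self": "www.homepage.nl"},
}

row = parse_full(record)
print(row)
```

A list of such rows, for example [parse_full(r) for r in data], can then be handed to pd.DataFrame in the same way as above.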
You have to denest your data. You can do it with a recursive function such as:
def denest(x, parent=None, d=None):
    if d is None:
        d = {}
    for k, v in x.items():
        if isinstance(v, dict):
            denest(v, parent=(parent or []) + [k], d=d)
        elif isinstance(v, (list, tuple)):
            for j, item in enumerate(v):
                if isinstance(item, dict):
                    denest(item, parent=(parent or []) + [k, j], d=d)
                else:
                    d[tuple((parent or []) + [k, j])] = item
        else:
            d[tuple((parent or []) + [k])] = v
    return d
Then, assuming data is a list of dictionaries, you can simply create a dataframe like this:
pd.DataFrame([denest(i) for i in data])
  | (attributes, address) | (attributes, company) | (attributes, geoLocation, coordinates, 0) | (attributes, geoLocation, coordinates, 1) | (attributes, geoLocation, type) | (attributes, locationCode) | (attributes, name) | (attributes, phoneNumbers, 0, description) | (attributes, phoneNumbers, 0, phoneNumber) | (attributes, place) | (attributes, postalCode) | (attributes, primaryEmail) | (attributes, secondaryEmail) | (attributes, website) | (id,) | (links, self) | (type,)
0 | Ree 25 | Pizza BV | 16.309703 | 31.87993 | Point | DURTM | Alexander | general | +31104136911 | Amsterdam | 3019 VM | info@pizzabv.nl |  | http://www.pizzabv.nl | 3 | www.homepage.nl | a
1 | Zee 15 | Salami BV | 18.309703 | 34.87993 | Point | SWSTM | Sam | specific | +31404136121 | Groningen | 2019 AM | info@salamibv.nl |  | http://www.salamibv.nl | 7 | www.homepage.nl | b
If you prefer, from here you can then rename columns and/or turn them into a multi-index dataframe, etc.
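For the renaming, one option is to join the tuple keys into plain strings before the dataframe is built. The helper below is a minimal sketch: join_keys and the "." separator are hypothetical choices, and the sample dict is hand-written to look like what denest returns for one record.

```python
def join_keys(flat, sep="."):
    """Turn tuple keys such as ('attributes', 'name') into 'attributes.name'."""
    return {sep.join(str(part) for part in key): value
            for key, value in flat.items()}


# Hand-written sample shaped like the output of denest() for one record.
sample = {
    ("type",): "a",
    ("id",): "3",
    ("attributes", "name"): "Alexander",
    ("attributes", "phoneNumbers", 0, "description"): "general",
    ("attributes", "geoLocation", "coordinates", 0): 16.309702884655767,
}

renamed = join_keys(sample)
print(list(renamed))
```

With the names used above, pd.DataFrame([join_keys(denest(i)) for i in data]) should then give string column labels instead of tuples.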