I have a collection in my MongoDB database where each record represents an edge (a road in the application I am building). Each record has the following form, where the first id is the id of the edge:
{ "_id":{ "$oid":"5d0e7acc9c0bd9917006dd56" }, "edge":{ "@id":":3659704519_0", "@traveltime":"2.37", "@timestep":"3", "lane":[ { "@id":":3330548807_1_0", "@maxspeed":"1", "@meanspeed":"79.99", "@occupancy":"0.00", "@shape":"11.735290362905872,48.16774527062213,11.735369706697464,48.16778792148228" }, { "@id":":3330548807_1_1", "@maxspeed":"1", "@meanspeed":"79.99", "@occupancy":"0.00", "@shape":"11.73526233983474,48.16776717333565,11.735343756121146,48.16781085462666" } ] } }
I want to analyze this data, so I want to convert the records to a pandas data frame. The desired data frame skeleton would look like this:
[image: the desired skeleton for the data frame]
I have tried normalizing with pandas.io.json.json_normalize(d), but I cannot get the output I want.
As we can see, each edge has an array of lanes that contains at most two lanes. It can also contain only one lane. So I want each lane to become its own row in the data frame.
Could someone please suggest a solution?
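If your data is nested like this, you have to flatten it before you can create a data frame. Note that "lane" can be either a list of lanes or a single lane object, so a single lane has to be wrapped in a list first.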
import pandas

json = [
    {
        "_id": {
            "$oid": "5d0e7acc9c0bd9917006dd56"
        },
        "edge": {
            "@id": ":3659704519_0",
            "@traveltime": "2.37",
            "@timestep": "3",
            "lane": [
                {
                    "@id": ":3330548807_1_0",
                    "@maxspeed": "1",
                    "@meanspeed": "79.99",
                    "@occupancy": "0.00",
                    "@shape": "11.735290362905872,48.16774527062213,11.735369706697464,48.16778792148228"
                },
                {
                    "@id": ":3330548807_1_1",
                    "@maxspeed": "1",
                    "@meanspeed": "79.99",
                    "@occupancy": "0.00",
                    "@shape": "11.73526233983474,48.16776717333565,11.735343756121146,48.16781085462666"
                }
            ]
        }
    },
    {
        "_id": {
            "$oid": "5d0e7acc9c0bd9917006dd56"
        },
        "edge": {
            "@id": ":3659704519_0",
            "@traveltime": "2.37",
            "@timestep": "3",
            "lane": {
                "@id": ":3330548807_1_0",
                "@maxspeed": "1",
                "@meanspeed": "79.99",
                "@occupancy": "0.00",
                "@shape": "11.735290362905872,48.16774527062213,11.735369706697464,48.16778792148228"
            }
        }
    },
]
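
def ensure_list(obj):
    # "lane" is a list when an edge has several lanes and a single dict
    # when it has only one, so wrap the single case in a list.
    if isinstance(obj, list):
        return obj
    else:
        return [obj]
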
json_transformed = [
    {
        # edge attributes
        'edge_id': record['edge']['@id'],
        # lane attributes
        'lane_id': lane['@id'],
        # ...
    }
    for record in json
    for lane in ensure_list(record['edge']['lane'])
]

df = pandas.DataFrame(json_transformed)
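
Since the question mentions json_normalize, here is a rough sketch of how the same flattening might be done with it once every "lane" value has been wrapped in a list. This assumes pandas 1.0 or newer, where json_normalize is available as pandas.json_normalize. The names records, normalized and numeric_cols are just made up for this example, the sample data is trimmed to a few fields, and the "lane." prefix is an arbitrary choice.

```python
import copy

import pandas as pd

# Trimmed-down sample records (illustrative only): the first edge has a
# list of lanes, the second has a single lane object.
records = [
    {
        "edge": {
            "@id": ":3659704519_0",
            "@traveltime": "2.37",
            "@timestep": "3",
            "lane": [
                {"@id": ":3330548807_1_0", "@meanspeed": "79.99"},
                {"@id": ":3330548807_1_1", "@meanspeed": "79.99"},
            ],
        }
    },
    {
        "edge": {
            "@id": ":3659704519_0",
            "@traveltime": "2.37",
            "@timestep": "3",
            "lane": {"@id": ":3330548807_1_0", "@meanspeed": "79.99"},
        }
    },
]


def ensure_list(obj):
    """Return obj unchanged if it is a list, otherwise wrap it in a list."""
    return obj if isinstance(obj, list) else [obj]


# Work on a deep copy so the original records are left untouched.
normalized = copy.deepcopy(records)
for record in normalized:
    record["edge"]["lane"] = ensure_list(record["edge"]["lane"])

# One row per lane; the edge attributes are repeated on every lane row.
df = pd.json_normalize(
    normalized,
    record_path=["edge", "lane"],
    meta=[["edge", "@id"], ["edge", "@traveltime"], ["edge", "@timestep"]],
    record_prefix="lane.",
)

# All values arrive as strings, so convert the numeric columns explicitly.
numeric_cols = ["lane.@meanspeed", "edge.@traveltime", "edge.@timestep"]
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric)
```

The lane columns come out prefixed with "lane." and the edge columns with "edge.", so the two "@id" fields do not collide. If you prefer explicit control over column names, the list comprehension above is probably the simpler option.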