
From MongoDB to a Pandas data frame

I have a collection in my MongoDB database where each record represents an edge (a road in the application I am building). Each record has the following form, where the first id is the id of the edge:

 { "_id":{ "$oid":"5d0e7acc9c0bd9917006dd56" }, "edge":{ "@id":":3659704519_0", "@traveltime":"2.37", "@timestep":"3", "lane":[ { "@id":":3330548807_1_0", "@maxspeed":"1", "@meanspeed":"79.99", "@occupancy":"0.00", "@shape":"11.735290362905872,48.16774527062213,11.735369706697464,48.16778792148228" }, { "@id":":3330548807_1_1", "@maxspeed":"1", "@meanspeed":"79.99", "@occupancy":"0.00", "@shape":"11.73526233983474,48.16776717333565,11.735343756121146,48.16781085462666" } ] } } 

I want to analyze this data, so I need to convert the records to a pandas data frame. The desired data frame skeleton would look like this:

[Image: the desired skeleton for the data frame]

I have tried normalizing with pandas.io.json.json_normalize(d) but I cannot get the output I want.
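To illustrate the problem, here is a minimal sketch of what a plain json_normalize call produces for one record. It assumes pandas 1.0 or later, where the function is available as the top-level pandas.json_normalize, and it uses a trimmed copy of the record above (only two lane attributes are kept for brevity). The nested dicts are flattened into dotted column names, but the lane list stays packed in a single cell, so the result has one row per edge instead of one row per lane.

```python
import pandas as pd

# Trimmed copy of the record from the question (lane attributes shortened).
record = {
    "_id": {"$oid": "5d0e7acc9c0bd9917006dd56"},
    "edge": {
        "@id": ":3659704519_0",
        "@traveltime": "2.37",
        "@timestep": "3",
        "lane": [
            {"@id": ":3330548807_1_0", "@meanspeed": "79.99"},
            {"@id": ":3330548807_1_1", "@meanspeed": "79.99"},
        ],
    },
}

# Nested dicts become dotted columns such as "edge.@id",
# but the list under "lane" is left as-is in the "edge.lane" column.
flat = pd.json_normalize([record])

print(flat.columns.tolist())
print(len(flat))                    # number of rows: one per edge, not per lane
print(type(flat.loc[0, "edge.lane"]))
```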

As we can see, the lane array holds at most two lanes, and it can also contain only one. I want each lane to become its own row in the data frame.

Could someone please suggest a solution?

If your data is nested like this, you have to transform it into a flat shape, with one dict per lane, before you can create a data frame.

import pandas

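# Sample data: the first record holds a list of two lanes,
# the second holds a single lane stored as a plain dict.
json = [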
{
   "_id":{
      "$oid":"5d0e7acc9c0bd9917006dd56"
   },
   "edge":{
      "@id":":3659704519_0",
      "@traveltime":"2.37",
      "@timestep":"3",
      "lane": [
         {
            "@id":":3330548807_1_0",
            "@maxspeed":"1",
            "@meanspeed":"79.99",
            "@occupancy":"0.00",
            "@shape":"11.735290362905872,48.16774527062213,11.735369706697464,48.16778792148228"
         },
         {
            "@id":":3330548807_1_1",
            "@maxspeed":"1",
            "@meanspeed":"79.99",
            "@occupancy":"0.00",
            "@shape":"11.73526233983474,48.16776717333565,11.735343756121146,48.16781085462666"
         }
      ]
   }
},
{
   "_id":{
      "$oid":"5d0e7acc9c0bd9917006dd56"
   },
   "edge":{
      "@id":":3659704519_0",
      "@traveltime":"2.37",
      "@timestep":"3",
      "lane":{
            "@id":":3330548807_1_0",
            "@maxspeed":"1",
            "@meanspeed":"79.99",
            "@occupancy":"0.00",
            "@shape":"11.735290362905872,48.16774527062213,11.735369706697464,48.16778792148228"
      }
   }
},
]

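def ensure_list(obj):
    # A single lane may be stored as a plain dict instead of a one-element
    # list, so wrap anything that is not already a list.
    if isinstance(obj, list):
        return obj
    else:
        return [obj]

# Build one flat dict per lane, repeating the edge attributes on every row.
json_transformed = [
    {
        # edge attributes
        'edge_id': record['edge']['@id'],
        # lane attributes
        'lane_id': lane['@id'],
        # ...
    }
    for record in json
    for lane in ensure_list(record['edge']['lane'])
]

# Each dict becomes one row of the data frame.
df = pandas.DataFrame(json_transformed)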
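
The "# ..." placeholder above can be filled in along the following lines. This is only a sketch: the column names (traveltime, meanspeed and so on) and the helper name flatten_edges are assumptions, since the desired skeleton from the question is not available here. It also converts the numeric fields, which the records store as strings, using pandas.to_numeric.

```python
import pandas as pd

def ensure_list(obj):
    """Wrap a single lane dict in a list; leave lists untouched."""
    return obj if isinstance(obj, list) else [obj]

def flatten_edges(records):
    """Return one row per lane, with the edge attributes repeated on each row.

    The column names are illustrative choices, not taken from the question.
    """
    rows = [
        {
            "edge_id": record["edge"]["@id"],
            "traveltime": record["edge"]["@traveltime"],
            "timestep": record["edge"]["@timestep"],
            "lane_id": lane["@id"],
            "maxspeed": lane["@maxspeed"],
            "meanspeed": lane["@meanspeed"],
            "occupancy": lane["@occupancy"],
            "shape": lane["@shape"],
        }
        for record in records
        for lane in ensure_list(record["edge"]["lane"])
    ]
    df = pd.DataFrame(rows)
    # Every value arrives as a string; convert the numeric columns.
    numeric_cols = ["traveltime", "timestep", "maxspeed", "meanspeed", "occupancy"]
    df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric)
    return df

lane_0 = {
    "@id": ":3330548807_1_0", "@maxspeed": "1", "@meanspeed": "79.99",
    "@occupancy": "0.00",
    "@shape": "11.735290362905872,48.16774527062213,11.735369706697464,48.16778792148228",
}
lane_1 = {
    "@id": ":3330548807_1_1", "@maxspeed": "1", "@meanspeed": "79.99",
    "@occupancy": "0.00",
    "@shape": "11.73526233983474,48.16776717333565,11.735343756121146,48.16781085462666",
}
edge = {"@id": ":3659704519_0", "@traveltime": "2.37", "@timestep": "3"}

# Same two cases as in the sample data: a list of lanes and a single lane dict.
sample = [
    {"_id": {"$oid": "5d0e7acc9c0bd9917006dd56"}, "edge": {**edge, "lane": [lane_0, lane_1]}},
    {"_id": {"$oid": "5d0e7acc9c0bd9917006dd56"}, "edge": {**edge, "lane": lane_0}},
]

df = flatten_edges(sample)
print(df)
```

Because the function only iterates over its argument, the documents returned by pymongo's find() can be passed in directly, for example flatten_edges(collection.find()), where collection stands for whatever collection object holds the edges.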


 