
How to convert JSON data with a list of lists to a DataFrame using Python pandas

JSON Object
What I actually need is to turn this JSON into a DataFrame. The data field is a list of lists, where each inner list holds two values: a year and a number.

{
      "series_id": "TOTAL.PAEWPUS.A",
      "name": "Wells Drilled, Exploratory, Crude Oil, Annual",
      "units": "Number of Wells",
      "f": "A",
      "start": "2004",
      "end": "2012",
      "last_updated": "2016-04-25T13:57:43-04:00",
      "data": [
        [
          "2010",
          669
        ],
        [
          "2009",
          605
        ],
        [
          "2008",
          897
        ],
        [
          "2007",
          808
        ],
        [
          "2006",
          646
        ],
        [
          "2005",
          539
        ],
        [
          "2004",
          383
        ]
      ]
    }
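Note that strict JSON does not allow a trailing comma after the last element of an array, so a parser such as Python's json module rejects it. A small check, using a shortened version of the data field (the helper name parses is just for illustration):

```python
import json

def parses(text):
    """Return True if text is valid JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True

# Same shape as the "data" field, with and without a trailing comma.
with_trailing_comma = '{"data": [["2005", 539], ["2004", 383],]}'
without_trailing_comma = '{"data": [["2005", 539], ["2004", 383]]}'

print(parses(with_trailing_comma))     # False
print(parses(without_trailing_comma))  # True
```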

How can I create a DataFrame like this?

series_id        start  end   data
TOTAL.PAEWPUS.A  2004   2010  2010
TOTAL.PAEWPUS.A  2004   2010  2009
TOTAL.PAEWPUS.A  2004   2010  2008
TOTAL.PAEWPUS.A  2004   2010  2007
TOTAL.PAEWPUS.A  2004   2010  2006
TOTAL.PAEWPUS.A  2004   2010  2005
TOTAL.PAEWPUS.A  2004   2010  2004

How can I achieve this?

Your JSON can easily be converted into a Python dictionary (for example with json.loads if you have it as a string):

d = { "series_id": "TOTAL.PAEWPUS.A",
      "name": "Wells Drilled, Exploratory, Crude Oil, Annual",
      "units": "Number of Wells",
      "f": "A",
      "start": "2004",
      "end": "2012",
      "last_updated": "2016-04-25T13:57:43-04:00",
      "data": [
               ["2010",669],
               ["2009",605],
               ["2008",897],
               ["2007",808],
               ["2006",646],
               ["2005",539],
               ["2004",383],
              ]
     }
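If the JSON arrives as text (from a file or an API response) rather than as a literal, the standard-library json module does the conversion. A minimal sketch, with the input shortened to two data pairs:

```python
import json

# The JSON text as it might arrive from a file or an API response
# (shortened to two data pairs here).
raw = """
{
  "series_id": "TOTAL.PAEWPUS.A",
  "start": "2004",
  "end": "2012",
  "data": [["2010", 669], ["2009", 605]]
}
"""

# json.loads parses a string; json.load does the same for an open file object.
d = json.loads(raw)
print(type(d).__name__)  # dict
print(d["data"][0])      # ['2010', 669]
```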

After that, keep only the keys you want:

d = {k : d[k] for k in ['series_id', 'start', 'end', 'data']}

Since you want only the first element (the year) of each pair in the data key, filter that key:

d['data'] = [val[0] for val in d['data']]

The result is this:

print(d)

{'series_id': 'TOTAL.PAEWPUS.A',
 'start': '2004',
 'end': '2012',
 'data': ['2010', '2009', '2008', '2007', '2006', '2005', '2004']}

If you want start and end to reflect the years actually present in the data, compute them from it:

d['end'] = str(max(map(int,d['data'])))
d['start'] = str(min(map(int,d['data'])))
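With the seven years from the question, this turns end into "2010" instead of the "2012" stored in the metadata, which matches the table asked for. Converting to int before comparing matters in general: for four-digit years plain string comparison happens to give the same answer, but it would not if the widths differed (for example "999" > "1000" as strings). A standalone version of the step:

```python
# The filtered year strings from the previous step.
data = ["2010", "2009", "2008", "2007", "2006", "2005", "2004"]

# Compare as integers, then convert back to strings to match
# the type of the original "start"/"end" fields.
years = [int(y) for y in data]
start, end = str(min(years)), str(max(years))
print(start, end)  # 2004 2010
```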

Then just put it into a DataFrame:

import pandas as pd

df = pd.DataFrame(d)

print(df)

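Without the optional start/end step, you get the output below. This column order comes from older pandas versions, which sorted dict keys alphabetically; pandas 0.23+ on Python 3.6+ keeps insertion order, giving series_id, start, end, data.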

   data   end        series_id start
0  2010  2012  TOTAL.PAEWPUS.A  2004
1  2009  2012  TOTAL.PAEWPUS.A  2004
2  2008  2012  TOTAL.PAEWPUS.A  2004
3  2007  2012  TOTAL.PAEWPUS.A  2004
4  2006  2012  TOTAL.PAEWPUS.A  2004
5  2005  2012  TOTAL.PAEWPUS.A  2004
6  2004  2012  TOTAL.PAEWPUS.A  2004
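If you also want to keep the second number of each pair, one option is to build the DataFrame directly from the list of pairs and then add the metadata as constant columns. This is a sketch rather than part of the answer above; the column names year and value are my own labels, not fields from the JSON:

```python
import pandas as pd

# Same structure as the dictionary above, shortened to three pairs.
d = {
    "series_id": "TOTAL.PAEWPUS.A",
    "start": "2004",
    "end": "2012",
    "data": [["2010", 669], ["2009", 605], ["2008", 897]],
}

# Each inner [year, value] pair becomes one row with two columns
# ("year" and "value" are illustrative names).
df = pd.DataFrame(d["data"], columns=["year", "value"])

# A scalar passed to insert() is repeated on every row.
for pos, key in enumerate(["series_id", "start", "end"]):
    df.insert(pos, key, d[key])

print(df)
```

Using DataFrame.insert with a position puts the metadata columns first, so the column order does not depend on the pandas version.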
