简体   繁体   中英

Extracting from a nested json file to a pandas dataframe

I have been trying to format a nested json file to a pandas dataframe but i may have missing something,

How can extract the timeseries onto a pandas dataframe? I have been struggling trying to extract all the numbering but if succesful I ended with some of metadata in a dataaframe

Please help!

{
"Meta Data": {
    "1. Information": "Intraday (60min) prices and volumes",
    "2. Symbol": "BHP.AX",
    "3. Last Refreshed": "2018-02-09 00:00:00",
    "4. Interval": "60min",
    "5. Output Size": "Compact",
    "6. Time Zone": "US/Eastern"
},
"Time Series (60min)": {
    "2018-02-09 00:00:00": {
        "1. open": "29.1100",
        "2. high": "29.1950",
        "3. low": "29.1000",
        "4. close": "29.1300",
        "5. volume": "788213"
    },
    "2018-02-08 23:00:00": {
        "1. open": "29.0000",
        "2. high": "29.2000",
        "3. low": "29.0000",
        "4. close": "29.1100",
        "5. volume": "768704"
    },
    "2018-02-08 22:00:00": {
        "1. open": "29.1000",
        "2. high": "29.1000",
        "3. low": "28.9600",
        "4. close": "29.0000",
        "5. volume": "830235"
    },
    "2018-02-08 21:00:00": {
        "1. open": "29.0850",
        "2. high": "29.2250",
        "3. low": "29.0750",
        "4. close": "29.1050",
        "5. volume": "803142"
    },
    "2018-02-08 20:00:00": {
        "1. open": "28.9200",
        "2. high": "29.1500",
        "3. low": "28.8900",
        "4. close": "29.0900",
        "5. volume": "1231131"
    }
  }
}

any Ideas ?

You can use pd.DataFrame.from_dict , specifying an orient .

data = json.loads(json_data)
df = pd.DataFrame.from_dict(data['Time Series (60min)'], orient='index')

You can also cleanup your column headers if needed.

df.columns = df.columns.str.split('. ').str[1] # an optional step

df

                    open     high      low    close   volume
2018-02-08 20:00:00  28.9200  29.1500  28.8900  29.0900  1231131
2018-02-08 21:00:00  29.0850  29.2250  29.0750  29.1050   803142
2018-02-08 22:00:00  29.1000  29.1000  28.9600  29.0000   830235
2018-02-08 23:00:00  29.0000  29.2000  29.0000  29.1100   768704
2018-02-09 00:00:00  29.1100  29.1950  29.1000  29.1300   788213

If you need only the time series, just tell Pandas!

js = '''Your_JSON_String'''
jsdata = json.loads(js)
jsts = jsdata["Time Series (60min)"]
ts = pd.DataFrame(jsts).T
#                     1. open  2. high   3. low 4. close 5. volume
# 2018-02-08 20:00:00  28.9200  29.1500  28.8900  29.0900   1231131
# 2018-02-08 21:00:00  29.0850  29.2250  29.0750  29.1050    803142
# 2018-02-08 22:00:00  29.1000  29.1000  28.9600  29.0000    830235
# 2018-02-08 23:00:00  29.0000  29.2000  29.0000  29.1100    768704
# 2018-02-09 00:00:00  29.1100  29.1950  29.1000  29.1300    788213

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM