简体   繁体   中英

Json_normalise from nested API to DF throwing type error

I'm trying to consume an api that has the following structure and load it into a pandas DF with one row per item ID eg 2, 6 and columns for the high, hightime, low and lowtime for each entry.

{
"data": {
    "2": {
        "high": 142,
        "highTime": 1617214328,
        "low": 140,
        "lowTime": 1617214323
    },
    "6": {
        "high": 182198,
        "highTime": 1617214063,
        "low": 182198,
        "lowTime": 1617214137

So far I've been using json_normalise on the json response which loads one row with multiple nested columns for each entry:

data.2.high | data.2.highTime | data.2.low | data.2.lowTime etc

as I result, I tried adding the record_path for 'data' thinking that would address the fact that it's a nested list but doing so throws

 raise TypeError(
257                     f"{js} has non list value {result} for path {spec}. "
258                     "Must be list or null."

I think that's because my res['data'] type is a dict, not a list in it's own right but I'm slightly confused how to go about solving that or if that's even right.

Any help would be greatly appreciated

TL;DR

Just use

df = pd.DataFrame.from_records(json_data['data']).T.reset_index()

Explanation

In your scenario, pandas from_records works better than json_normalise . This the case because your response is structured in a way that the ids are the keys and not the values.

For instance, for this response example where there is a key id and its correspondent value

json_data={
"data": [{
        "id":2,
        "high": 142,
        "highTime": 1617214328,
        "low": 140,
        "lowTime": 1617214323
    },{
        "id":6,
        "high": 182198,
        "highTime": 1617214063,
        "low": 182198,
        "lowTime": 1617214137
    }]}

would work fine with json_normalize , as follows.

pd.json_normalize(json_data['data'])
id  high    highTime    low     lowTime
2   142     1617214328  140     1617214323
6   182198  1617214063  182198  1617214137

However, your JSON response contains ids as keys,

json_data={
"data": {
    "2": {
        "high": 142,
        "highTime": 1617214328,
        "low": 140,
        "lowTime": 1617214323
    },
    "6": {
        "high": 182198,
        "highTime": 1617214063,
        "low": 182198,
        "lowTime": 1617214137
    }}}

and so from_records works better.

df = pd.DataFrame.from_records(json_data['data']).T.reset_index()
index   high    highTime    low     lowTime
2       142     1617214328  140     1617214323
6       182198  1617214063  182198  1617214137

Also, things were not working because you probably were passing the full json response json_data instead of selecting by the data key json_data['data'] .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM