简体   繁体   中英

Nested Json Array not handled by pandas dataframe / pd.json_normalize

Any help is really appreciated,

I have the below JSON, provided by API call. I've omitted sensitive data, but the key names are as presented ("value", "value_raw").

[{
    "Position": "1234",
    "StartDate": "2020-11-21",
    "ID": "1234",
    "CloseDate": "2020-12-07",
    "Title": "This is a title",
    "data": [{
            "value": 1234
        },
        {
            "value": "some text"
        },
        {
            "value": "some text",
            "value_raw": 11111
        },
        {
            "value_raw": 11111,
            "value": "some text"
        },
        {
            "value": "null"
        },
        {
            "value": "some text",
            "value_raw": 22222
        },
        {
            "value_raw": 2222222,
            "value": "some text"
        },
        {
            "value_raw": "null",
            "value": "null"
        },
        {
            "value_raw": "null",
            "value": "null"
        },
        {
            "value_raw": 2222222,
            "value": "some text"
        },
        {
            "value": "null"
        },
        {
            "value": "some text",
            "value_raw": 2222
        },
        {
            "value": 1
        },
        {
            "value": "some text",
            "value_raw": 22222


        }
    ]
}, {
    "Position": "1235",
    "StartDate": "2020-12-21",
    "ID": "1235",
    "CloseDate": "2021-01-12",
    "Title": "some text",
    "data": [{
            "value": 1235
        },
        {
            "value": "some text"
        },
        {
            "value": "some text",
            "value_raw": 1111
        },
        {
            "value": "some text",
            "value_raw": 1111
        },
        {
            "value": "null"
        },
        {
            "value_raw": 1111,
            "value": "some text"
        },
        {
            "value_raw": 11111,
            "value": "some text"
        },
        {
            "value_raw": "null",
            "value": "null"
        },
        {
            "value": "some text",
            "value_raw": 1111
        },
        {
            "value_raw": "null",
            "value": "null"
        },
        {
            "value": "null"
        },
        {
            "value": "some text",
            "value_raw": 22222
        },
        {
            "value": 1
        },
        {
            "value_raw": 22222,
            "value": "some text"


        }
    ]
}, {
    "ID": "1236",
    "Position": "1236",
    "StartDate": "2021-07-12",
    "data": [{
            "value": 1236
        },
        {
            "value": "some text"
        },
        {
            "value_raw": 1111,
            "value": "some text"
        },
        {
            "value": "some text",
            "value_raw": 1111
        },
        {
            "value": "null"
        },
        {
            "value_raw": 1111,
            "value": "some text"
        },
        {
            "value_raw": 1111,
            "value": "some text"
        },
        {
            "value_raw": "null",
            "value": "null"
        },
        {
            "value": "null",
            "value_raw": "null"
        },
        {
            "value": "some text",
            "value_raw": 111
        },
        {
            "value": "null"
        },
        {
            "value": "some text",
            "value_raw": 12223
        },
        {
            "value": 1
        },
        {
            "value": "some text",
            "value_raw": 2222



        }
    ],
    "Title": "some text",
    "CloseDate": "2021-07-23"
}

]

When I normalize "data" using;

df = pd.json_normalize(mydata, record_path=['data'])

I end up with an output of 2 columns x 42 rows (excl. headings), illustration:

value value_raw
1234
This is a title
some text 11111
Corporation 11111
null
some text 22222
some text 2222222
null null
null null

The only data im interested in is the key "value", I'd also like to know how to lay this data out as 3 rows x 14 columns (one row for each ID = '1234', '1235' & '1236' and no column headings needed as they provide zero benefit with the naming convention "value")

Any starting point would be great, I have spent hours looking at previous questions. What I have noticed is that the JSON I receive is very different to all of the examples out there.

Thanks everyone

Nice question. I think it's not possible with json_normalize. Therefore did it with a loop and list comprehension:

values_list_all_rows = []
for json_element in json_list:
    values_list_per_row = [value_dict["value"] for value_dict in json_element["data"] if "value" in value_dict]
    values_list_all_rows.append(values_list_per_row)
pd.DataFrame(values_list_all_rows)

Gives (it will set None as value where rows has less values than other column):


0   1   2   3   4   5   6   7   8   9   10  11  12  13
1234    some text   some text   some text   null    some text   some text   null    null    some text   null    some text   1   some text
1235    some text   some text   some text   null    some text   some text   null    some text   null    null    some text   1   some text
1236    some text   some text   some text   null    some text   some text   null    null    some text   null    some text   1   None

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM