Nested Json Array not handled by pandas dataframe / pd.json_normalize

Question

Any help is really appreciated,

I have the below JSON, provided by API call. I've omitted sensitive data, but the key names are as presented ("value", "value_raw").

[{
    "Position": "1234",
    "StartDate": "2020-11-21",
    "ID": "1234",
    "CloseDate": "2020-12-07",
    "Title": "This is a title",
    "data": [{
            "value": 1234
        },
        {
            "value": "some text"
        },
        {
            "value": "some text",
            "value_raw": 11111
        },
        {
            "value_raw": 11111,
            "value": "some text"
        },
        {
            "value": "null"
        },
        {
            "value": "some text",
            "value_raw": 22222
        },
        {
            "value_raw": 2222222,
            "value": "some text"
        },
        {
            "value_raw": "null",
            "value": "null"
        },
        {
            "value_raw": "null",
            "value": "null"
        },
        {
            "value_raw": 2222222,
            "value": "some text"
        },
        {
            "value": "null"
        },
        {
            "value": "some text",
            "value_raw": 2222
        },
        {
            "value": 1
        },
        {
            "value": "some text",
            "value_raw": 22222


        }
    ]
}, {
    "Position": "1235",
    "StartDate": "2020-12-21",
    "ID": "1235",
    "CloseDate": "2021-01-12",
    "Title": "some text",
    "data": [{
            "value": 1235
        },
        {
            "value": "some text"
        },
        {
            "value": "some text",
            "value_raw": 1111
        },
        {
            "value": "some text",
            "value_raw": 1111
        },
        {
            "value": "null"
        },
        {
            "value_raw": 1111,
            "value": "some text"
        },
        {
            "value_raw": 11111,
            "value": "some text"
        },
        {
            "value_raw": "null",
            "value": "null"
        },
        {
            "value": "some text",
            "value_raw": 1111
        },
        {
            "value_raw": "null",
            "value": "null"
        },
        {
            "value": "null"
        },
        {
            "value": "some text",
            "value_raw": 22222
        },
        {
            "value": 1
        },
        {
            "value_raw": 22222,
            "value": "some text"


        }
    ]
}, {
    "ID": "1236",
    "Position": "1236",
    "StartDate": "2021-07-12",
    "data": [{
            "value": 1236
        },
        {
            "value": "some text"
        },
        {
            "value_raw": 1111,
            "value": "some text"
        },
        {
            "value": "some text",
            "value_raw": 1111
        },
        {
            "value": "null"
        },
        {
            "value_raw": 1111,
            "value": "some text"
        },
        {
            "value_raw": 1111,
            "value": "some text"
        },
        {
            "value_raw": "null",
            "value": "null"
        },
        {
            "value": "null",
            "value_raw": "null"
        },
        {
            "value": "some text",
            "value_raw": 111
        },
        {
            "value": "null"
        },
        {
            "value": "some text",
            "value_raw": 12223
        },
        {
            "value": 1
        },
        {
            "value": "some text",
            "value_raw": 2222



        }
    ],
    "Title": "some text",
    "CloseDate": "2021-07-23"
}

]

When I normalize "data" using;

df = pd.json_normalize(mydata, record_path=['data'])

I end up with an output of 2 columns x 42 rows (excl. headings), illustration:

value	value_raw
1234
This is a title
some text	11111
Corporation	11111
null
some text	22222
some text	2222222
null	null
null	null

The only data im interested in is the key "value", I'd also like to know how to lay this data out as 3 rows x 14 columns (one row for each ID = '1234', '1235' & '1236' and no column headings needed as they provide zero benefit with the naming convention "value")

Any starting point would be great, I have spent hours looking at previous questions. What I have noticed is that the JSON I receive is very different to all of the examples out there.

Thanks everyone

Answer 1

Nice question. I think it's not possible with json_normalize. Therefore did it with a loop and list comprehension:

values_list_all_rows = []
for json_element in json_list:
    values_list_per_row = [value_dict["value"] for value_dict in json_element["data"] if "value" in value_dict]
    values_list_all_rows.append(values_list_per_row)
pd.DataFrame(values_list_all_rows)

Gives (it will set None as value where rows has less values than other column):


0   1   2   3   4   5   6   7   8   9   10  11  12  13
1234    some text   some text   some text   null    some text   some text   null    null    some text   null    some text   1   some text
1235    some text   some text   some text   null    some text   some text   null    some text   null    null    some text   1   some text
1236    some text   some text   some text   null    some text   some text   null    null    some text   null    some text   1   None

Nested Json Array not handled by pandas dataframe / pd.json_normalize

Question

1 answers

solution1
1 2022-08-18 18:46:31

Nested Json Array not handled by pandas dataframe / pd.json_normalize

Question

1 answers

solution1 1 2022-08-18 18:46:31

solution1
1 2022-08-18 18:46:31