
Converting a list of dictionaries into a DataFrame in a sane way

The Zendesk API returns custom fields as a list of dictionaries, where each list belongs to a single record. I'm wondering if there is a better way to turn it all into a DataFrame. If it were a dictionary of dictionaries, json_normalize would handle it with no problem.

Caveat: not all records will have the same field IDs

Sample data:

data = [{
  "ticket_id": 4,
  "customer_id": 8,
  "created_at": "2022-05-01",
  "custom_fields": [
    {
      "id": 15,
      "value": "website"
    },
    {
      "id": 16,
      "value": "broken"
    },
    {
      "id": 23,
      "value": None
    },
  ],
  'group_id': 42
}]

Running any of `pd.DataFrame`, `from_records`, `read_json`, or `json_normalize` gives most of what I want, but with the list left in a single column:

t_df = pd.json_normalize(data)
t_df

Output:

ticket_id customer_id created_at custom_fields group_id
0 4 8 2022-05-01 [{'id': 15, 'value': 'website'}, {'id': 16, 'v... 42

My current, probably ill-advised, solution is:

fields_df = pd.DataFrame(t_df.at[0, 'custom_fields']).T.reset_index(drop=True)
fields_df.rename(columns=fields_df.iloc[0], inplace=True)
fields_df.drop(0, inplace=True)
fields_df.reset_index(drop=True, inplace=True)
pd.merge(left=t_df, left_index=True,
         right=fields_df, right_index=True).drop(columns='custom_fields')

This results in a correct record that I could append to a main dataframe:

ticket_id customer_id created_at group_id 15 16 23
0 4 8 2022-05-01 42 website broken None

My worry is that I need to do this for ~25,000 records, and this approach seems both slow and brittle (prone to breaking).

You should wrangle the data/dictionaries first and only then construct a DataFrame from them. It will make your life easier and is faster than trying to manipulate the data with pandas, i.e. after the DataFrame has been created.

import pandas as pd

data = [{
  "ticket_id": 4,
  "customer_id": 8,
  "created_at": "2022-05-01",
  "custom_fields": [
    {
      "id": 15,
      "value": "website"
    },
    {
      "id": 16,
      "value": "broken"
    },
    {
      "id": 23,
      "value": None
    },
  ],
  'group_id': 42
}]

custom_fields = data[0].pop('custom_fields')
data[0].update({rec['id']: rec['value'] for rec in custom_fields})

t_df = pd.DataFrame(data)

Output:

>>> t_df 

   ticket_id  customer_id  created_at  group_id       15      16    23
0          4            8  2022-05-01        42  website  broken  None
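The same wrangling extends to all ~25,000 records with a plain loop over the list before constructing the DataFrame. A minimal sketch with two toy tickets (the helper name `flatten` is my own, and the second record is invented to show that differing field IDs simply become NaN):

```python
import pandas as pd

def flatten(rec):
    # Copy so the original API payload is left untouched
    rec = dict(rec)
    # Promote each {'id': ..., 'value': ...} pair to a top-level key
    for field in rec.pop('custom_fields', []):
        rec[field['id']] = field['value']
    return rec

data = [
    {"ticket_id": 4, "customer_id": 8,
     "custom_fields": [{"id": 15, "value": "website"},
                       {"id": 16, "value": "broken"}]},
    {"ticket_id": 5, "customer_id": 9,
     "custom_fields": [{"id": 16, "value": "fixed"},
                       {"id": 23, "value": None}]},
]

t_df = pd.DataFrame(flatten(rec) for rec in data)
```

Records that lack a given field ID get NaN in that column, which matches the caveat that not all records share the same fields.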

It looks like pandas isn't automatically determining which fields are "metadata" and which are "records". If your data layout is fixed, I would recommend hardcoding them:

>>> t_df = pd.json_normalize(
...     data,
...     meta=["ticket_id", "customer_id", "created_at", "group_id"],
...     record_path=["custom_fields"]
... )

   id    value ticket_id customer_id  created_at group_id
0  15  website         4           8  2022-05-01       42
1  16   broken         4           8  2022-05-01       42
2  23     None         4           8  2022-05-01       42

Documentation: https://pandas.pydata.org/docs/reference/api/pandas.json_normalize.html
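If you want one row per ticket rather than one row per field, the long output above can be pivoted back to wide form. A minimal sketch (passing a list of columns as `index` to `pivot` requires pandas ≥ 1.1):

```python
import pandas as pd

data = [{
    "ticket_id": 4,
    "customer_id": 8,
    "created_at": "2022-05-01",
    "custom_fields": [
        {"id": 15, "value": "website"},
        {"id": 16, "value": "broken"},
        {"id": 23, "value": None},
    ],
    "group_id": 42,
}]

# Long format: one row per custom field, metadata repeated
long_df = pd.json_normalize(
    data,
    meta=["ticket_id", "customer_id", "created_at", "group_id"],
    record_path=["custom_fields"],
)

# Pivot back to one row per ticket, with field ids as columns
wide_df = long_df.pivot(
    index=["ticket_id", "customer_id", "created_at", "group_id"],
    columns="id",
    values="value",
).reset_index()
```

Note that `wide_df.columns.name` is left as `"id"` by the pivot; it is cosmetic and can be cleared with `wide_df.columns.name = None` if it bothers you.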

import pandas as pd
data = [{
  "ticket_id": 4,
  "customer_id": 8,
  "created_at": "2022-05-01",
  "custom_fields": [
    {
      "id": 15,
      "value": "website"
    },
    {
      "id": 16,
      "value": "broken"
    },
    {
      "id": 23,
      "value": None
    },
  ],
  'group_id': 42
}]
df = pd.DataFrame(data)
# For each row, promote every custom field to its own column named by field id
for index in df.index:
    for field in df.loc[index, 'custom_fields']:
        df.loc[index, field['id']] = field['value']
df.drop(columns='custom_fields', inplace=True)
df
