
How to convert json into a pandas dataframe?

I'm trying to convert an API response from JSON to a dataframe in pandas. The problem I am having is that the data is nested in the JSON and I am not getting the right columns in my dataframe.

The data is collected from an API in the following format:

{'tickets': [{'url': 'https...',
   'id': 1,
   'external_id': None,
   'via': {'channel': 'web',
    'source': {'from': {}, 'to': {}, 'rel': None}},
   'created_at': '2020-05-01T04:16:33Z',
   'updated_at': '2020-05-23T03:02:49Z',
   'type': 'incident',
   'subject': 'Subject',
   'raw_subject': 'Raw subject',
   'description': 'Hi, this is the description',
   'priority': 'normal',
   'status': 'closed',
   'recipient': None,
   'requester_id': 409467360874,
   'submitter_id': 409126461453,
   'assignee_id': 409126461453,
   'organization_id': None,
   'group_id': 360009916453,
   'collaborator_ids': [],
   'follower_ids': [],
   'email_cc_ids': [],
   'forum_topic_id': None,
   'problem_id': None,
   'has_incidents': False,
   'is_public': True,
   'due_at': None,
   'tags': ['tag_1',
    'tag_2',
    'tag_3',
    'tag_4'],
   'custom_fields': [{'id': 360042034433, 'value': 'value of the first custom field'},
    {'id': 360041487874, 'value': 'value of the second custom field'},
    {'id': 360041489414, 'value': 'value of the third custom field'},
    {'id': 360040980053, 'value': 'correo_electrónico'},
    {'id': 360040980373, 'value': 'suscribe_newsletter'},
    {'id': 360042046173, 'value': None},
    {'id': 360041028574, 'value': 'product'},
    {'id': 360042103034, 'value': None}],
   'satisfaction_rating': {'score': 'unoffered'},
   'sharing_agreement_ids': [],
   'comment_count': 2,
   'fields': [{'id': 360042034433, 'value': 'value of the first custom field'},
    {'id': 360041487874, 'value': 'value of the second custom field'},
    {'id': 360041489414, 'value': 'value of the third custom field'},
    {'id': 360040980053, 'value': 'correo_electrónico'},
    {'id': 360040980373, 'value': 'suscribe_newsletter'},
    {'id': 360042046173, 'value': None},
    {'id': 360041028574, 'value': 'product'},
    {'id': 360042103034, 'value': None}],
   'followup_ids': [],
   'ticket_form_id': 360003608013,
   'deleted_ticket_form_id': 360003608013,
   'brand_id': 360004571673,
   'satisfaction_probability': None,
   'allow_channelback': False,
   'allow_attachments': True},

What I have already tried is the following: I converted the JSON into a dict as follows:

x = response.json()
df = pd.DataFrame(x['tickets'])

But I'm struggling with the output. I don't know how to get a correct, ordered, normalized dataframe.

(I'm new in this:) )

Let's suppose you get your request data with this code: r = requests.get(url, auth=auth)

Your data isn't clear yet, so let's get a dataframe of it: data = pd.read_json(json.dumps(r.json(), ensure_ascii=False))

But, probably you will get a dataframe with one single row.

When I faced a problem like this, I wrote this function to get the full data:

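listParam = []  # collects every dict found while walking the data

def listDict(entry):
    # A dict is treated as one record; a list is walked recursively.
    if isinstance(entry, dict):
        listParam.append(entry)
    elif isinstance(entry, list):
        for ent in entry:
            listDict(ent)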

Because your data is wrapped in a dict ({'tickets': ...}), you will need to pull the information out like this:

listDict(data.iloc[0, 0])

And then,

pd.DataFrame(listParam)

I can't show the results because you didn't post the complete data or say where I can find data to test with, but this will probably work.
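The same recursive idea can be sketched in a self-contained way that returns the list instead of filling a global one. The name collect_dicts and the two-ticket payload are made up for illustration (the payload is a heavily trimmed stand-in for the response in the question), so treat this as a sketch rather than a tested solution for the real API.

```python
import pandas as pd

def collect_dicts(entry, found=None):
    """Walk nested lists and collect every dict reached, without descending into dicts."""
    if found is None:
        found = []
    if isinstance(entry, dict):
        found.append(entry)
    elif isinstance(entry, list):
        for item in entry:
            collect_dicts(item, found)
    return found

# Hypothetical payload: a trimmed stand-in for the API response.
payload = {'tickets': [{'id': 1, 'status': 'closed', 'priority': 'normal'},
                       {'id': 2, 'status': 'open', 'priority': 'high'}]}

tickets = collect_dicts(payload['tickets'])
df_tickets = pd.DataFrame(tickets)  # one row per ticket dict
print(df_tickets)
```

Returning the list keeps repeated calls from appending to results left over from an earlier call.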

You have to convert the JSON to a dictionary first, and then convert the value stored under the key 'tickets' into a dataframe.

import json
import pandas as pd
with open('file.json') as f:  # the file is closed automatically
    ticketDictionary = json.load(f)
df = pd.DataFrame(ticketDictionary['tickets'])

Here, 'file.json' contains your data.

df now contains your dataframe.

For the lists within the response you can have separate dataframes if required:

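# build one dataframe per ticket from that ticket's 'fields' list, without overwriting df
field_frames = [pd.DataFrame(field) for field in df['fields']]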

For the sample ticket, it gives you this for 'fields':

            id                             value
0  360042034433   value of the first custom field
1  360041487874  value of the second custom field
2  360041489414   value of the third custom field
3  360040980053                correo_electrónico
4  360040980373               suscribe_newsletter
5  360042046173                              None
6  360041028574                           product
7  360042103034                              None
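If you would rather have the 'fields' of all tickets in one long dataframe, pd.json_normalize with record_path may be an option. This is only a sketch: it assumes pandas 1.0 or later, the second ticket is invented for illustration, and meta_prefix is needed here because both the ticket and each field have an 'id' key.

```python
import pandas as pd

# Trimmed sample; ticket 2 is hypothetical and only shows the multi-ticket case.
tickets = [
    {'id': 1, 'fields': [{'id': 360042034433, 'value': 'value of the first custom field'},
                         {'id': 360042046173, 'value': None}]},
    {'id': 2, 'fields': [{'id': 360042034433, 'value': 'another value'}]},
]

# One row per field entry; the owning ticket's id is carried along as 'ticket_id'.
fields_long = pd.json_normalize(
    tickets,
    record_path='fields',
    meta=['id'],
    meta_prefix='ticket_',
)
print(fields_long)
```

Each row keeps the field's own 'id' and 'value', plus a 'ticket_id' column that says which ticket it came from.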

This is one way to structure it, since you haven't mentioned the exact expected format.
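For the nested dicts such as 'via' and 'satisfaction_rating', pd.json_normalize can flatten them into their own columns. The snippet below is a minimal sketch, assuming pandas 1.0 or later and a trimmed, hand-written copy of the ticket; the sep='_' choice is just a preference for column names without dots.

```python
import pandas as pd

# Trimmed copy of the ticket from the question, written by hand for illustration.
tickets = [{
    'id': 1,
    'via': {'channel': 'web', 'source': {'rel': None}},
    'status': 'closed',
    'satisfaction_rating': {'score': 'unoffered'},
    'tags': ['tag_1', 'tag_2'],
}]

# Nested dicts become columns such as 'via_channel'; lists like 'tags' stay as lists.
flat = pd.json_normalize(tickets, sep='_')
print(flat.columns.tolist())
```

Lists are left untouched by this call, so columns like 'tags' or 'custom_fields' still need separate handling.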
