
How to convert JSON into a pandas DataFrame?

I'm trying to convert an API response from JSON to a DataFrame in pandas. The problem I'm having is that the data is nested in the JSON, and I'm not getting the right columns in my DataFrame.

The data is collected from an API with the following format:

{'tickets': [{'url': 'https...',
   'id': 1,
   'external_id': None,
   'via': {'channel': 'web',
    'source': {'from': {}, 'to': {}, 'rel': None}},
   'created_at': '2020-05-01T04:16:33Z',
   'updated_at': '2020-05-23T03:02:49Z',
   'type': 'incident',
   'subject': 'Subject',
   'raw_subject': 'Raw subject',
   'description': 'Hi, this is the description',
   'priority': 'normal',
   'status': 'closed',
   'recipient': None,
   'requester_id': 409467360874,
   'submitter_id': 409126461453,
   'assignee_id': 409126461453,
   'organization_id': None,
   'group_id': 360009916453,
   'collaborator_ids': [],
   'follower_ids': [],
   'email_cc_ids': [],
   'forum_topic_id': None,
   'problem_id': None,
   'has_incidents': False,
   'is_public': True,
   'due_at': None,
   'tags': ['tag_1',
    'tag_2',
    'tag_3',
    'tag_4'],
   'custom_fields': [{'id': 360042034433, 'value': 'value of the first custom field'},
    {'id': 360041487874, 'value': 'value of the second custom field'},
    {'id': 360041489414, 'value': 'value of the third custom field'},
    {'id': 360040980053, 'value': 'correo_electrónico'},
    {'id': 360040980373, 'value': 'suscribe_newsletter'},
    {'id': 360042046173, 'value': None},
    {'id': 360041028574, 'value': 'product'},
    {'id': 360042103034, 'value': None}],
   'satisfaction_rating': {'score': 'unoffered'},
   'sharing_agreement_ids': [],
   'comment_count': 2,
   'fields': [{'id': 360042034433, 'value': 'value of the first custom field'},
    {'id': 360041487874, 'value': 'value of the second custom field'},
    {'id': 360041489414, 'value': 'value of the third custom field'},
    {'id': 360040980053, 'value': 'correo_electrónico'},
    {'id': 360040980373, 'value': 'suscribe_newsletter'},
    {'id': 360042046173, 'value': None},
    {'id': 360041028574, 'value': 'product'},
    {'id': 360042103034, 'value': None}],
   'followup_ids': [],
   'ticket_form_id': 360003608013,
   'deleted_ticket_form_id': 360003608013,
   'brand_id': 360004571673,
   'satisfaction_probability': None,
   'allow_channelback': False,
   'allow_attachments': True},

What I have already tried: I converted the JSON response into a dict as follows:

import pandas as pd
x = response.json()  # parse the API response body into a dict
df = pd.DataFrame(x['tickets'])

But I'm struggling with the output. I don't know how to get a correct, ordered, normalized DataFrame.

(I'm new to this. :) )
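To see where the unexpected columns come from, here is a minimal sketch using a trimmed, hypothetical stand-in for the response (only a few of the keys shown above). `pd.DataFrame` creates one column per top-level key, so nested values such as `via`, `tags` and `custom_fields` end up as whole dicts or lists inside single cells.

```python
import pandas as pd

# Hypothetical, trimmed stand-in for response.json(); the real payload has many more keys.
x = {
    "tickets": [
        {
            "id": 1,
            "status": "closed",
            "via": {"channel": "web", "source": {"rel": None}},
            "tags": ["tag_1", "tag_2"],
            "custom_fields": [{"id": 360042034433, "value": "value of the first custom field"}],
        }
    ]
}

df = pd.DataFrame(x["tickets"])

# One column per top-level key; nested values are stored unchanged in object columns.
print(df.columns.tolist())
print(type(df.loc[0, "via"]))   # the whole nested dict sits in one cell
print(type(df.loc[0, "tags"]))  # the whole list sits in one cell
```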

Let's suppose you get your request data with this code: r = requests.get(url, auth=auth)

Your data isn't clean yet, so let's get a DataFrame of it (this needs `import json` and `from io import StringIO`): data = pd.read_json(StringIO(json.dumps(r.json(), ensure_ascii=False)))

But you will probably get a DataFrame with a single row.

When I faced a problem like this, I wrote this function to get the full data:

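listParam = []  # module-level list that collects every dict found

def listDict(entry):
    # Append dicts; recurse into lists to reach the dicts nested inside them
    if isinstance(entry, dict):
        listParam.append(entry)
    elif isinstance(entry, list):
        for ent in entry:
            listDict(ent)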
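The function above appends to a module-level list, so a second call keeps the results of the first one. A minimal variant that returns a fresh list on each call could look like this; the name `collect_dicts` is hypothetical, not from the answer. Like the original, it walks nested lists and collects every dict it meets, without descending into the dicts themselves.

```python
def collect_dicts(entry, found=None):
    """Return every dict found in `entry`, walking through nested lists."""
    if found is None:
        found = []  # fresh accumulator for each top-level call
    if isinstance(entry, dict):
        found.append(entry)
    elif isinstance(entry, list):
        for item in entry:
            collect_dicts(item, found)
    return found


# Usage: a nested list of (made-up) ticket dicts flattens into a single list.
tickets = [{"id": 1}, [{"id": 2}, {"id": 3}]]
print(collect_dicts(tickets))
```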

Because your data is wrapped in a dict ({'tickets': ...}), you first need to pull the ticket list out of the DataFrame like this:

listDict(data.iloc[0, 0])

And then:

pd.DataFrame(listParam)
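`pd.DataFrame` on a list of dicts builds one row per dict and one column per key seen across all of them; a key missing from a given dict becomes NaN in that row. A small sketch with two made-up records:

```python
import pandas as pd

# Two hypothetical records with partly different keys.
records = [
    {"id": 1, "status": "closed"},
    {"id": 2, "priority": "normal"},
]

df = pd.DataFrame(records)

# Columns are the union of keys, in order of first appearance.
print(df.columns.tolist())
# Count of missing (NaN) cells per column.
print(df.isna().sum().to_dict())
```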

I can't show the results because you didn't post the complete data or say where I can find it to test, but this will probably work.

You have to convert the JSON to a dictionary first and then convert the value under the key 'tickets' into a DataFrame.

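import json
import pandas as pd
# Parse file.json into a dict, then build a DataFrame from the 'tickets' list
with open('file.json', encoding='utf-8') as f:
    ticketDictionary = json.load(f)
df = pd.DataFrame(ticketDictionary['tickets'])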

Here 'file.json' contains your data, saved as valid JSON (double-quoted strings, null and true/false rather than Python's None and True/False).

df now contains your data as a DataFrame, with one row per ticket and one column per top-level key.
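In that DataFrame the nested objects (`via`, `satisfaction_rating`) are still stored as dicts in single cells. One option the answer does not use is `pandas.json_normalize`, which flattens nested dicts into dotted column names while leaving lists such as `tags` untouched. A sketch on a trimmed, hypothetical ticket, read from a JSON string so it runs without `file.json`:

```python
import json

import pandas as pd

# Hypothetical, trimmed JSON text standing in for the contents of file.json.
raw = """
{"tickets": [
  {"id": 1, "status": "closed",
   "via": {"channel": "web", "source": {"rel": null}},
   "satisfaction_rating": {"score": "unoffered"},
   "tags": ["tag_1", "tag_2"]}
]}
"""

ticket_dict = json.loads(raw)

# Nested dicts become columns such as "via.channel", "via.source.rel"
# and "satisfaction_rating.score"; the "tags" list stays in one cell.
flat = pd.json_normalize(ticket_dict["tickets"])
print(sorted(flat.columns))
```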

For the lists within the response, you can build separate DataFrames if required:

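# One DataFrame per ticket, built from its 'fields' list
# (collected in a list instead of overwriting df on every iteration)
field_dfs = [pd.DataFrame(fields) for fields in df['fields']]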

For the sample ticket, this gives:

            id                             value
0  360042034433   value of the first custom field
1  360041487874  value of the second custom field
2  360041489414   value of the third custom field
3  360040980053                correo_electrónico
4  360040980373               suscribe_newsletter
5  360042046173                              None
6  360041028574                           product
7  360042103034                              None
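One DataFrame per ticket separates the fields from the ticket they belong to. If you would rather have the custom fields of all tickets in one table, a sketch (not part of the answer above) using `pandas.json_normalize` with `record_path` and `meta` could look like this. The two tickets and their values are made up, and `meta_prefix` is needed because both the ticket and each custom field have an `id` key.

```python
import pandas as pd

# Hypothetical, trimmed tickets; only "id" and "custom_fields" are kept.
tickets = [
    {"id": 1, "custom_fields": [{"id": 360042034433, "value": "first"},
                                {"id": 360042046173, "value": None}]},
    {"id": 2, "custom_fields": [{"id": 360042034433, "value": "other"}]},
]

# Long format: one row per (ticket, custom field).
# meta_prefix renames the ticket's "id" to "ticket_id" so it doesn't
# clash with the custom field's own "id" column.
fields_long = pd.json_normalize(
    tickets, record_path="custom_fields", meta=["id"], meta_prefix="ticket_"
)
print(fields_long)

# Optional wide format: one row per ticket, one column per custom field id.
fields_wide = fields_long.pivot(index="ticket_id", columns="id", values="value")
print(fields_wide)
```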

This is one way to structure it, since you haven't mentioned the exact format you expect.

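Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.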

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM