简体   繁体   English

Pandas:如何使用 JSON 的多个嵌套列表规范化 JSON 文件?

[英]Pandas: How do I normalize a JSON file with multiple nested lists of JSON?

I'm requesting a data from a API and then trying to normalize this JSON file, it has this structure我正在请求来自 API 的数据,然后尝试规范化此 JSON 文件,它具有以下结构

[{'la_id': '33',
  'store': '1405fdsa6001209',
  'sell': '110aa346',
  'products': [{'codigo': '176690', 'lacre': '15980fd2293', 'valor': '49.90'},
   {'codigo': 'sd4907', 'lacre': '1598a12385', 'valor': '19.90'},
   {'codigo': 'aa4907', 'lacre': '1598a2384', 'valor': '19.90'},
   {'codigo': '1fd307', 'lacre': '1598a20401', 'valor': '169.90'}],
  'payment': {'paymentid': '10a836',
   'value': '259.6000',
   'number': '4',
   'finalid': '4',
   'finalname': 'Cartao de credito',
   'docs': '849763',
   'flag': None}}
   'pagamentos': [{'pagamento_id': '107795',
   'valor': '854.9900',
   'numero_parcelas': '10',
   'finalizador_id': '4',
   'finalizador_nome': 'Cartao de credito',
   'documento': '500003',
   'bandeira': 'MASTERCARD'}]

When I apply the JsonNormalize, in order to transform this into a dataframe, I'm getting this:当我应用 JsonNormalize 时,为了将其转换为 dataframe,我得到了这个:

id ID store店铺 sell products产品 pagamentos帕加门托斯
33 33 1405fdsa6001209 1405fdsa6001209 110aa346 110aa346 [{'codigo': '176690', 'lacre': '15980fd2293', 'valor': '49.90'}, {'codigo': 'sd4907', 'lacre': '1598a12385', 'valor': '19.90'}, {'codigo': 'aa4907', 'lacre': '1598a2384', 'valor': '19.90'}, {'codigo': '1fd307', 'lacre': '1598a20401', 'valor': '169.90'}] [{'codigo': '176690', 'lacre': '15980fd2293', 'valor': '49.90'}, {'codigo': 'sd4907', 'lacre': '1598a12385', 'valor': '19.90 '}, {'codigo': 'aa4907', 'lacre': '1598a2384', 'valor': '19.90'}, {'codigo': '1fd307', 'lacre': '1598a20401', 'valor': '169.90'}] [{'pagamento_id': '10aa95','valor': '84.9900','numero_parcelas': '10','finalizador_id': '4','finalizador_nome': 'Cartao de credito','docs': '500003','bandeira': 'MASTERCARD'}] [{'pagamento_id': '10aa95','valor': '84.9900','numero_parcelas': '10','finalizador_id': '4','finalizador_nome': 'Cartao de credito','docs': '500003 ','bandeira': '万事达卡'}]

As you can see, the last 2 columns are not getting the values properly, they have dictionary inside a list.如您所见,最后两列没有正确获取值,它们在列表中有字典。 How can I fix this?我怎样才能解决这个问题?

Try:尝试:

lst = [
    {
        "la_id": "33",
        "store": "1405fdsa6001209",
        "sell": "110aa346",
        "products": [
            {"codigo": "176690", "lacre": "15980fd2293", "valor": "49.90"},
            {"codigo": "sd4907", "lacre": "1598a12385", "valor": "19.90"},
            {"codigo": "aa4907", "lacre": "1598a2384", "valor": "19.90"},
            {"codigo": "1fd307", "lacre": "1598a20401", "valor": "169.90"},
        ],
        "payment": {
            "paymentid": "10a836",
            "value": "259.6000",
            "number": "4",
            "finalid": "4",
            "finalname": "Cartao de credito",
            "docs": "849763",
            "flag": None,
        },
    }
]

df = pd.json_normalize(lst).explode("products")
df = pd.concat([df, df.pop("products").apply(pd.Series)], axis=1)
print(df)

Prints:印刷:

  la_id            store      sell payment.paymentid payment.value payment.number payment.finalid  payment.finalname payment.docs payment.flag  codigo        lacre   valor
0    33  1405fdsa6001209  110aa346            10a836      259.6000              4               4  Cartao de credito       849763         None  176690  15980fd2293   49.90
0    33  1405fdsa6001209  110aa346            10a836      259.6000              4               4  Cartao de credito       849763         None  sd4907   1598a12385   19.90
0    33  1405fdsa6001209  110aa346            10a836      259.6000              4               4  Cartao de credito       849763         None  aa4907    1598a2384   19.90
0    33  1405fdsa6001209  110aa346            10a836      259.6000              4               4  Cartao de credito       849763         None  1fd307   1598a20401  169.90

EDIT: With updated input:编辑:使用更新的输入:

df = pd.concat([df, df.pop("payments").apply(pd.Series)], axis=1)
df = df.explode("product")
df = pd.concat([df, df.pop("product").apply(pd.Series)], axis=1)
print(df)

Prints:印刷:

   id            store      sell payment_id    valor number finalid   finalizador_nome    docs        flag  codigo        lacre   valor
0  33  1405fdsa6001209  110aa346     10aa95  84.9900     10       4  Cartao de credito  500003  MASTERCARD  176690  15980fd2293   49.90
0  33  1405fdsa6001209  110aa346     10aa95  84.9900     10       4  Cartao de credito  500003  MASTERCARD  sd4907   1598a12385   19.90
0  33  1405fdsa6001209  110aa346     10aa95  84.9900     10       4  Cartao de credito  500003  MASTERCARD  aa4907    1598a2384   19.90
0  33  1405fdsa6001209  110aa346     10aa95  84.9900     10       4  Cartao de credito  500003  MASTERCARD  1fd307   1598a20401  169.90

You can use pd.json_normalize() for each of:您可以对以下各项使用pd.json_normalize()

  1. Extract the main fields (including key la_id )提取主要字段(包括键la_id

  2. Extract the products details + key la_id提取products详细信息+密钥la_id

  3. Extract the pagamentos details + key la_id提取pagamentos详细信息 + 密钥la_id

Then, use .merge() to merge the 3 resultant dataframes using common key la_id , as follows:然后,使用.merge()使用公共键la_id合并 3 个结果数据帧,如下所示:

j_lst = [{'la_id': '33',
          'store': '1405fdsa6001209',
          'sell': '110aa346',
          'products': [{'codigo': '176690', 'lacre': '15980fd2293', 'valor': '49.90'},
                       {'codigo': 'sd4907', 'lacre': '1598a12385', 'valor': '19.90'},
                       {'codigo': 'aa4907', 'lacre': '1598a2384', 'valor': '19.90'},
                       {'codigo': '1fd307', 'lacre': '1598a20401', 'valor': '169.90'}],
          'payment': {'paymentid': '10a836',
                      'value': '259.6000',
                      'number': '4',
                      'finalid': '4',
                      'finalname': 'Cartao de credito',
                      'docs': '849763',
                      'flag': None},
          'pagamentos': [{'pagamento_id': '107795',
                          'valor': '854.9900',
                          'numero_parcelas': '10',
                          'finalizador_id': '4',
                          'finalizador_nome': 'Cartao de credito',
                          'documento': '500003',
                          'bandeira': 'MASTERCARD'}]}]


df_main = pd.json_normalize(j_lst)

df_products = pd.json_normalize(j_lst, record_path=['products'], record_prefix='products.', meta=['la_id'])

df_pagamentos = pd.json_normalize(j_lst, record_path=['pagamentos'], record_prefix='pagamentos.', meta=['la_id'])

df_out = (df_main.merge(df_products, on='la_id')
                 .merge(df_pagamentos, on='la_id')
                 .drop(['products', 'pagamentos'], axis=1)
         )

Result:结果:

print(df_out)

  la_id            store      sell payment.paymentid payment.value payment.number payment.finalid  payment.finalname payment.docs payment.flag products.codigo products.lacre products.valor pagamentos.pagamento_id pagamentos.valor pagamentos.numero_parcelas pagamentos.finalizador_id pagamentos.finalizador_nome pagamentos.documento pagamentos.bandeira
0    33  1405fdsa6001209  110aa346            10a836      259.6000              4               4  Cartao de credito       849763         None          176690    15980fd2293          49.90                  107795         854.9900                         10                         4           Cartao de credito               500003          MASTERCARD
1    33  1405fdsa6001209  110aa346            10a836      259.6000              4               4  Cartao de credito       849763         None          sd4907     1598a12385          19.90                  107795         854.9900                         10                         4           Cartao de credito               500003          MASTERCARD
2    33  1405fdsa6001209  110aa346            10a836      259.6000              4               4  Cartao de credito       849763         None          aa4907      1598a2384          19.90                  107795         854.9900                         10                         4           Cartao de credito               500003          MASTERCARD
3    33  1405fdsa6001209  110aa346            10a836      259.6000              4               4  Cartao de credito       849763         None          1fd307     1598a20401         169.90                  107795         854.9900                         10                         4           Cartao de credito               500003          MASTERCARD

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM