
How to parse complex JSON with Python?

I am trying to parse this JSON and I am having trouble. The JSON looks like this:

    <ListObject list at 0x2161945a860> JSON: {
  "data": [
    {
      "amount": 100,
      "available_on": 1621382400,
      "created": 1621264875,
      "currency": "usd",
      "description": "0123456",
      "exchange_rate": null,
      "fee": 266,
      "fee_details": [
        {
          "amount": 266,
          "application": null,
          "currency": "usd",
          "description": "processing fees",
          "type": "fee"
        }
      ],
      "id": "txn_abvgd1234",
      "net": 9999,
      "object": "balance_transaction",
      "reporting_category": "charge",
      "source": "cust1",
      "sourced_transfers": {
        "data": [],
        "has_more": false,
        "object": "list",
        "total_count": 0,
        "url": "/v1/source"
      },
      "status": "pending",
      "type": "charge"
    },
    {
      "amount": 25984,
      "available_on": 1621382400,
      "created": 1621264866,
      "currency": "usd",
      "description": "0326489",
      "exchange_rate": null,
      "fee": 93,
      "fee_details": [
        {
          "amount": 93,
          "application": null,
          "currency": "usd",
          "description": "processing fees",
          "type": "fee"
        }
      ],
      "id": "txn_65987jihgf4984oihydgrd",
      "net": 9874,
      "object": "balance_transaction",
      "reporting_category": "charge",
      "source": "cust2",
      "sourced_transfers": {
        "data": [],
        "has_more": false,
        "object": "list",
        "total_count": 0,
        "url": "/v1/source"
      },
      "status": "pending",
      "type": "charge"
    }
  ],
  "has_more": true,
  "object": "list",
  "url": "/v1/balance_"
}

I am trying to parse it in Python with this script:

import pandas as pd
df = pd.json_normalize(json)
df.head()

but what I am getting is:

[image: enter image description here]

What I need is to parse each of these data points into its own column, so that I end up with 2 rows of data and one column per data point. Something like this:

[image: enter image description here]

How do I do this?

All but one of your fields are direct copies from the JSON, so you can just make a list of the fields to copy as-is, and then do the extra processing for fee_details separately.

import json
import pandas as pd

inp =  """{
  "data": [
    {
      "amount": 100,
      "available_on": 1621382400,
      "created": 1621264875,
      "currency": "usd",
      "description": "0123456",
      "exchange_rate": null,
      "fee": 266,
      "fee_details": [
        {
          "amount": 266,
          "application": null,
          "currency": "usd",
          "description": "processing fees",
          "type": "fee"
        }
      ],
      "id": "txn_abvgd1234",
      "net": 9999,
      "object": "balance_transaction",
      "reporting_category": "charge",
      "source": "cust1",
      "sourced_transfers": {
        "data": [],
        "has_more": false,
        "object": "list",
        "total_count": 0,
        "url": "/v1/source"
      },
      "status": "pending",
      "type": "charge"
    },
    {
      "amount": 25984,
      "available_on": 1621382400,
      "created": 1621264866,
      "currency": "usd",
      "description": "0326489",
      "exchange_rate": null,
      "fee": 93,
      "fee_details": [
        {
          "amount": 93,
          "application": null,
          "currency": "usd",
          "description": "processing fees",
          "type": "fee"
        }
      ],
      "id": "txn_65987jihgf4984oihydgrd",
      "net": 9874,
      "object": "balance_transaction",
      "reporting_category": "charge",
      "source": "cust2",
      "sourced_transfers": {
        "data": [],
        "has_more": false,
        "object": "list",
        "total_count": 0,
        "url": "/v1/source"
      },
      "status": "pending",
      "type": "charge"
    }
  ],
  "has_more": true,
  "object": "list",
  "url": "/v1/balance_"
}"""

copies = [
    'id',
    'net',
    'object',
    'reporting_category',
    'source',
    'amount',
    'available_on',
    'created',
    'currency',
    'description',
    'exchange_rate',
    'fee'
]

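# Parse the JSON text into plain Python dicts and lists.
data = json.loads(inp)

rows = []
# Each element of data['data'] is one transaction, i.e. one output row.
for inrow in data['data']:
    outrow = {}
    # Copy the scalar fields over unchanged.
    for field in copies:
        outrow[field] = inrow[field]
    # fee_details is a list of dicts; keep only the first entry's description.
    outrow['fee_details'] = inrow['fee_details'][0]['description']
    rows.append(outrow)

# One dict per row becomes one DataFrame row.
df = pd.DataFrame(rows)
print(df)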

Output:

timr@tims-gram:~/src$ python x.py
                           id   net               object reporting_category source  amount  ...     created  currency description exchange_rate  fee      fee_details
0               txn_abvgd1234  9999  balance_transaction             charge  cust1     100  ...  1621264875       usd     0123456          None  266  processing fees
1  txn_65987jihgf4984oihydgrd  9874  balance_transaction             charge  cust2   25984  ...  1621264866       usd     0326489          None   93  processing fees

[2 rows x 13 columns]
timr@tims-gram:~/src$ 
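
As a possible alternative to the explicit field list, here is a minimal sketch that stays with pd.json_normalize, as in the question, but passes it the list under "data" rather than the top-level object, so each transaction becomes its own row. The payload below is a trimmed-down stand-in for the one in the question, and the names raw, payload and fee_description are made up for illustration. Nested dicts such as sourced_transfers get flattened into dotted column names, while fee_details remains a list per row and still needs a separate step.

```python
import json
import pandas as pd

# Trimmed-down stand-in for the payload in the question (assumed shape).
raw = """{
  "data": [
    {"id": "txn_abvgd1234", "amount": 100, "fee": 266,
     "fee_details": [{"amount": 266, "description": "processing fees", "type": "fee"}],
     "sourced_transfers": {"data": [], "has_more": false, "total_count": 0}},
    {"id": "txn_65987jihgf4984oihydgrd", "amount": 25984, "fee": 93,
     "fee_details": [{"amount": 93, "description": "processing fees", "type": "fee"}],
     "sourced_transfers": {"data": [], "has_more": false, "total_count": 0}}
  ],
  "has_more": true
}"""

payload = json.loads(raw)

# Normalize the list under "data", not the top-level object:
# this yields one row per transaction instead of a single row.
df = pd.json_normalize(payload["data"])

# fee_details is still a list of dicts in each row; take the first
# entry's description (None if the list happens to be empty).
df["fee_description"] = df["fee_details"].apply(
    lambda details: details[0]["description"] if details else None
)
df = df.drop(columns="fee_details")

print(df.columns.tolist())
print(df)
```

This keeps every top-level field without listing them by hand, at the cost of also pulling in the sourced_transfers.* columns, which you may want to drop afterwards.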
