简体   繁体   中英

How to flatten a column of nested json objects in an already flattened dataframe

I have a json file with nested objects which is flattened in a pandas dataframe. There is a single column with nested json objects that I find hard the flatten.

I have tried many approaches, this is the approach that got me the furthest.

Help would be greatly appreciated, thank you.

Unfortunately I was unable to find a jsfiddle like alternative for python to provide a working example.

I understand that with the meta parameter of json_normalize I can add columns to my dataframe. But that approach does not work on the unflat column because I only got json_normalize to work well in my setup by setting record_path to 'markets' which is the main json object in my file. So in this setup I am unable to record_path to 'marketStats' and add any relevant columns via the meta parameter.

Goal

The goal is to transform one or all json objects in the marketStats object in to columns of the dataframe.

Code

with open('Data/20012022.json') as file:
data = json.loads(file.read())

# Flatten data
df0 = pd.json_normalize(
      data, 
      record_path =['markets']
)

df0.head(3)

Screenshot

This is a screenshot of what the table currently looks like, the marketStats column contains nested json.

在此处输入图像描述

Data

This a snippet from the json file. `

{
  "markets": [
    {
      "id": 335,
      "baseCurrency": "eth",
      "quoteCurrency": "btc",
      "exchangeName": "Binance",
      "exchangeCode": "BINA",
      "longName": "BTC-ETH",
      "marketName": "btc-eth",
      "symbol": "ETHBTC",
      "volume": "40624.5823",
      "quoteVolume": "3026.13646935",
      "btcVolume": "3026.13646935",
      "usdVolume": "127009429.050524367",
      "currentPrice": 0.074681,
      "latestBase": {
        "id": 161774475,
        "time": 1639576800,
        "date": "2021-12-15T14:00:00.000+00:00",
        "price": "0.077653",
        "lowestPrice": "0.0729",
        "bounce": "6.283",
        "currentDrop": "-3.8272829124438206",
        "crackedAt": "2022-01-07T03:00:00.000Z",
        "respectedAt": "2022-01-15T15:00:00.000Z",
        "isLowest": false
      },
      "marketStats": [
        {
          "algorithm": "original",
          "ratio": "50.0",
          "medianDrop": "-4.08",
          "medianBounce": "5.51",
          "hoursToRespected": 106,
          "crackedCount": 2,
          "respectedCount": 1
        },
        {
          "algorithm": "day_trade",
          "ratio": "100.0",
          "medianDrop": "-6.12",
          "medianBounce": "6.28",
          "hoursToRespected": 204,
          "crackedCount": 1,
          "respectedCount": 1
        },
        {
          "algorithm": "conservative",
          "ratio": "100.0",
          "medianDrop": "-6.12",
          "medianBounce": "8.38",
          "hoursToRespected": 204,
          "crackedCount": 1,
          "respectedCount": 1
        },
        {
          "algorithm": "position",
          "ratio": "50.0",
          "medianDrop": "-6.12",
          "medianBounce": "6.19",
          "hoursToRespected": 204,
          "crackedCount": 2,
          "respectedCount": 1
        },
        {
          "algorithm": "hodloo",
          "ratio": "50.0",
          "medianDrop": "-3.29",
          "medianBounce": "0.0",
          "hoursToRespected": 225,
          "crackedCount": 4,
          "respectedCount": 2
        }
      ]
    },
    {
      "id": 337,
      "baseCurrency": "ltc",
      "quoteCurrency": "btc",
      "exchangeName": "Binance",
      "exchangeCode": "BINA",
      "longName": "BTC-LTC",
      "marketName": "btc-ltc",
      "symbol": "LTCBTC",
      "volume": "68309.637",
      "quoteVolume": "223.79294524",
      "btcVolume": "223.79294524",
      "usdVolume": "9392773.4219378968",
      "currentPrice": 0.003275,
      "latestBase": {
        "id": 163982984,
        "time": 1642374000,
        "date": "2022-01-16T23:00:00.000+00:00",
        "price": "0.003346",
        "lowestPrice": "0.00322",
        "bounce": "3.839",
        "currentDrop": "-2.1219366407650926",
        "crackedAt": "2022-01-18T23:00:00.000Z",
        "respectedAt": null,
        "isLowest": false
      },
      "marketStats": [
        {
          "algorithm": "original",
          "ratio": "57.14",
          "medianDrop": "-3.28",
          "medianBounce": "3.84",
          "hoursToRespected": 186,
          "crackedCount": 7,
          "respectedCount": 4
        },
        {
          "algorithm": "day_trade",
          "ratio": "0.0",
          "medianDrop": "0.0",
          "medianBounce": "5.68",
          "hoursToRespected": 0,
          "crackedCount": 1,
          "respectedCount": 0
        },
        {
          "algorithm": "conservative",
          "ratio": "0.0",
          "medianDrop": "0.0",
          "medianBounce": "5.68",
          "hoursToRespected": 0,
          "crackedCount": 1,
          "respectedCount": 0
        },
        {
          "algorithm": "position",
          "ratio": "0.0",
          "medianDrop": "0.0",
          "medianBounce": "8.16",
          "hoursToRespected": 0,
          "crackedCount": 1,
          "respectedCount": 0
        },
        {
          "algorithm": "hodloo",
          "ratio": "75.0",
          "medianDrop": "-3.7",
          "medianBounce": "0.0",
          "hoursToRespected": 35,
          "crackedCount": 4,
          "respectedCount": 3
        }
      ]
    },
    {
      "id": 339,
      "baseCurrency": "bnb",
      "quoteCurrency": "btc",
      "exchangeName": "Binance",
      "exchangeCode": "BINA",
      "longName": "BTC-BNB",
      "marketName": "btc-bnb",
      "symbol": "BNBBTC",
      "volume": "154576.177",
      "quoteVolume": "1724.66664804",
      "btcVolume": "1724.66664804",
      "usdVolume": "72385673.4448901928",
      "currentPrice": 0.01099,
      "latestBase": {
        "id": 163753765,
        "time": 1642068000,
        "date": "2022-01-13T10:00:00.000+00:00",
        "price": "0.01093",
        "lowestPrice": "0.01093",
        "bounce": "3.102",
        "currentDrop": "0.5489478499542543",
        "crackedAt": null,
        "respectedAt": null,
        "isLowest": false
      },
      "marketStats": [
        {
          "algorithm": "original",
          "ratio": "100.0",
          "medianDrop": "-7.18",
          "medianBounce": "4.34",
          "hoursToRespected": 62,
          "crackedCount": 2,
          "respectedCount": 2
        },
        {
          "algorithm": "day_trade",
          "ratio": "100.0",
          "medianDrop": "-6.19",
          "medianBounce": "4.3",
          "hoursToRespected": 63,
          "crackedCount": 1,
          "respectedCount": 1
        },
        {
          "algorithm": "conservative",
          "ratio": "66.67",
          "medianDrop": "-3.15",
          "medianBounce": "4.05",
          "hoursToRespected": 62,
          "crackedCount": 3,
          "respectedCount": 2
        },
        {
          "algorithm": "position",
          "ratio": "100.0",
          "medianDrop": "-3.15",
          "medianBounce": "4.46",
          "hoursToRespected": 60,
          "crackedCount": 2,
          "respectedCount": 2
        },
        {
          "algorithm": "hodloo",
          "ratio": "100.0",
          "medianDrop": "-7.46",
          "medianBounce": "0.0",
          "hoursToRespected": 62,
          "crackedCount": 5,
          "respectedCount": 5
        }
      ]
    }
  ]
}

Here is one way selecting a single object. Not sure how you intend to combine all of them.

df = data.markets.apply(pd.Series)                                                            
new = df.marketStats.apply(lambda x: pd.Series(x[0]))  # Getting first one only                                        
combined = pd.concat([df, new], axis=1)

Output:

  id baseCurrency quoteCurrency exchangeName exchangeCode longName marketName  ... algorithm  ratio medianDrop medianBounce hoursToRespected  crackedCount respectedCount
0  335          eth           btc      Binance         BINA  BTC-ETH    btc-eth  ...  original   50.0      -4.08         5.51              106             2              1
1  337          ltc           btc      Binance         BINA  BTC-LTC    btc-ltc  ...  original  57.14      -3.28         3.84              186             7              4
2  339          bnb           btc      Binance         BINA  BTC-BNB    btc-bnb  ...  original  100.0      -7.18         4.34               62             2              2

You can apply some post-processing to df0 to achieve what you want. Here you can apply explode followed by apply(pf.Series) applied to 'marketStats' column:

df1 = df0.explode('marketStats')['marketStats'].apply(pd.Series)

df1 looks like this:

    algorithm       ratio    medianDrop    medianBounce    hoursToRespected    crackedCount    respectedCount
--  ------------  -------  ------------  --------------  ------------------  --------------  ----------------
 0  original        50            -4.08            5.51                 106               2                 1
 0  day_trade      100            -6.12            6.28                 204               1                 1
 0  conservative   100            -6.12            8.38                 204               1                 1
 0  position        50            -6.12            6.19                 204               2                 1
 0  hodloo          50            -3.29            0                    225               4                 2
 1  original        57.14         -3.28            3.84                 186               7                 4
 1  day_trade        0             0               5.68                   0               1                 0
 1  conservative     0             0               5.68                   0               1                 0
 1  position         0             0               8.16                   0               1                 0
 1  hodloo          75            -3.7             0                     35               4                 3
 2  original       100            -7.18            4.34                  62               2                 2
 2  day_trade      100            -6.19            4.3                   63               1                 1
 2  conservative    66.67         -3.15            4.05                  62               3                 2
 2  position       100            -3.15            4.46                  60               2                 2
 2  hodloo         100            -7.46            0                     62               5                 5

if you want it combined with all the other columns you can use join :

df0.join(df1)

I am not going to post the output of this command as it is rather large

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM