I have a json file with nested objects which is flattened in a pandas dataframe. There is a single column with nested json objects that I find hard the flatten.
I have tried many approaches, this is the approach that got me the furthest.
Help would be greatly appreciated, thank you.
Unfortunately I was unable to find a jsfiddle like alternative for python to provide a working example.
I understand that with the meta parameter of json_normalize I can add columns to my dataframe. But that approach does not work on the unflat column because I only got json_normalize to work well in my setup by setting record_path to 'markets' which is the main json object in my file. So in this setup I am unable to record_path to 'marketStats' and add any relevant columns via the meta parameter.
Goal
The goal is to transform one or all json objects in the marketStats object in to columns of the dataframe.
Code
with open('Data/20012022.json') as file:
data = json.loads(file.read())
# Flatten data
df0 = pd.json_normalize(
data,
record_path =['markets']
)
df0.head(3)
Screenshot
This is a screenshot of what the table currently looks like, the marketStats column contains nested json.
Data
This a snippet from the json file. `
{
"markets": [
{
"id": 335,
"baseCurrency": "eth",
"quoteCurrency": "btc",
"exchangeName": "Binance",
"exchangeCode": "BINA",
"longName": "BTC-ETH",
"marketName": "btc-eth",
"symbol": "ETHBTC",
"volume": "40624.5823",
"quoteVolume": "3026.13646935",
"btcVolume": "3026.13646935",
"usdVolume": "127009429.050524367",
"currentPrice": 0.074681,
"latestBase": {
"id": 161774475,
"time": 1639576800,
"date": "2021-12-15T14:00:00.000+00:00",
"price": "0.077653",
"lowestPrice": "0.0729",
"bounce": "6.283",
"currentDrop": "-3.8272829124438206",
"crackedAt": "2022-01-07T03:00:00.000Z",
"respectedAt": "2022-01-15T15:00:00.000Z",
"isLowest": false
},
"marketStats": [
{
"algorithm": "original",
"ratio": "50.0",
"medianDrop": "-4.08",
"medianBounce": "5.51",
"hoursToRespected": 106,
"crackedCount": 2,
"respectedCount": 1
},
{
"algorithm": "day_trade",
"ratio": "100.0",
"medianDrop": "-6.12",
"medianBounce": "6.28",
"hoursToRespected": 204,
"crackedCount": 1,
"respectedCount": 1
},
{
"algorithm": "conservative",
"ratio": "100.0",
"medianDrop": "-6.12",
"medianBounce": "8.38",
"hoursToRespected": 204,
"crackedCount": 1,
"respectedCount": 1
},
{
"algorithm": "position",
"ratio": "50.0",
"medianDrop": "-6.12",
"medianBounce": "6.19",
"hoursToRespected": 204,
"crackedCount": 2,
"respectedCount": 1
},
{
"algorithm": "hodloo",
"ratio": "50.0",
"medianDrop": "-3.29",
"medianBounce": "0.0",
"hoursToRespected": 225,
"crackedCount": 4,
"respectedCount": 2
}
]
},
{
"id": 337,
"baseCurrency": "ltc",
"quoteCurrency": "btc",
"exchangeName": "Binance",
"exchangeCode": "BINA",
"longName": "BTC-LTC",
"marketName": "btc-ltc",
"symbol": "LTCBTC",
"volume": "68309.637",
"quoteVolume": "223.79294524",
"btcVolume": "223.79294524",
"usdVolume": "9392773.4219378968",
"currentPrice": 0.003275,
"latestBase": {
"id": 163982984,
"time": 1642374000,
"date": "2022-01-16T23:00:00.000+00:00",
"price": "0.003346",
"lowestPrice": "0.00322",
"bounce": "3.839",
"currentDrop": "-2.1219366407650926",
"crackedAt": "2022-01-18T23:00:00.000Z",
"respectedAt": null,
"isLowest": false
},
"marketStats": [
{
"algorithm": "original",
"ratio": "57.14",
"medianDrop": "-3.28",
"medianBounce": "3.84",
"hoursToRespected": 186,
"crackedCount": 7,
"respectedCount": 4
},
{
"algorithm": "day_trade",
"ratio": "0.0",
"medianDrop": "0.0",
"medianBounce": "5.68",
"hoursToRespected": 0,
"crackedCount": 1,
"respectedCount": 0
},
{
"algorithm": "conservative",
"ratio": "0.0",
"medianDrop": "0.0",
"medianBounce": "5.68",
"hoursToRespected": 0,
"crackedCount": 1,
"respectedCount": 0
},
{
"algorithm": "position",
"ratio": "0.0",
"medianDrop": "0.0",
"medianBounce": "8.16",
"hoursToRespected": 0,
"crackedCount": 1,
"respectedCount": 0
},
{
"algorithm": "hodloo",
"ratio": "75.0",
"medianDrop": "-3.7",
"medianBounce": "0.0",
"hoursToRespected": 35,
"crackedCount": 4,
"respectedCount": 3
}
]
},
{
"id": 339,
"baseCurrency": "bnb",
"quoteCurrency": "btc",
"exchangeName": "Binance",
"exchangeCode": "BINA",
"longName": "BTC-BNB",
"marketName": "btc-bnb",
"symbol": "BNBBTC",
"volume": "154576.177",
"quoteVolume": "1724.66664804",
"btcVolume": "1724.66664804",
"usdVolume": "72385673.4448901928",
"currentPrice": 0.01099,
"latestBase": {
"id": 163753765,
"time": 1642068000,
"date": "2022-01-13T10:00:00.000+00:00",
"price": "0.01093",
"lowestPrice": "0.01093",
"bounce": "3.102",
"currentDrop": "0.5489478499542543",
"crackedAt": null,
"respectedAt": null,
"isLowest": false
},
"marketStats": [
{
"algorithm": "original",
"ratio": "100.0",
"medianDrop": "-7.18",
"medianBounce": "4.34",
"hoursToRespected": 62,
"crackedCount": 2,
"respectedCount": 2
},
{
"algorithm": "day_trade",
"ratio": "100.0",
"medianDrop": "-6.19",
"medianBounce": "4.3",
"hoursToRespected": 63,
"crackedCount": 1,
"respectedCount": 1
},
{
"algorithm": "conservative",
"ratio": "66.67",
"medianDrop": "-3.15",
"medianBounce": "4.05",
"hoursToRespected": 62,
"crackedCount": 3,
"respectedCount": 2
},
{
"algorithm": "position",
"ratio": "100.0",
"medianDrop": "-3.15",
"medianBounce": "4.46",
"hoursToRespected": 60,
"crackedCount": 2,
"respectedCount": 2
},
{
"algorithm": "hodloo",
"ratio": "100.0",
"medianDrop": "-7.46",
"medianBounce": "0.0",
"hoursToRespected": 62,
"crackedCount": 5,
"respectedCount": 5
}
]
}
]
}
Here is one way selecting a single object. Not sure how you intend to combine all of them.
df = data.markets.apply(pd.Series)
new = df.marketStats.apply(lambda x: pd.Series(x[0])) # Getting first one only
combined = pd.concat([df, new], axis=1)
Output:
id baseCurrency quoteCurrency exchangeName exchangeCode longName marketName ... algorithm ratio medianDrop medianBounce hoursToRespected crackedCount respectedCount
0 335 eth btc Binance BINA BTC-ETH btc-eth ... original 50.0 -4.08 5.51 106 2 1
1 337 ltc btc Binance BINA BTC-LTC btc-ltc ... original 57.14 -3.28 3.84 186 7 4
2 339 bnb btc Binance BINA BTC-BNB btc-bnb ... original 100.0 -7.18 4.34 62 2 2
You can apply some post-processing to df0
to achieve what you want. Here you can apply explode
followed by apply(pf.Series)
applied to 'marketStats'
column:
df1 = df0.explode('marketStats')['marketStats'].apply(pd.Series)
df1
looks like this:
algorithm ratio medianDrop medianBounce hoursToRespected crackedCount respectedCount
-- ------------ ------- ------------ -------------- ------------------ -------------- ----------------
0 original 50 -4.08 5.51 106 2 1
0 day_trade 100 -6.12 6.28 204 1 1
0 conservative 100 -6.12 8.38 204 1 1
0 position 50 -6.12 6.19 204 2 1
0 hodloo 50 -3.29 0 225 4 2
1 original 57.14 -3.28 3.84 186 7 4
1 day_trade 0 0 5.68 0 1 0
1 conservative 0 0 5.68 0 1 0
1 position 0 0 8.16 0 1 0
1 hodloo 75 -3.7 0 35 4 3
2 original 100 -7.18 4.34 62 2 2
2 day_trade 100 -6.19 4.3 63 1 1
2 conservative 66.67 -3.15 4.05 62 3 2
2 position 100 -3.15 4.46 60 2 2
2 hodloo 100 -7.46 0 62 5 5
if you want it combined with all the other columns you can use join
:
df0.join(df1)
I am not going to post the output of this command as it is rather large
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.