簡體   English   中英

使用來自url的pandas在python中讀取多級json

[英]Read multi-level json in python with pandas from url

我嘗試使用 Pandas 讀取多級 JSON 並將數據存儲在數據幀中,以便下次使用它或進行打印。 我的主要目標是了解如何從 JSON 的每個級別讀取數據。

這是我的第一步,它有效:

import pandas as pd 
import requests
log = ("user", "password")
url = "http://serverxyz/api/v1/Catalog/Categories?pageSize=2&pageIndex=0"

req = requests.get(url, auth = log)
req.raise_for_status()
d = req.json()

#what is next step?
#something like this? df = pd.DataFrame.from_dict(d.Data)

你能告訴我,如何閱讀:

  • 第一級(PageIndex、PageSize、TotalCount、Data 列)
  • 2 級(來自數據列代碼、時間戳、類別、快照)
  • 3 級(來自數據和快照列 Code、DateFrom、DateTo、Type ...)
  • 下一步處理數據的一些好技巧?
  • 也許你告訴我,使用 Pandas 並不是讀取 JSON 的最佳方式

這是json:

我要從 OneDrive 下載的 json 文件

{"PageIndex":0,"PageSize":2,"TotalCount":100248,"Data":[{"Code":"859182400102974","Timestamp":"2019-04-17T12:16:51Z","Category":0,"snapshots":[{"Code":"859182400102974","DateFrom":"2016-12-31T23:00:00Z","DateTo":"2017-05-09T22:00:00Z","Type":"CCO","VoltageLevel":400,"IsIsland":false,"IsPps":false,"MeasurementType":"CMC","InstalledPower":0,"GridId":11,"MeteredDataProvider":"8591824048108","Supplier":"8591824071403","SubjectOfSettlement":"8591824071403","IsSummarizingForSubjectOfSettlement":false,"AnnualConsumptionEstimation":-502,"TDDClass":"004","TempArea":"009","IsForeign":false,"IsSLRActive":false,"DGIFrequency":1,"FirstMonthReading":5,"IsCompositeService":true,"IsAggregatedInvoice":true,"IsImplicitSoS":false,"ReservedPower":0,"PhasesCount":"3","IsMicrosource":false,"IsDisconnectionPlanned":false,"Name":"Petra"},{"Code":"859182400102974","DateFrom":"2017-05-09T22:00:00Z","DateTo":"2018-01-31T23:00:00Z","Type":"CCO","VoltageLevel":400,"IsIsland":false,"IsPps":false,"MeasurementType":"CMC","InstalledPower":0,"GridId":11,"MeteredDataProvider":"8591824048108","Supplier":"8591824071403","SubjectOfSettlement":"8591824071403","IsSummarizingForSubjectOfSettlement":false,"AnnualConsumptionEstimation":-382,"TDDClass":"004","TempArea":"009","IsForeign":false,"IsSLRActive":false,"DGIFrequency":1,"FirstMonthReading":5,"IsCompositeService":true,"IsAggregatedInvoice":true,"IsImplicitSoS":false,"ReservedPower":0,"PhasesCount":"3","IsMicrosource":false,"IsDisconnectionPlanned":false,"Name":"Petra"}],"scalars":{"ConsumptionEstimation":[{"DateFrom":"2016-12-31T23:00:00Z","DateTo":"2017-05-09T22:00:00Z","ConsumptionEstimation":-502},{"DateFrom":"2017-05-09T22:00:00Z","DateTo":"2018-01-31T23:00:00Z","ConsumptionEstimation":-382}],"ConsumptionEstimation2":[{"DateFrom":"2016-12-31T23:00:00Z","DateTo":"2017-05-09T22:00:00Z","ConsumptionEstimation2":-502},{"DateFrom":"2017-05-09T22:00:00Z","DateTo":"2018-01-31T23:00:00Z","ConsumptionEstimation2":-382}]}},{"Code":"859182400104897","Timestamp":"2019-04-17T12:16:51Z","Category":0,"snapshots":[{"Code":"859182400104897","DateFrom":"2016-11-18T23:00:00Z","DateTo":"2017-11-05T23:00:00Z","Type":"CCO","VoltageLevel":400,"IsIsland":false,"IsPps":false,"MeasurementType":"CMC","InstalledPower":0,"GridId":11,"MeteredDataProvider":"8591824048108","Supplier":"8591824071403","SubjectOfSettlement":"8591824071403","IsSummarizingForSubjectOfSettlement":false,"AnnualConsumptionEstimation":-280,"TDDClass":"004","TempArea":"009","IsForeign":false,"Address":{"Street":"Okružní","City":"Semovo Ústí","PostCode":"39102"},"IsSLRActive":false,"DGIFrequency":0,"FirstMonthReading":0,"IsCompositeService":false,"IsAggregatedInvoice":false,"IsImplicitSoS":false,"ReservedPower":0,"IsMicrosource":false,"IsDisconnectionPlanned":false,"Name":"Martin"},{"Code":"859182400104897","DateFrom":"2017-11-05T23:00:00Z","DateTo":"2027-01-16T23:00:00Z","Type":"CCO","VoltageLevel":400,"IsIsland":false,"IsPps":false,"MeasurementType":"CMC","InstalledPower":0,"GridId":11,"MeteredDataProvider":"8591824048108","Supplier":"8591824071403","SubjectOfSettlement":"8591824071403","IsSummarizingForSubjectOfSettlement":false,"AnnualConsumptionEstimation":-282,"TDDClass":"004","TempArea":"009","IsForeign":false,"Address":{"Street":"Okružní","City":"Semovo Ústí","PostCode":"39102"},"IsSLRActive":false,"DGIFrequency":0,"FirstMonthReading":0,"IsCompositeService":false,"IsAggregatedInvoice":false,"IsImplicitSoS":false,"ReservedPower":0,"IsMicrosource":false,"IsDisconnectionPlanned":false,"Name":"Martin"}],"scalars":{"ConsumptionEstimation":[{"DateFrom":"2016-11-18T23:00:00Z","DateTo":"2017-11-05T23:00:00Z","ConsumptionEstimation":-280},{"DateFrom":"2017-11-05T23:00:00Z","DateTo":"2027-01-16T23:00:00Z","ConsumptionEstimation":-282}],"ConsumptionEstimation2":[{"DateFrom":"2016-11-18T23:00:00Z","DateTo":"2017-11-05T23:00:00Z","ConsumptionEstimation2":-280},{"DateFrom":"2017-11-05T23:00:00Z","DateTo":"2027-01-16T23:00:00Z","ConsumptionEstimation2":-282}]}}]}

謝謝

我認為使用pandas處理 JSON 不是一個好的選擇,因為pandas正在嘗試處理結構化數據,但在您的示例中,您正在處理多級非結構化數據
但是如果你堅持這樣做,你可以從你的 JSON 結構中提取結構數據。 例如,您可以將JSON_ROOT."Data"."snapshots"array提取到 ArrayList 中,然后將其保存到pd.DataFrame 否則,你只能在JSON結構保存為一個string在一列pd.DataFrame

從上面的答案來看,我沒有以前那么聰明了。

所以我嘗試將我的問題簡化為一個問題。 如何獲得 4 列的表格:Data.Code; Data.snapshots.DateFrom; 數據.快照.地址.街道; 數據.快照.地址.城市

這是我的代碼,但有必要更正它,但我不知道如何。 該代碼有效,但它返回 30 列,而不是我想要的。

import pandas as pd
import requests
import pandas.io.json as pd_json

log = ("user", "password")
url = "http://serverxyz/api/v1/Catalog/Categories?pageSize=2&pageIndex=0"

req = requests.get(url, auth = log)
req.raise_for_status()
fin = req.json()

df = pd_json.json_normalize(fin, 
                        record_path=['Data','snapshots'],
                        record_prefix = 'Data.',
                        errors = 'ignore'
                        )

print(df)

謝謝你的幫助。

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM