How to convert nested JSON to CSV with pandas
I have a nested JSON file (100k lines) that looks like this:
{"UniqueId":"4224f3c9-323c-e911-a820-a7f2c9e35195","TransactionDateUTC":"2019-03-01 15:00:52.627 UTC","Itinerary":"MUC-CPH-ARN-MUC","OriginAirportCode":"MUC","DestinationAirportCode":"CPH","OneWayOrReturn":"Return","Segment":[{"DepartureAirportCode":"MUC","ArrivalAirportCode":"CPH","SegmentNumber":"1","LegNumber":"1","NumberOfPassengers":"1"},{"DepartureAirportCode":"ARN","ArrivalAirportCode":"MUC","SegmentNumber":"2","LegNumber":"1","NumberOfPassengers":"1"}]}
I am trying to create a CSV so that it can be loaded easily into an RDBMS. I am trying to use json_normalize() in pandas, but I get an error before I even get that far.
with open('transactions.json') as data_file:
    data = json.load(data_file)
JSONDecodeError: Extra data: line 2 column 1 (char 466)
If your problem stems from reading the JSON file itself, then I would use:
json.loads()
and then use
pd.read_csv()
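The "Extra data: line 2 column 1" message in the question suggests that the file holds one JSON object per line rather than a single JSON document. A minimal sketch of the difference, assuming that layout; the two short records in `text` are made-up stand-ins, not data from the question:

```python
import json

# Two tiny stand-in records, one per line, mimicking the assumed file layout.
text = '{"UniqueId": "a", "Segment": []}\n{"UniqueId": "b", "Segment": []}\n'

try:
    json.loads(text)  # parsing the whole text at once fails after the first object
    whole_ok = True
except json.JSONDecodeError as err:
    whole_ok = False
    print(err.msg)

# Parsing each non-empty line on its own succeeds.
records = [json.loads(line) for line in text.splitlines() if line.strip()]
print(len(records))
```

If the real file is not laid out one object per line, this approach would not apply as written.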
If your problem stems from converting the JSON dict to a DataFrame, you can use the following:
test = {"UniqueId":"4224f3c9-323c-e911-a820-a7f2c9e35195","TransactionDateUTC":"2019-03-01 15:00:52.627 UTC","Itinerary":"MUC-CPH-ARN-MUC","OriginAirportCode":"MUC","DestinationAirportCode":"CPH","OneWayOrReturn":"Return","Segment":[{"DepartureAirportCode":"MUC","ArrivalAirportCode":"CPH","SegmentNumber":"1","LegNumber":"1","NumberOfPassengers":"1"},{"DepartureAirportCode":"ARN","ArrivalAirportCode":"MUC","SegmentNumber":"2","LegNumber":"1","NumberOfPassengers":"1"}]}
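import json
from io import StringIO

import pandas as pd

# serialize the dict to a JSON string and read it; wrapping it in StringIO
# avoids the deprecation of passing a literal JSON string in recent pandas
df = pd.read_json(StringIO(json.dumps(test)), convert_axes=True)
# 'unpack' each Segment dict into its own columns and merge them with the original df
df = pd.concat([df, df.Segment.apply(pd.Series)], axis=1)
# remove the now-redundant column of dicts
df.drop('Segment', axis=1, inplace=True)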
That would be my approach, but there may be a more convenient way.
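One possibly more convenient route is the json_normalize() function the question mentions. The sketch below applies it to the sample record, producing one row per entry in "Segment" and repeating the top-level fields on each row. The names `record`, `meta_cols` and `flat` are illustrative only, and it assumes pandas 1.0 or later, where the function is available as `pd.json_normalize`:

```python
import pandas as pd

# Sample record copied from the question.
record = {
    "UniqueId": "4224f3c9-323c-e911-a820-a7f2c9e35195",
    "TransactionDateUTC": "2019-03-01 15:00:52.627 UTC",
    "Itinerary": "MUC-CPH-ARN-MUC",
    "OriginAirportCode": "MUC",
    "DestinationAirportCode": "CPH",
    "OneWayOrReturn": "Return",
    "Segment": [
        {"DepartureAirportCode": "MUC", "ArrivalAirportCode": "CPH",
         "SegmentNumber": "1", "LegNumber": "1", "NumberOfPassengers": "1"},
        {"DepartureAirportCode": "ARN", "ArrivalAirportCode": "MUC",
         "SegmentNumber": "2", "LegNumber": "1", "NumberOfPassengers": "1"},
    ],
}

# Top-level keys to repeat on every segment row.
meta_cols = ["UniqueId", "TransactionDateUTC", "Itinerary",
             "OriginAirportCode", "DestinationAirportCode", "OneWayOrReturn"]

# record_path selects the nested list; meta carries the parent fields along.
flat = pd.json_normalize(record, record_path="Segment", meta=meta_cols)
print(flat.shape)
```

Whether one row per segment is the right shape depends on the target table; a separate transactions table and segments table keyed on UniqueId would be another option.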
@wolfstter has suggested how to handle a single record. Now you need to iterate over all the records in the file, which you can do like this:
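from io import StringIO

with open('transactions.json', encoding="utf8") as data_file:
    for line in data_file:
        # each line is one JSON record; StringIO avoids passing a literal string
        df = pd.read_json(StringIO(line), convert_axes=True)
        # or: data = json.loads(line)
        ...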
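Putting the loop together with the CSV goal from the question, a minimal end-to-end sketch might look like the following. `jsonl_to_csv` and `META_COLS` are hypothetical names introduced here. The sketch assumes one JSON object per line, a "Segment" list in every record, and a file small enough to hold in memory (100k lines should be):

```python
import json

import pandas as pd

# Top-level keys to repeat on every segment row (taken from the sample record).
META_COLS = ["UniqueId", "TransactionDateUTC", "Itinerary",
             "OriginAirportCode", "DestinationAirportCode", "OneWayOrReturn"]


def jsonl_to_csv(src_path, dst_path):
    """Flatten a one-object-per-line JSON file into a CSV; return the row count."""
    records = []
    with open(src_path, encoding="utf8") as src:
        for line in src:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    # One output row per segment, with the parent fields repeated on each row.
    flat = pd.json_normalize(records, record_path="Segment", meta=META_COLS)
    flat.to_csv(dst_path, index=False)
    return len(flat)
```

A call such as `jsonl_to_csv('transactions.json', 'transactions.csv')` would then write the flattened rows; the output file name is only an example. Records without a "Segment" key would raise an error here and would need separate handling.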