
How to convert nested json in csv with pandas

I have a nested json file (100k rows) that looks like this:

{"UniqueId":"4224f3c9-323c-e911-a820-a7f2c9e35195","TransactionDateUTC":"2019-03-01 15:00:52.627 UTC","Itinerary":"MUC-CPH-ARN-MUC","OriginAirportCode":"MUC","DestinationAirportCode":"CPH","OneWayOrReturn":"Return","Segment":[{"DepartureAirportCode":"MUC","ArrivalAirportCode":"CPH","SegmentNumber":"1","LegNumber":"1","NumberOfPassengers":"1"},{"DepartureAirportCode":"ARN","ArrivalAirportCode":"MUC","SegmentNumber":"2","LegNumber":"1","NumberOfPassengers":"1"}]}

I am trying to create a csv so that it can easily be loaded into an rdbms. I am trying to use json_normalize() from pandas, but I get an error even before I get that far.

with open('transactions.json') as data_file:    
    data = json.load(data_file)

JSONDecodeError: Extra data: line 2 column 1 (char 466)

If your problem stems from reading the json file itself, then I would use:

json.loads() 

and then use

pd.read_csv()
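
The "Extra data: line 2 column 1" message suggests the file holds one JSON object per line rather than a single JSON document, although that is an assumption based only on the error text. A minimal sketch of the difference, using two made-up records in place of the real file:

```python
import json

# Two tiny records in JSON Lines form (one object per line).
# The sample data is made up for illustration.
raw = (
    '{"UniqueId": "a", "Segment": [{"LegNumber": "1"}]}\n'
    '{"UniqueId": "b", "Segment": []}\n'
)

# Parsing the whole text as one document fails the same way json.load() did.
try:
    json.loads(raw)
    whole_file_error = None
except json.JSONDecodeError as exc:
    whole_file_error = exc.msg

# Parsing one line at a time works, because each line is valid JSON by itself.
records = [json.loads(line) for line in raw.splitlines() if line.strip()]
```

If the real file is laid out differently (for example, objects spanning several lines), splitting on newlines will not be enough.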

If your problem stems from converting the json dict to a dataframe, you can use the following:

test = {"UniqueId":"4224f3c9-323c-e911-a820-a7f2c9e35195","TransactionDateUTC":"2019-03-01 15:00:52.627 UTC","Itinerary":"MUC-CPH-ARN-MUC","OriginAirportCode":"MUC","DestinationAirportCode":"CPH","OneWayOrReturn":"Return","Segment":[{"DepartureAirportCode":"MUC","ArrivalAirportCode":"CPH","SegmentNumber":"1","LegNumber":"1","NumberOfPassengers":"1"},{"DepartureAirportCode":"ARN","ArrivalAirportCode":"MUC","SegmentNumber":"2","LegNumber":"1","NumberOfPassengers":"1"}]}

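import json
from io import StringIO

import pandas as pd

# serialize the dict to a JSON string and read it; wrapped in StringIO
# because newer pandas versions deprecate passing a literal JSON string
df = pd.read_json(StringIO(json.dumps(test)), convert_axes=True)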

# 'unpack' the dict as series and merge them with original df
df = pd.concat([df, df.Segment.apply(pd.Series)], axis=1)

# remove dict
df.drop('Segment', axis=1, inplace=True)

That would be my approach, but there may be a more convenient way.
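
One candidate for a more convenient way is the json_normalize() function mentioned in the question. The sketch below is not part of the answer above. It assumes the nested list always sits under the "Segment" key and treats every other top-level key as a column to repeat on each segment row:

```python
import pandas as pd

# Copy of the sample record from the question.
record = {
    "UniqueId": "4224f3c9-323c-e911-a820-a7f2c9e35195",
    "TransactionDateUTC": "2019-03-01 15:00:52.627 UTC",
    "Itinerary": "MUC-CPH-ARN-MUC",
    "OriginAirportCode": "MUC",
    "DestinationAirportCode": "CPH",
    "OneWayOrReturn": "Return",
    "Segment": [
        {"DepartureAirportCode": "MUC", "ArrivalAirportCode": "CPH",
         "SegmentNumber": "1", "LegNumber": "1", "NumberOfPassengers": "1"},
        {"DepartureAirportCode": "ARN", "ArrivalAirportCode": "MUC",
         "SegmentNumber": "2", "LegNumber": "1", "NumberOfPassengers": "1"},
    ],
}

# Every top-level key except the nested list becomes a repeated "meta" column.
meta_cols = [key for key in record if key != "Segment"]

# One row per segment, with the transaction-level fields copied onto each row.
flat = pd.json_normalize(record, record_path="Segment", meta=meta_cols)
```

This gives one row per segment, which tends to map more directly onto a relational table than a column holding dicts.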

@wolfstter gave advice on how to handle a single record. Now you need to iterate over all the records in the file, which you can do like this:

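import json
from io import StringIO

import pandas as pd

with open('transactions.json', encoding="utf8") as data_file:
    for line in data_file:
        # each line is one JSON record; StringIO avoids the literal-string deprecation
        df = pd.read_json(StringIO(line), convert_axes=True)
        # or: data = json.loads(line)
        ...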
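
Putting the pieces together, here is a rough sketch of going from a line-delimited file to CSV text. The helper name flatten_json_lines is hypothetical, the two sample records are made up, and an in-memory StringIO stands in for the opened transactions.json so the snippet runs without any file:

```python
import io
import json

import pandas as pd


def flatten_json_lines(lines, record_path="Segment"):
    """Return one row per nested segment, with the parent fields repeated."""
    records = [json.loads(line) for line in lines if line.strip()]
    # Assumes at least one record, and that every record has the same
    # top-level keys as the first one.
    meta_cols = [key for key in records[0] if key != record_path]
    return pd.json_normalize(records, record_path=record_path, meta=meta_cols)


# Stand-in for open('transactions.json', encoding="utf8").
sample = io.StringIO(
    '{"UniqueId": "a", "Itinerary": "MUC-CPH", '
    '"Segment": [{"SegmentNumber": "1"}, {"SegmentNumber": "2"}]}\n'
    '{"UniqueId": "b", "Itinerary": "ARN-MUC", '
    '"Segment": [{"SegmentNumber": "1"}]}\n'
)

flat = flatten_json_lines(sample)

# With a path argument, to_csv() writes a file; without one it returns the text.
csv_text = flat.to_csv(index=False)
```

With the real file you would pass the open file handle instead of sample and call flat.to_csv('transactions.csv', index=False). If some records lack a top-level key, json_normalize() raises a KeyError unless you pass errors="ignore".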

