How to convert JSON data inside a pandas column into new columns
I have this short version of ADSB json data and would like to convert it into DataFrame columns such as Icao, Alt, Lat, Long, Spd, Cou .....
After Alperen told me to do this
df = pd.read_json('2016-06-20-2359Z.json', lines=True)
I can load it into a DataFrame. However, df.acList is
[{'Id': 10537990, 'Rcvr': 1, 'HasSig': False, ... Name: acList, dtype: object
How can I get the Icao, Alt, Lat, Long, Spd, Cou data?
{ "src":1, "feeds":[ { "id":1, "name":"ADSBexchange.com", "polarPlot":false } ], "srcFeed":1, "showSil":true, "showFlg":true, "showPic":true, "flgH":20, "flgW":85, "acList":[ { "Id":11281748, "Rcvr":1, "HasSig":false, "Icao":"AC2554", "Bad":false, "Reg":"N882AS", "FSeen":"\/Date(1466467166951)\/", "TSecs":3, "CMsgs":1, "AltT":0, "Tisb":false, "TrkH":false, "Type":"CRJ2", "Mdl":"2001 BOMBARDIER INC CL-600-2B19", "Man":"Bombardier", "CNum":"7503", "Op":"EXPRESSJET AIRLINES INC - ATLANTA, GA", "OpIcao":"ASQ", "Sqk":"", "VsiT":0, "WTC":2, "Species":1, "Engines":"2", "EngType":3, "EngMount":1, "Mil":false, "Cou":"United States", "HasPic":false, "Interested":false, "FlightsCount":0, "Gnd":false, "SpdTyp":0, "CallSus":false, "TT":"a", "Trt":1, "Year":"2001" }, { "Id":11402205, "Rcvr":1, "HasSig":true, "Sig":110, "Icao":"ADFBDD", "Bad":false, "FSeen":"\/Date(1466391940977)\/", "TSecs":75229, "CMsgs":35445, "Alt":8025, "GAlt":8025, "AltT":0, "Call":"TEST1234", "Tisb":false, "TrkH":false, "Sqk":"0262", "Help":false, "VsiT":0, "WTC":0, "Species":0, "EngType":0, "EngMount":0, "Mil":true, "Cou":"United States", "HasPic":false, "Interested":false, "FlightsCount":0, "Gnd":true, "SpdTyp":0, "CallSus":false, "TT":"a", "Trt":1 } ], "totalAc":4231, "lastDv":"636019887431643594", "shtTrlSec":61, "stm":1466467170029 }
If you already have your data in the acList column, just do the following:
import pandas as pd
pd.io.json.json_normalize(df.acList[0])
Alt AltT Bad CMsgs CNum Call CallSus Cou EngMount EngType ... Sqk TSecs TT Tisb TrkH Trt Type VsiT WTC Year
0 NaN 0 False 1 7503 NaN False United States 1 3 ... 3 a False False 1 CRJ2 0 2 2001
1 8025.0 0 False 35445 NaN TEST1234 False United States 0 0 ... 0262 75229 a False False 1 NaN 0 0 NaN
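To keep only the fields named in the question, one option is to select them after normalizing. The sketch below uses a two-record stand-in for df.acList[0], trimmed from the sample in the question. It uses reindex rather than plain column indexing because some of the requested keys (Lat, Long and Spd) do not appear in the sample records, and plain indexing would raise a KeyError for them.

```python
import pandas as pd

# Stand-in for df.acList[0]: two records trimmed from the sample in the question.
ac_list = [
    {"Id": 11281748, "Icao": "AC2554", "Cou": "United States"},
    {"Id": 11402205, "Icao": "ADFBDD", "Alt": 8025, "Cou": "United States"},
]

wanted = ["Icao", "Alt", "Lat", "Long", "Spd", "Cou"]

# reindex keeps the requested column order and fills absent keys with NaN.
flights = pd.json_normalize(ac_list).reindex(columns=wanted)
print(flights)
```

Columns that never occur in the data come back entirely NaN, which makes gaps in the feed easy to spot.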
As of pandas 1.0, json_normalize should be called from the top-level namespace:
import pandas as pd
pd.json_normalize(df.acList[0])
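Note that df.acList[0] only covers the first row. If the file loaded with lines=True yields several rows, each with its own acList, the record_path and meta arguments of json_normalize can flatten all of them in one call. This is a sketch on a hypothetical two-row frame; the second stm value is made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the frame returned by pd.read_json(..., lines=True):
# one row per snapshot, each holding a list of aircraft dicts in acList.
df = pd.DataFrame({
    "stm": [1466467170029, 1466467171029],  # second timestamp is invented
    "acList": [
        [{"Icao": "AC2554", "Cou": "United States"},
         {"Icao": "ADFBDD", "Alt": 8025, "Cou": "United States"}],
        [{"Icao": "AC2554", "Alt": 9000, "Cou": "United States"}],
    ],
})

# record_path walks into each row's acList; meta copies the snapshot
# timestamp (stm) onto every aircraft record that came from that row.
all_flights = pd.json_normalize(
    df.to_dict("records"), record_path="acList", meta=["stm"]
)
print(all_flights)
```

Carrying stm along as a meta column keeps it possible to tell which snapshot each aircraft record came from.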
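@Sergey's answer solved the problem for me, but I ran into issues because the json in my dataframe column was stored as a string rather than as an object. I had to add the extra step of mapping the column: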
import json
import pandas as pd
pd.io.json.json_normalize(df.acList.apply(json.loads))
As of pandas 1.0, json_normalize is available in the top-level namespace. So use:
import json
import pandas as pd
# The column holds JSON strings, so decode each one before normalizing.
pd.json_normalize(df.acList.apply(json.loads))
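When it is unclear whether a column holds JSON text or already-parsed objects, a small guard avoids calling json.loads on something that is not a string. The frame and the parse_if_str helper below are hypothetical and only illustrate the idea.

```python
import json
import pandas as pd

# Hypothetical frame whose JSON column mixes strings and parsed dicts.
df = pd.DataFrame({
    "acList": [
        '{"Icao": "AC2554", "Cou": "United States"}',  # stored as text
        {"Icao": "ADFBDD", "Alt": 8025},               # already a dict
    ],
})

def parse_if_str(value):
    """Decode JSON text; pass already-parsed objects through unchanged."""
    return json.loads(value) if isinstance(value, str) else value

parsed = df["acList"].apply(parse_if_str)
# Passing a plain list gives a 0..n-1 index regardless of pandas version.
flat = pd.json_normalize(parsed.tolist())
print(flat)
```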
I can't comment on ThinkBonobo's answer yet, but if the JSON in the column isn't exactly a dictionary, you can keep chaining .apply until it is. So in my case
import json
import pandas as pd
pd.json_normalize(
    df
    .theColumnWithJson
    .apply(json.loads)
    .apply(lambda x: x[0])  # the inner JSON is a list with the dictionary as its only item
)
In my case I had some missing values (None), so I wrote more specific code that also drops the original column after creating the new columns:
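import pandas as pd

for prefix in ['column1', 'column2']:
    # Replace missing values with empty dicts so every row can be normalized.
    df_temp = df[prefix].apply(lambda x: {} if pd.isna(x) else x)
    # pd.json_normalize is the pandas >= 1.0 spelling of pd.io.json.json_normalize.
    df_temp = pd.json_normalize(df_temp)
    df_temp = df_temp.add_prefix(prefix + '_')
    # Drop the original JSON column, then append the flattened columns.
    df.drop([prefix], axis=1, inplace=True)
    df = pd.concat([df, df_temp], axis=1, sort=False)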
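One thing to watch with this loop: pd.concat with axis=1 aligns rows by index, so the flattened frame has to carry the same index as df. Depending on the pandas version, json_normalize may return a fresh 0..n-1 index, which would misalign a frame that has been filtered or sorted. The sketch below wraps the same idea in a hypothetical helper, expand_json_columns, that restores the index explicitly and returns a copy instead of modifying df in place.

```python
import pandas as pd

def expand_json_columns(df, columns):
    """Return a copy of df with each dict column replaced by prefixed flat columns."""
    out = df.copy()
    for col in columns:
        # Treat anything that is not a dict (None, NaN) as an empty dict.
        dicts = [x if isinstance(x, dict) else {} for x in out[col]]
        flat = pd.json_normalize(dicts).add_prefix(col + "_")
        # A list input yields rows numbered 0..n-1; restore the original
        # index so concat lines rows up even after filtering or sorting.
        flat.index = out.index
        out = pd.concat([out.drop(columns=col), flat], axis=1)
    return out

# Hypothetical data with a non-default index and one missing value.
df = pd.DataFrame(
    {"id": [1, 2, 3],
     "column1": [{"a": 1, "b": {"c": 2}}, None, {"a": 3}]},
    index=[10, 20, 30],
)
result = expand_json_columns(df, ["column1"])
print(result)
```

Nested keys are joined with a dot by default, so the inner key c under b ends up in a column named column1_b.c.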
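Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.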