Parsing a column in a Pandas DataFrame that contains a nested JSON string

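I have a DataFrame in Python that looks like the one below. One column (called "json" in what follows) contains a large nested JSON string. How can I parse it so that I end up with a clean DataFrame with many columns? In particular, I need the cost and monthly amounts for each ID in separate columns. Ideally, I would have a table that looks like this: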

ID, Name, cost, monthly
10001, frank, 15.85, 15.85
10002, mary, 30.86, 23.03

    d = {'id': ['10001', '10002'], 'json': ['{"costs":[{"cost":15.85}],"policies":[{"logo":"HLIF-transparent-inhouse.png","monthly":15.85,"rating":"A++","waiverOfPremium":1.74,"carrier":"companyabc","face":250000,"term":20,"newFace":null,"newMonthly":null,"isCompanyD":true,"carrierCode":"xyz","product":"XYZt"}],"agentSuggestion":{"costs":[{"cost":15.85}],"options":{"product":"XYZt","gender":"male","healthClass":"0","smoker":"false","age":32,"term":"20","faceAmount":250000,"waiverOfPremiumAmount":1.74,"includeWaiverOfPremium":false,"state":"CT"},"policies":[{"logo":"HLIF-transparent-inhouse.png","monthly":15.85,"rating":"A++","waiverOfPremium":1.74,"carrier":"companyabc","face":250000,"term":20,"newFace":null,"newMonthly":null,"isCompanyD":true,"carrierCode":"xyz","product":"XYZt"}]}}', '{"costs":[{"cost":30.86}],"policies":[{"logo":"HLIF-transparent-inhouse.png","monthly":23.03,"rating":"A++","waiverOfPremium":7.83,"carrier":"companyabc","face":1000000,"term":10,"newFace":null,"newMonthly":null,"isCompanyD":true,"carrierCode":"xyz","product":"XYZt"}],"agentSuggestion":{"costs":[{"cost":30.86}],"options":{"product":"XYZt","gender":"female","healthClass":"0","smoker":"false","age":35,"term":10,"faceAmount":1000000,"waiverOfPremiumAmount":7.83,"includeWaiverOfPremium":true,"state":"GA"},"policies":[{"logo":"HLIF-transparent-inhouse.png","monthly":23.03,"rating":"A++","waiverOfPremium":7.83,"carrier":"companyabc","face":1000000,"term":10,"newFace":null,"newMonthly":null,"isCompanyD":true,"carrierCode":"xyz","product":"XYZt"}]}}'], 'name':['frank','mary']}

    import pandas as pd

    test = pd.DataFrame(data=d)

Here you go. There are two different costs in the JSON (the top-level cost and the agentSuggestion cost), so both are added here:

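    import json
    import pandas as pd

    # d is the dictionary defined in the question
    test = pd.DataFrame(d, columns=['id', 'json', 'name'])
    # Top-level cost: first entry of the "costs" list
    test['cost'] = test['json'].transform(lambda x: json.loads(x)['costs'][0]['cost'])
    # Cost nested under "agentSuggestion"
    test['agent_suggestion_cost'] = test['json']\
        .transform(lambda x: json.loads(x)['agentSuggestion']["costs"][0]['cost'])
    print(test)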

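You can follow similar logic to parse other fields, such as monthly. For further reference, see here. It also helps to open the string in a JSON prettifier (for example, JSTool with Notepad++) so that the nesting of the JSON is easier to see.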
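As a minimal sketch of the "similar logic" mentioned above, the cost and monthly values can be pulled out in a single pass so that each JSON string is decoded only once. The data below is a shortened stand-in for the question's dictionary (only the keys that are read are kept), and the helper name extract_fields is hypothetical, not part of the original answer.

```python
import json
import pandas as pd

# Shortened stand-in for the question's data: only the keys used below are kept.
d = {
    "id": ["10001", "10002"],
    "name": ["frank", "mary"],
    "json": [
        '{"costs":[{"cost":15.85}],"policies":[{"monthly":15.85,"term":20}]}',
        '{"costs":[{"cost":30.86}],"policies":[{"monthly":23.03,"term":10}]}',
    ],
}
test = pd.DataFrame(d)


def extract_fields(raw):
    """Decode one JSON string and return the two values of interest."""
    parsed = json.loads(raw)
    return pd.Series({
        "cost": parsed["costs"][0]["cost"],
        "monthly": parsed["policies"][0]["monthly"],
    })


# When the applied function returns a Series, apply() builds one column per key.
fields = test["json"].apply(extract_fields)

# Put the extracted columns next to id and name.
result = pd.concat([test[["id", "name"]], fields], axis=1)
print(result)
```

This assumes every row has at least one entry in both "costs" and "policies"; rows where either list is empty would raise an IndexError and need separate handling.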

If you find this useful, please accept the answer.

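Pandas provides some utilities for working with JSON. The ones that make sense for your case are pd.read_json and pd.json_normalize (available as pd.io.json.json_normalize before pandas 1.0). However, they expect the input in a different format from the JSON you are using.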
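One possible way to bridge that gap, sketched here under assumptions, is to decode each string with json.loads first and then hand the resulting dictionaries to json_normalize. The example assumes pandas 1.0 or later (where the function is exposed as pd.json_normalize) and uses a shortened stand-in for the question's data; the variable names are illustrative only.

```python
import json
import pandas as pd

# Shortened stand-in for the question's JSON strings.
raw = [
    '{"costs":[{"cost":15.85}],"policies":[{"monthly":15.85,"carrier":"companyabc","term":20}]}',
    '{"costs":[{"cost":30.86}],"policies":[{"monthly":23.03,"carrier":"companyabc","term":10}]}',
]
test = pd.DataFrame({"id": ["10001", "10002"], "name": ["frank", "mary"], "json": raw})

# Decode each string and attach the row's id so it can be carried along as metadata.
records = [dict(json.loads(s), id=i) for s, i in zip(test["json"], test["id"])]

# One output row per entry in "policies"; "id" is copied from the parent record.
policies = pd.json_normalize(records, record_path="policies", meta=["id"])
print(policies)
```

Because the result has one row per policy, an id with several policies would appear on several rows; that differs from the one-row-per-id table asked for in the question.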

orient : string,

Indication of expected JSON string format. Compatible JSON strings can be produced by to_json() with a corresponding orient value. The set of possible orients is:

'split' : dict like {index -> [index], columns -> [columns], data -> [values]}
'records' : list like [{column -> value}, ... , {column -> value}]
'index' : dict like {index -> {column -> value}}
'columns' : dict like {column -> {index -> value}}
'values' : just the values array
The allowed and default values depend on the value of the typ parameter.

when typ == 'series',
  allowed orients are {'split','records','index'}
  default is 'index'
  The Series index must be unique for orient 'index'.
when typ == 'frame',
  allowed orients are {'split','records','index', 'columns','values'}
  default is 'columns'
  The DataFrame index must be unique for orients 'index' and 'columns'.
  The DataFrame columns must be unique for orients 'index', 'columns', and 'records'.
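To make the orient formats listed above more concrete, the following sketch writes a small frame with to_json in two orients and reads each string back with read_json. The frame is made up for illustration (its values mirror the table in the question), and the strings are wrapped in StringIO because recent pandas versions deprecate passing a literal JSON string to read_json.

```python
from io import StringIO

import pandas as pd

# Illustrative frame; values mirror the desired table in the question.
df = pd.DataFrame({
    "name": ["frank", "mary"],
    "cost": [15.85, 30.86],
    "monthly": [15.85, 23.03],
})

# 'records' is a list of {column -> value} objects;
# 'split' is a dict with separate index, columns and data entries.
records_json = df.to_json(orient="records")
split_json = df.to_json(orient="split")
print(records_json)
print(split_json)

# Reading back requires the orient that matches how the string was written.
from_records = pd.read_json(StringIO(records_json), orient="records")
from_split = pd.read_json(StringIO(split_json), orient="split")
print(from_records)
print(from_split)
```

The strings in the question match none of these layouts: each one is a single nested object holding lists of objects, which is why they need to be decoded with json.loads (or flattened) before pandas can tabulate them.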
