Problems with flattening nested JSON list to Pandas DataFrame, because of unequal data length
I am currently trying to work with a JSON file in the following format:
response = {
"leads": [{
"id": 208827181,
"campaignId": 2595,
"contactId": 2919361,
"contactAttempts": 1,
"contactAttemptsInvalid": 0,
"lastModifiedTime": "2017-03-14T13:37:20Z",
"nextContactTime": "2017-03-15T14:37:20Z",
"created": "2017-03-14T13:16:42Z",
"updated": "2017-03-14T13:37:20Z",
"lastContactedBy": 1271,
"status": "automaticRedial",
"active": True,
"masterData": [{
"id": 2054,
"label": "Firmanavn",
"value": "Firma_1"
},
{
"id": 2055,
"label": "Adresse",
"value": "Gadenavn_1"
},
{
"id": 2056,
"label": "Postnr.",
"value": "2000"
},
{
"id": 2057,
"label": "Bydel",
"value": "Frederiksberg"
},
{
"id": 2058,
"label": "Telefonnummer",
"value": "25252525"
}
]
}]
}
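masterData is a nested list, and its length varies from lead to lead. In effect, each row/entry can have a different set of columns assigned to it. I want to keep one or more specific columns for each entry. However, with my current indexing, the lookup breaks because the nested lists have different lengths. Here is my code: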
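import pandas as pd

leads = pd.json_normalize(response['leads'])
# The positional lookup [4] assumes the phone number is always the fifth masterData entry
df = pd.concat([leads.drop(columns='masterData'),
                pd.DataFrame(list(pd.DataFrame(list(leads['masterData']))[4]))
                .drop(columns=['id', 'label'])
                .rename(columns={"value": "tlf"})], axis=1)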
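To illustrate why a positional lookup such as `[4]` is fragile here, the following minimal sketch builds a frame from two masterData lists of unequal length. The `master` sample is hypothetical and trimmed to the `label` key only; it is not data from the real response.

```python
import pandas as pd

# Hypothetical masterData lists of unequal length (5 entries vs. 2).
master = [
    [{"label": "Firmanavn"}, {"label": "Adresse"}, {"label": "Postnr."},
     {"label": "Bydel"}, {"label": "Telefonnummer"}],
    [{"label": "Firmanavn"}, {"label": "Telefonnummer"}],
]

# Each list position becomes a column; shorter lists are padded with missing values.
wide = pd.DataFrame(master)
print(wide.shape)
# Missing where the shorter list has no fifth entry.
print(wide[4].isna())
# In the second row the phone number sits at position 1, not 4.
print(wide.iloc[1, 1])
```

In this sketch the second lead has nothing at position 4, and its phone number sits at position 1 instead, so selecting by position cannot return the same field for every lead.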
The desired output is:
active campaignId contactAttempts contactAttemptsInvalid contactId created id lastContactedBy lastModifiedTime nextContactTime resultData status updated tlf
0 True 2595 1 0 2919361 2017-03-14T13:16:42Z 208827181 1271.0 2017-03-14T13:37:20Z 2017-03-15T14:37:20Z [] automaticRedial 2017-03-14T13:37:20Z 37373737
1 True 2595 2 0 2919359 2017-03-14T13:16:42Z 208827179 1271.0 2017-03-14T13:33:30Z 2017-03-15T14:33:30Z [] privateRedial 2017-03-14T13:33:30Z 55555555
2 True 2595 1 0 2919360 2017-03-14T13:16:42Z 208827180 1271.0 2017-03-14T13:36:06Z None [] success 2017-03-14T13:36:06Z 22222222
3 True 2595 1 0 2919362 2017-03-14T13:16:42Z 208827182 1271.0 2017-03-14T13:56:39Z None [] success 2017-03-14T13:56:39Z 34343434
where "tlf" is the column added from "masterData".
Use only json_normalize and specify the column names in a list:
import pandas as pd

# Lead-level fields to repeat on every masterData row
L = ['active', 'campaignId', 'contactAttempts', 'contactAttemptsInvalid',
     'contactId', 'created', 'id', 'lastContactedBy', 'lastModifiedTime',
     'nextContactTime', 'status', 'updated']
df = pd.json_normalize(response['leads'], record_path='masterData', meta=L,
                       record_prefix='masterData.')
print(df)
masterData.id masterData.label masterData.value active campaignId \
0 2054 Firmanavn Firma_1 True 2595
1 2055 Adresse Gadenavn_1 True 2595
2 2056 Postnr. 2000 True 2595
3 2057 Bydel Frederiksberg True 2595
4 2058 Telefonnummer 25252525 True 2595
contactAttempts contactAttemptsInvalid contactId created \
0 1 0 2919361 2017-03-14T13:16:42Z
1 1 0 2919361 2017-03-14T13:16:42Z
2 1 0 2919361 2017-03-14T13:16:42Z
3 1 0 2919361 2017-03-14T13:16:42Z
4 1 0 2919361 2017-03-14T13:16:42Z
id lastContactedBy lastModifiedTime nextContactTime \
0 208827181 1271 2017-03-14T13:37:20Z 2017-03-15T14:37:20Z
1 208827181 1271 2017-03-14T13:37:20Z 2017-03-15T14:37:20Z
2 208827181 1271 2017-03-14T13:37:20Z 2017-03-15T14:37:20Z
3 208827181 1271 2017-03-14T13:37:20Z 2017-03-15T14:37:20Z
4 208827181 1271 2017-03-14T13:37:20Z 2017-03-15T14:37:20Z
status updated
0 automaticRedial 2017-03-14T13:37:20Z
1 automaticRedial 2017-03-14T13:37:20Z
2 automaticRedial 2017-03-14T13:37:20Z
3 automaticRedial 2017-03-14T13:37:20Z
4 automaticRedial 2017-03-14T13:37:20Z
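The frame above has one row per masterData entry, whereas the question asks for one row per lead with a single `tlf` column. One possible way to get there, sketched below, is to filter the long frame by label rather than by position. It assumes that the phone number is always labelled "Telefonnummer", as in the sample, and that pandas 1.0 or later is installed so that `pd.json_normalize` is available. The two-lead `response` here is a hypothetical, shortened stand-in for the real data.

```python
import pandas as pd

# Hypothetical two-lead sample: the masterData lists differ in length,
# and the phone number is at a different position in each.
response = {
    "leads": [
        {"id": 208827181, "status": "automaticRedial",
         "masterData": [
             {"id": 2054, "label": "Firmanavn", "value": "Firma_1"},
             {"id": 2058, "label": "Telefonnummer", "value": "25252525"},
         ]},
        {"id": 208827179, "status": "privateRedial",
         "masterData": [
             {"id": 2058, "label": "Telefonnummer", "value": "55555555"},
         ]},
    ]
}

meta = ["id", "status"]
long_df = pd.json_normalize(response["leads"], record_path="masterData",
                            meta=meta, record_prefix="masterData.")

# Keep only the phone-number rows, then rename the value column to "tlf".
is_phone = long_df["masterData.label"] == "Telefonnummer"
df = (long_df.loc[is_phone, meta + ["masterData.value"]]
      .rename(columns={"masterData.value": "tlf"})
      .reset_index(drop=True))
print(df)
```

Leads that have no "Telefonnummer" entry drop out of this result. If they need to be kept, merging `df` back onto a lead-level frame with a left join is one option.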