Parsing AWS ATHENA outputs
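I'm relatively new to Python, coming from a node.js background, and I'm having a lot of trouble parsing the output I get from get_query_results().
I've been at this for a few hours. I tried iterating over ['ResultSetMetadata']['ColumnInfo'] to get the column names, but I don't know how to tie them to ['ResultSet']['Data'] so that the code knows which name to apply to each data value.
I know I need to pick out the row headers and then attach the associated values to them, but the logic for doing this in Python escapes me.
I can see that the first column name always matches the first ['Data']['VarCharValue'], so I can get all the values in order. But if I iterate over ['ResultSet']['Rows'], do I isolate the first iteration as the column names and then populate the remaining rows against it?
Or is there a better way to do this?
Here is my json.dumps(ATHENAoutput):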
{
"ResultSet": {
"Rows": [{
"Data": [{
"VarCharValue": "postcode"
}, {
"VarCharValue": "CountOf"
}]
}, {
"Data": [{
"VarCharValue": "1231"
}, {
"VarCharValue": "2"
}]
}, {
"Data": [{
"VarCharValue": "1166"
}, {
"VarCharValue": "2"
}]
}, {
"Data": [{
"VarCharValue": "3651"
}, {
"VarCharValue": "3"
}]
}, {
"Data": [{
"VarCharValue": "2171"
}, {
"VarCharValue": "2"
}]
}, {
"Data": [{
"VarCharValue": "4697"
}, {
"VarCharValue": "2"
}]
}, {
"Data": [{
"VarCharValue": "4450"
}, {
"VarCharValue": "2"
}]
}, {
"Data": [{
"VarCharValue": "4469"
}, {
"VarCharValue": "1"
}]
}],
"ResultSetMetadata": {
"ColumnInfo": [{
"Scale": 0,
"Name": "postcode",
"Nullable": "UNKNOWN",
"TableName": "",
"Precision": 2147483647,
"Label": "postcode",
"CaseSensitive": true,
"SchemaName": "",
"Type": "varchar",
"CatalogName": "hive"
}, {
"Scale": 0,
"Name": "CountOf",
"Nullable": "UNKNOWN",
"TableName": "",
"Precision": 19,
"Label": "CountOf",
"CaseSensitive": false,
"SchemaName": "",
"Type": "bigint",
"CatalogName": "hive"
}]
}
},
"ResponseMetadata": {
"RetryAttempts": 0,
"HTTPStatusCode": 200,
"RequestId": "18190e7c-901c-40b4-b6ef-10a5013b1a70",
"HTTPHeaders": {
"date": "Mon, 01 Oct 2018 04:51:14 GMT",
"x-amzn-requestid": "18190e7c-901c-40b4-b6ef-10a5013b1a70",
"content-length": "1464",
"content-type": "application/x-amz-json-1.1",
"connection": "keep-alive"
}
}
}
The result I want is a JSON array like this:
[{
"postcode": "2171",
"CountOf": "2"
}, {
"postcode": "4697",
"CountOf": "2"
}, {
"postcode": "1166",
"CountOf": "2"
},
...
]
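>>> import json
>>> def get_var_char_values(d):
...     # Collect the VarCharValue string from every cell of one row
...     return [obj['VarCharValue'] for obj in d['Data']]
...
>>> # input_data is the dict returned by get_query_results()
>>> header, *rows = input_data['ResultSet']['Rows']  # first row holds the column names
>>> header = get_var_char_values(header)
>>> result = [dict(zip(header, get_var_char_values(row))) for row in rows]
>>> print(json.dumps(result, indent=2))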
[
{
"postcode": "4450",
"CountOf": "2"
},
{
"postcode": "1231",
"CountOf": "2"
},
{
"postcode": "4469",
"CountOf": "1"
},
{
"postcode": "3651",
"CountOf": "3"
},
{
"postcode": "1166",
"CountOf": "2"
},
{
"postcode": "4697",
"CountOf": "2"
},
{
"postcode": "2171",
"CountOf": "2"
}
]
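Since the question also asks about using ['ResultSetMetadata']['ColumnInfo'], here is a minimal sketch of that variant. It is not part of the answer above. The function name rows_to_dicts and the convert_types flag are made up for illustration. It assumes the first row repeats the column names, as in the sample output, and that a cell without a 'VarCharValue' key should become None (an assumption about how empty cells are returned, not something shown in the sample).

```python
import json


def rows_to_dicts(response, convert_types=False):
    """Turn a get_query_results()-style dict into a list of row dicts.

    rows_to_dicts and convert_types are hypothetical names for this sketch.
    """
    result_set = response["ResultSet"]
    # Take column names and types from the metadata instead of the first row
    columns = result_set["ResultSetMetadata"]["ColumnInfo"]
    names = [col["Name"] for col in columns]
    types = [col["Type"] for col in columns]

    records = []
    # Assumption: the first row is the header row, so skip it
    for row in result_set["Rows"][1:]:
        record = {}
        for name, col_type, cell in zip(names, types, row["Data"]):
            # Assumption: a cell with no VarCharValue key maps to None
            value = cell.get("VarCharValue")
            if convert_types and value is not None and col_type in ("bigint", "integer"):
                value = int(value)
            record[name] = value
        records.append(record)
    return records


# Trimmed copy of the sample from the question (two data rows only)
sample = {
    "ResultSet": {
        "Rows": [
            {"Data": [{"VarCharValue": "postcode"}, {"VarCharValue": "CountOf"}]},
            {"Data": [{"VarCharValue": "1231"}, {"VarCharValue": "2"}]},
            {"Data": [{"VarCharValue": "1166"}, {"VarCharValue": "2"}]},
        ],
        "ResultSetMetadata": {
            "ColumnInfo": [
                {"Name": "postcode", "Type": "varchar"},
                {"Name": "CountOf", "Type": "bigint"},
            ]
        },
    }
}

print(json.dumps(rows_to_dicts(sample), indent=2))
```

With convert_types left at False the values stay strings, which matches the desired output in the question. Passing convert_types=True turns the bigint column into Python ints, using the 'Type' field from ColumnInfo.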