Parsing AWS Athena outputs
I'm relatively new to Python, coming from a Node.js background, and I'm having quite a few issues parsing the output I get from get_query_results().
I have been at this for some hours. I have tried iterating through ['ResultSetMetadata']['ColumnInfo'] to grab the column names, but I don't know how to tie the ['Data'] list of each entry in ['ResultSet']['Rows'] to these items so the code knows which name to apply to each data value.
I know I need to take the column headers and then attach each row's values to them, but the logic for doing that in Python escapes me.
I can see that the first column name always lines up with the first VarCharValue in each row's ['Data'], so I can get all the values in order. But if I loop through ['ResultSet']['Rows'], how do I isolate the first iteration as the column names and then populate them from each of the other rows?
Or is there a better way to do this?
Here is my json.dumps(ATHENAoutput):
{
"ResultSet": {
"Rows": [{
"Data": [{
"VarCharValue": "postcode"
}, {
"VarCharValue": "CountOf"
}]
}, {
"Data": [{
"VarCharValue": "1231"
}, {
"VarCharValue": "2"
}]
}, {
"Data": [{
"VarCharValue": "1166"
}, {
"VarCharValue": "2"
}]
}, {
"Data": [{
"VarCharValue": "3651"
}, {
"VarCharValue": "3"
}]
}, {
"Data": [{
"VarCharValue": "2171"
}, {
"VarCharValue": "2"
}]
}, {
"Data": [{
"VarCharValue": "4697"
}, {
"VarCharValue": "2"
}]
}, {
"Data": [{
"VarCharValue": "4450"
}, {
"VarCharValue": "2"
}]
}, {
"Data": [{
"VarCharValue": "4469"
}, {
"VarCharValue": "1"
}]
}],
"ResultSetMetadata": {
"ColumnInfo": [{
"Scale": 0,
"Name": "postcode",
"Nullable": "UNKNOWN",
"TableName": "",
"Precision": 2147483647,
"Label": "postcode",
"CaseSensitive": true,
"SchemaName": "",
"Type": "varchar",
"CatalogName": "hive"
}, {
"Scale": 0,
"Name": "CountOf",
"Nullable": "UNKNOWN",
"TableName": "",
"Precision": 19,
"Label": "CountOf",
"CaseSensitive": false,
"SchemaName": "",
"Type": "bigint",
"CatalogName": "hive"
}]
}
},
"ResponseMetadata": {
"RetryAttempts": 0,
"HTTPStatusCode": 200,
"RequestId": "18190e7c-901c-40b4-b6ef-10a5013b1a70",
"HTTPHeaders": {
"date": "Mon, 01 Oct 2018 04:51:14 GMT",
"x-amzn-requestid": "18190e7c-901c-40b4-b6ef-10a5013b1a70",
"content-length": "1464",
"content-type": "application/x-amz-json-1.1",
"connection": "keep-alive"
}
}
}
My desired result is a JSON array like the following:
[{
"postcode": "2171",
"CountOf": "2"
}, {
"postcode": "4697",
"CountOf": "2"
}, {
"postcode": "1166",
"CountOf": "2"
},
...
]
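The column names can also be taken from ['ResultSetMetadata']['ColumnInfo'] and paired with each row's ['Data'] list by position, since both appear in the same order in the response above. The following is a minimal sketch, not a definitive implementation: `rows_from_column_info` and `sample` are hypothetical names, `sample` is a shortened copy of the response shown above, and the code assumes the first row repeats the column names, as it does in this output.

```python
import json


def rows_from_column_info(response):
    """Build one dict per data row, keyed by the names in ColumnInfo."""
    result_set = response["ResultSet"]
    names = [col["Name"] for col in result_set["ResultSetMetadata"]["ColumnInfo"]]
    # Assumption: the first row repeats the column names, as in the output above.
    data_rows = result_set["Rows"][1:]
    return [
        # .get() yields None for a cell that has no "VarCharValue" key
        dict(zip(names, (cell.get("VarCharValue") for cell in row["Data"])))
        for row in data_rows
    ]


# Shortened copy of the response above (hypothetical variable name).
sample = {
    "ResultSet": {
        "Rows": [
            {"Data": [{"VarCharValue": "postcode"}, {"VarCharValue": "CountOf"}]},
            {"Data": [{"VarCharValue": "1231"}, {"VarCharValue": "2"}]},
            {"Data": [{"VarCharValue": "1166"}, {"VarCharValue": "2"}]},
        ],
        "ResultSetMetadata": {
            "ColumnInfo": [
                {"Name": "postcode", "Type": "varchar"},
                {"Name": "CountOf", "Type": "bigint"},
            ]
        },
    }
}

print(json.dumps(rows_from_column_info(sample), indent=2))
```

Each ColumnInfo entry also carries a "Type" (varchar, bigint), so the same loop could convert values if strings are not what you want.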
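>>> # input_data is the dict returned by get_query_results()
>>> def get_var_char_values(d):
...     # Pull the string value out of every cell in one row
...     return [obj['VarCharValue'] for obj in d['Data']]
...
>>> # The first row holds the column names; the remaining rows hold the data
>>> header, *rows = input_data['ResultSet']['Rows']
>>> header = get_var_char_values(header)
>>> result = [dict(zip(header, get_var_char_values(row))) for row in rows]
>>> import json; print(json.dumps(result, indent=2))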
[
  {
    "postcode": "1231",
    "CountOf": "2"
  },
  {
    "postcode": "1166",
    "CountOf": "2"
  },
  {
    "postcode": "3651",
    "CountOf": "3"
  },
  {
    "postcode": "2171",
    "CountOf": "2"
  },
  {
    "postcode": "4697",
    "CountOf": "2"
  },
  {
    "postcode": "4450",
    "CountOf": "2"
  },
  {
    "postcode": "4469",
    "CountOf": "1"
  }
]
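Two cases that the sample output does not exercise may be worth guarding against: a cell with no value, which as far as I know comes back as an object without a 'VarCharValue' key, and a result that spans more than one get_query_results() response. Below is a hedged sketch of one way to handle both. `rows_to_dicts` and `pages` are hypothetical names, the two pages are made-up data in the same shape as the response above, and the code assumes the header row appears only once, as the very first row.

```python
def rows_to_dicts(pages):
    """Flatten an iterable of get_query_results-style pages into row dicts."""
    header = None
    result = []
    for page in pages:
        for row in page["ResultSet"]["Rows"]:
            # .get() returns None when a cell has no "VarCharValue" key
            values = [cell.get("VarCharValue") for cell in row["Data"]]
            if header is None:
                # Assumption: only the first row of the first page is the header
                header = values
                continue
            result.append(dict(zip(header, values)))
    return result


# Two hypothetical pages; the empty {} cell stands in for a missing value.
pages = [
    {"ResultSet": {"Rows": [
        {"Data": [{"VarCharValue": "postcode"}, {"VarCharValue": "CountOf"}]},
        {"Data": [{"VarCharValue": "1231"}, {"VarCharValue": "2"}]},
    ]}},
    {"ResultSet": {"Rows": [
        {"Data": [{"VarCharValue": "4469"}, {}]},
    ]}},
]

print(rows_to_dicts(pages))
```

With boto3, the pages could come from client.get_paginator('get_query_results').paginate(QueryExecutionId=query_id), where client is an Athena client and query_id is a placeholder for your own query execution ID.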