
Parsing AWS Athena outputs

Relatively new to Python here, coming from a Node.js background, and I'm having quite a few issues parsing the output I get from get_query_results().

Documentation Link

I have been at this for some hours. I have tried iterating through ['ResultSetMetadata']['ColumnInfo'] to grab the column names, but I don't know how to tie the ['Data'] entries of each row in ['ResultSet']['Rows'] to those names so the code knows which name to apply to each data value.
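If you would rather take the column names from ResultSetMetadata than from the first data row, a minimal sketch could zip the ColumnInfo names with the values of each row. It assumes the first row repeats the column names (as it does in the output below) and skips it, and it assumes a NULL cell shows up as a Data entry with no VarCharValue key, which it maps to None. The helper name rows_from_metadata is hypothetical.

```python
def rows_from_metadata(response):
    """Build one dict per data row, keyed by the names in ColumnInfo."""
    result_set = response['ResultSet']
    # Column names, in column order, taken from the result-set metadata.
    names = [col['Name'] for col in result_set['ResultSetMetadata']['ColumnInfo']]
    records = []
    # Skip the first row: here it just repeats the column names.
    for row in result_set['Rows'][1:]:
        # Assumption: a NULL cell has no 'VarCharValue' key, so .get() yields None.
        values = [cell.get('VarCharValue') for cell in row['Data']]
        # zip pairs the i-th name with the i-th value of this row.
        records.append(dict(zip(names, values)))
    return records
```

Because ColumnInfo and each row's Data list are both in column order, zip is what "ties" a name to its value.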

I know I need to select the header row and then attach the values from each of the other rows to those column names, but the logic for doing that in Python escapes me.

I can see that the first column name always lines up with the first ['Data'][0]['VarCharValue'], so I can get all the values in order. But if I loop through ['ResultSet']['Rows'], how do I isolate the first iteration as the column names and then use them to populate each of the other rows?
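To answer the loop question directly, one way to isolate the first iteration is to take an iterator over the rows, pull the header off with next(), and let the for loop carry on from the second row. A minimal sketch; rows_to_dicts is a hypothetical name, and it assumes every cell has a VarCharValue (no NULLs).

```python
def rows_to_dicts(rows):
    """Turn Athena's list of Rows into dicts keyed by the first row's values."""
    it = iter(rows)
    # next() consumes the first iteration: the header row.
    first = next(it, None)
    if first is None:  # no rows at all, not even a header
        return []
    header = [cell['VarCharValue'] for cell in first['Data']]
    records = []
    # The same iterator now continues from the second row onward.
    for row in it:
        values = [cell['VarCharValue'] for cell in row['Data']]
        records.append(dict(zip(header, values)))
    return records
```

Call it as rows_to_dicts(ATHENAoutput['ResultSet']['Rows']). Coming from Node.js, this is similar to calling .next() once on a JS iterator before handing it to a for...of loop.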

Or is there a better way to do this?
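One thing a more general solution has to handle: get_query_results returns at most 1,000 rows per call, plus a NextToken for the rest, and boto3 offers a get_query_results paginator that follows those tokens. A minimal sketch that merges several response pages, assuming that only the very first row of the first page is the header and that a NULL cell has no VarCharValue key; merge_pages is a hypothetical helper.

```python
def merge_pages(pages):
    """Combine several get_query_results responses into one list of dicts."""
    header = None
    records = []
    for page in pages:
        for row in page['ResultSet']['Rows']:
            # .get() maps a missing VarCharValue (assumed NULL) to None.
            values = [cell.get('VarCharValue') for cell in row['Data']]
            if header is None:
                # Assumption: the first row of the first page holds the column names.
                header = values
                continue
            records.append(dict(zip(header, values)))
    return records
```

With boto3 you could pass the paginator straight in, e.g. merge_pages(client.get_paginator('get_query_results').paginate(QueryExecutionId=query_id)), where client is a boto3 Athena client and query_id is your query execution ID (both placeholders here).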

Here is my json.dumps(ATHENAoutput):

{
  "ResultSet": {
    "Rows": [{
      "Data": [{
        "VarCharValue": "postcode"
      }, {
        "VarCharValue": "CountOf"
      }]
    }, {
      "Data": [{
        "VarCharValue": "1231"
      }, {
        "VarCharValue": "2"
      }]
    }, {
      "Data": [{
        "VarCharValue": "1166"
      }, {
        "VarCharValue": "2"
      }]
    }, {
      "Data": [{
        "VarCharValue": "3651"
      }, {
        "VarCharValue": "3"
      }]
    }, {
      "Data": [{
        "VarCharValue": "2171"
      }, {
        "VarCharValue": "2"
      }]
    }, {
      "Data": [{
        "VarCharValue": "4697"
      }, {
        "VarCharValue": "2"
      }]
    }, {
      "Data": [{
        "VarCharValue": "4450"
      }, {
        "VarCharValue": "2"
      }]
    }, {
      "Data": [{
        "VarCharValue": "4469"
      }, {
        "VarCharValue": "1"
      }]
    }],
    "ResultSetMetadata": {
      "ColumnInfo": [{
        "Scale": 0,
        "Name": "postcode",
        "Nullable": "UNKNOWN",
        "TableName": "",
        "Precision": 2147483647,
        "Label": "postcode",
        "CaseSensitive": true,
        "SchemaName": "",
        "Type": "varchar",
        "CatalogName": "hive"
      }, {
        "Scale": 0,
        "Name": "CountOf",
        "Nullable": "UNKNOWN",
        "TableName": "",
        "Precision": 19,
        "Label": "CountOf",
        "CaseSensitive": false,
        "SchemaName": "",
        "Type": "bigint",
        "CatalogName": "hive"
      }]
    }
  },
  "ResponseMetadata": {
    "RetryAttempts": 0,
    "HTTPStatusCode": 200,
    "RequestId": "18190e7c-901c-40b4-b6ef-10a5013b1a70",
    "HTTPHeaders": {
      "date": "Mon, 01 Oct 2018 04:51:14 GMT",
      "x-amzn-requestid": "18190e7c-901c-40b4-b6ef-10a5013b1a70",
      "content-length": "1464",
      "content-type": "application/x-amz-json-1.1",
      "connection": "keep-alive"
    }
  }
}

My desired result is a JSON array like the following:

[{
  "postcode": "2171",
  "CountOf": "2"
}, {
  "postcode": "4697",
  "CountOf": "2"
}, {
  "postcode": "1166",
  "CountOf": "2"
},
 ...
]
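Unpack the first row as the header, then zip it with each of the remaining rows (here input_data is the dict returned by get_query_results(), i.e. the ATHENAoutput shown in the question):

>>> import json
>>> def get_var_char_values(d):
...     # Pull the VarCharValue out of every cell in one row's Data list.
...     return [obj['VarCharValue'] for obj in d['Data']]
...
>>> header, *rows = input_data['ResultSet']['Rows']
>>> header = get_var_char_values(header)
>>> result = [dict(zip(header, get_var_char_values(row))) for row in rows]
>>> print(json.dumps(result, indent=2))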
[
  {
    "postcode": "1231",
    "CountOf": "2"
  },
  {
    "postcode": "1166",
    "CountOf": "2"
  },
  {
    "postcode": "3651",
    "CountOf": "3"
  },
  {
    "postcode": "2171",
    "CountOf": "2"
  },
  {
    "postcode": "4697",
    "CountOf": "2"
  },
  {
    "postcode": "4450",
    "CountOf": "2"
  },
  {
    "postcode": "4469",
    "CountOf": "1"
  }
]
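Every value arrives as a string in VarCharValue, even CountOf, whose ColumnInfo Type is bigint. If you want real numbers in the result, a minimal sketch could cast each value according to its column's Type. The CASTS table is a hypothetical mapping that covers only a few Athena type names; any other type (varchar included) is left as a string, and a cell without a VarCharValue is assumed to be NULL and becomes None.

```python
# Hypothetical mapping from Athena column type names to Python converters.
CASTS = {
    'tinyint': int,
    'smallint': int,
    'integer': int,
    'bigint': int,
    'double': float,
}

def typed_rows(response):
    """Build dicts keyed by ColumnInfo names, casting values by ColumnInfo Type."""
    result_set = response['ResultSet']
    columns = result_set['ResultSetMetadata']['ColumnInfo']
    names = [col['Name'] for col in columns]
    # Unknown types fall back to str, i.e. the value is kept as-is.
    casts = [CASTS.get(col['Type'], str) for col in columns]
    records = []
    for row in result_set['Rows'][1:]:  # the first row is the header
        record = {}
        for name, cast, cell in zip(names, casts, row['Data']):
            value = cell.get('VarCharValue')
            # Leave missing (assumed NULL) values as None instead of casting them.
            record[name] = None if value is None else cast(value)
        records.append(record)
    return records
```

On the output in the question this would give "CountOf": 2 rather than "CountOf": "2" once serialized with json.dumps.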

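Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.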

 