解析AWS ATHENA輸出

Question

這是相對較新的Python，來自node.js背景，解析我from get_query_results()獲得的輸出時有很多問題

我已經['ResultSetMetadata']['ColumnInfo']了幾個小時了，我嘗試遍歷['ResultSetMetadata']['ColumnInfo']來獲取列名，但是我不知道如何綁定['ResultSet']['Data']這些代碼，以便代碼知道要應用於每個dataValue名稱。

我知道我需要選擇行標題，然后將關聯的對象添加到這些行中，但是有關如何在python中執行此操作的邏輯使我無法理解。

我可以看到第一列名稱始終與第一個['Data']['VarCharValue']因此我可以按順序獲取所有值，但是如果我遍歷['ResultSet']['Rows'] ，我將第一次迭代隔離為列名，然后彼此填充？

還是有更好的方法來做到這一點？

這是我的json.dumps（ATHENAoutput）

{
  "ResultSet": {
    "Rows": [{
      "Data": [{
        "VarCharValue": "postcode"
      }, {
        "VarCharValue": "CountOf"
      }]
    }, {
      "Data": [{
        "VarCharValue": "1231"
      }, {
        "VarCharValue": "2"
      }]
    }, {
      "Data": [{
        "VarCharValue": "1166"
      }, {
        "VarCharValue": "2"
      }]
    }, {
      "Data": [{
        "VarCharValue": "3651"
      }, {
        "VarCharValue": "3"
      }]
    }, {
      "Data": [{
        "VarCharValue": "2171"
      }, {
        "VarCharValue": "2"
      }]
    }, {
      "Data": [{
        "VarCharValue": "4697"
      }, {
        "VarCharValue": "2"
      }]
    }, {
      "Data": [{
        "VarCharValue": "4450"
      }, {
        "VarCharValue": "2"
      }]
    }, {
      "Data": [{
        "VarCharValue": "4469"
      }, {
        "VarCharValue": "1"
      }]
    }],
      "ResultSetMetadata": {
        "ColumnInfo": [{
          "Scale": 0,
          "Name": "postcode",
          "Nullable": "UNKNOWN",
          "TableName": "",
          "Precision": 2147483647,
          "Label": "postcode",
          "CaseSensitive": true,
          "SchemaName": "",
          "Type": "varchar",
          "CatalogName": "hive"
        }, {
          "Scale": 0,
          "Name": "CountOf",
          "Nullable": "UNKNOWN",
          "TableName": "",
          "Precision": 19,
          "Label": "CountOf",
          "CaseSensitive": false,
          "SchemaName": "",
          "Type": "bigint",
          "CatalogName": "hive"
        }]
      }
  },
    "ResponseMetadata": {
      "RetryAttempts": 0,
        "HTTPStatusCode": 200,
          "RequestId": "18190e7c-901c-40b4-b6ef-10a5013b1a70",
            "HTTPHeaders": {
              "date": "Mon, 01 Oct 2018 04:51:14 GMT",
                "x-amzn-requestid": "18190e7c-901c-40b4-b6ef-10a5013b1a70",
                  "content-length": "1464",
                    "content-type": "application/x-amz-json-1.1",
                      "connection": "keep-alive"
            }
    }
}

我想要的結果是一個JSON數組，如下所示：

[{
  "postcode": "2171",
  "CountOf": "2"
}, {
  "postcode": "4697",
  "CountOf": "2"
}, {
  "postcode": "1166",
  "CountOf": "2"
},
 ...
]

Answer 1

>>> def get_var_char_values(d):
...     return [obj['VarCharValue'] for obj in d['Data']]
... 
... 
... header, *rows = input_data['ResultSet']['Rows']
... header = get_var_char_values(header)
... result = [dict(zip(header, get_var_char_values(row))) for row in rows]
>>> import json; print(json.dumps(result, indent=2))
[
  {
    "postcode": "4450",
    "CountOf": "2"
  },
  {
    "postcode": "1231",
    "CountOf": "2"
  },
  {
    "postcode": "4469",
    "CountOf": "1"
  },
  {
    "postcode": "3651",
    "CountOf": "3"
  },
  {
    "postcode": "1166",
    "CountOf": "2"
  },
  {
    "postcode": "4697",
    "CountOf": "2"
  },
  {
    "postcode": "2171",
    "CountOf": "2"
  }
]

解析AWS ATHENA輸出

問題描述

1 個解決方案

解決方案1
1 已采納 2018-10-01 04:55:52

解析AWS ATHENA輸出

問題描述

1 個解決方案

解決方案1 1 已采納 2018-10-01 04:55:52

解決方案1
1 已采納 2018-10-01 04:55:52