AWS Athena: how to handle the data from the GetQueryResultsCommand command

After retrieving an Athena query result (stored as a CSV file in an S3 bucket) with the Athena client and the GetQueryResultsCommand command, the returned data is structured as follows:

{
   "NextToken": "string",
   "ResultSet": { 
      "ResultSetMetadata": { 
         "ColumnInfo": [ 
            { 
               "CaseSensitive": boolean,
               "CatalogName": "string",
               "Label": "string",
               "Name": "string",
               "Nullable": "string",
               "Precision": number,
               "Scale": number,
               "SchemaName": "string",
               "TableName": "string",
               "Type": "string"
            }
         ]
      },
      "Rows": [ 
         {
            "Data": [
               { "VarCharValue": "columnName1" },
               { "VarCharValue": "columnName2" },
               { "VarCharValue": "columnName3" },
               { "VarCharValue": "columnName4" },
               { "VarCharValue": "columnName5" },
               { "VarCharValue": "columnName6" }
            ]
         },
         {
            "Data": [
               { "VarCharValue": "fieldValue1" },
               { "VarCharValue": "123.4" },
               { "VarCharValue": "false" },
               { "VarCharValue": "12" },
               { "VarCharValue": "fieldValue5" },
               { "VarCharValue": "231.1" }
            ]
         }
      ]
   },
   "UpdateCount": number
}

Here ColumnInfo holds all the information about the columns in the CSV (name, type and so on), and the Rows array holds the row data, with the column names and the corresponding values split across two separate Data objects.

My question is: is it possible to get the data from GetQueryResultsCommand (or another command) in a better structure, where the two Data objects are already "merged", so that it is easier to manage the rows and read their values by column name?

Or do I have to process every single element of the Rows array and build my own objects?

Checking the documentation for the Athena SDK, we see that it does not support different formats for the returned data. I have pasted the available parameters below. The only other way I can see would be to use the CLI, which has an option to return the data in a different format.

Parameters:

params (Object) (defaults to: {}) —
QueryExecutionId — (String)
The unique ID of the query execution.

NextToken — (String)
A token generated by the Athena service that specifies where to continue pagination if a previous request was truncated. To obtain the next set of pages, pass in the NextToken from the response object of the previous page call.

MaxResults — (Integer)
The maximum number of results (rows) to return in this request.
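
Since none of these parameters changes the shape of the response, the merge has to be done on the client side. Below is a minimal sketch of one way to do it, not an official SDK feature. The names resultSetToObjects and convertCell are hypothetical helpers, and the interfaces are local stand-ins for the response types so that the snippet is self-contained. It assumes the first row of the first page is the header row (as in the output shown in the question), that a missing VarCharValue means NULL, and that the Type strings look like "varchar", "integer", "double" or "boolean"; adjust the switch if your results use other type names.

```typescript
// Local types mirroring only the parts of the GetQueryResults response used here.
interface Datum { VarCharValue?: string }
interface Row { Data?: Datum[] }
interface ColumnInfo { Name?: string; Type?: string }
interface ResultSet {
  ResultSetMetadata?: { ColumnInfo?: ColumnInfo[] };
  Rows?: Row[];
}

type Cell = string | number | boolean | null;

// Hypothetical helper: convert one string cell according to the column type.
// Assumption: type names are matched case-insensitively against this short list.
function convertCell(value: string | undefined, type: string | undefined): Cell {
  if (value === undefined) return null; // treated as NULL in this sketch
  switch (type?.toLowerCase()) {
    case "tinyint":
    case "smallint":
    case "integer":
    case "float":
    case "real":
    case "double":
      return Number(value);
    case "boolean":
      return value === "true";
    default:
      // varchar, date, timestamp, bigint, decimal, ... are left as strings
      // (bigint and decimal can exceed the precision of a JavaScript number).
      return value;
  }
}

// Hypothetical helper: turn a ResultSet into one plain object per data row,
// keyed by column name. Pass hasHeaderRow = false for pages without a header.
function resultSetToObjects(
  resultSet: ResultSet,
  hasHeaderRow = true
): Record<string, Cell>[] {
  const columns = resultSet.ResultSetMetadata?.ColumnInfo ?? [];
  const rows = resultSet.Rows ?? [];
  const dataRows = hasHeaderRow ? rows.slice(1) : rows;
  return dataRows.map((row) => {
    const record: Record<string, Cell> = {};
    columns.forEach((col, i) => {
      const name = col.Name ?? `column${i}`; // fallback name if Name is absent
      record[name] = convertCell(row.Data?.[i]?.VarCharValue, col.Type);
    });
    return record;
  });
}

// Usage with a hand-written sample in the same shape as the response above.
const sample: ResultSet = {
  ResultSetMetadata: {
    ColumnInfo: [
      { Name: "columnName1", Type: "varchar" },
      { Name: "columnName2", Type: "double" },
      { Name: "columnName3", Type: "boolean" },
    ],
  },
  Rows: [
    { Data: [{ VarCharValue: "columnName1" }, { VarCharValue: "columnName2" }, { VarCharValue: "columnName3" }] },
    { Data: [{ VarCharValue: "fieldValue1" }, { VarCharValue: "123.4" }, { VarCharValue: "false" }] },
  ],
};

const objects = resultSetToObjects(sample);
console.log(objects);
// prints [ { columnName1: 'fieldValue1', columnName2: 123.4, columnName3: false } ]
```

When paginating with NextToken, the same function can be applied to each page's ResultSet; in that case only the first page would be expected to carry the header row, so later pages would be passed with hasHeaderRow set to false.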
