Converting a struct to JSON when querying Athena

Question

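I have an Athena table that I did not create and do not manage, but can query. One of its fields is of struct type. For the sake of this example, let's assume it looks like this: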

my_field struct<a:string,
                b:string,
                c:struct<d:string,e:string>
                >

Now, I know how to query specific fields within this struct. But in one of my queries I need to extract the complete struct, so I simply use:

select my_field from my_table

The result looks like a string:

{a=aaa, b=bbb, c={d=ddd, e=eee}}

I would like to get the result as a JSON string:

{"a":"aaa", "b":"bbb","c":{"d":"ddd", "e":"eee"}}

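This string will then be processed by another application, which is why I need it in JSON format.

How can I achieve this?

EDIT: Better still, is there a way to query the struct in a flattened form? The result would then look like: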

a   |   b   |   c.d  |  c.e   |
-------------------------------
aaa |   bbb |   ddd  |  eee   |

Answer 1

You can reference nested fields directly using the parent_field.child_field notation. Try:

SELECT
  my_field,
  my_field.a,
  my_field.b,
  my_field.c.d,
  my_field.c.e
FROM 
  my_table
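
If the struct has already reached the client as a nested object (for example after parsing a JSON string), the same flattened layout can also be produced there instead of in SQL. The following is only a minimal sketch; `flattenObject` is a hypothetical helper name, not part of Athena or any library.

```javascript
// Flatten a nested object into one level, joining key paths with dots,
// e.g. { c: { d: 'ddd' } } becomes { 'c.d': 'ddd' }.
// Arrays and null values are kept as-is rather than expanded.
function flattenObject(obj, prefix = '') {
  const flat = {};
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      Object.assign(flat, flattenObject(value, path));
    } else {
      flat[path] = value;
    }
  }
  return flat;
}

const flatRow = flattenObject({ a: 'aaa', b: 'bbb', c: { d: 'ddd', e: 'eee' } });
console.log(flatRow);
// { a: 'aaa', b: 'bbb', 'c.d': 'ddd', 'c.e': 'eee' }
```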

Answer 2

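We can convert the struct from the Athena output into an object by post-processing. The script below may help.

Suppose the following sample string is received for a nested object: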

   {description=Check the Primary key count of TXN_EVENT table in Oracle, datastore_order=1, zone=yellow, aggregation_type=count, updatedcount=0, updatedat=[2021-06-09T02:03:20.243Z]}

It can be parsed with the help of the athena-struct-parser npm package.

Node.js: https://www.npmjs.com/package/athena-struct-parser
Python: AWS Athena export array of structs to JSON

Sample code

var parseStruct = require('athena-struct-parser');
var str = '{description=Check the Primary key count of TXN_EVENT table in Oracle, datastore_order=1, zone=yellow, aggregation_type=count, updatedcount=0, updatedat=[2021-06-09T02:03:20.243Z]}';
var parseObj = parseStruct(str);
console.log(parseObj);

Parsed output

{
  description: 'Check the Primary key count of TXN_EVENT table in Oracle',
  datastore_order: '1',
  zone: 'yellow',
  aggregation_type: 'count',
  updatedcount: '0',
  updatedat: [ '2021-06-09T02:03:20.243Z' ]
}
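
For readers who prefer not to add a dependency, the general idea of such post-processing can be sketched by hand. This is not the implementation used by athena-struct-parser; `parseAthenaStruct` is a hypothetical name, and the sketch assumes that keys and scalar values never contain `,`, `=`, `{`, `}`, `[` or `]`, which real data may violate. All scalars are returned as strings, and treating the literal text `null` as a JSON null is likewise an assumption.

```javascript
// Naive recursive-descent parser for Athena's textual struct output,
// e.g. '{a=aaa, b=bbb, c={d=ddd, e=eee}}'.
// Assumption: keys and scalar values contain none of , = { } [ ]
function parseAthenaStruct(text) {
  let pos = 0;

  function skipSpaces() {
    while (pos < text.length && text[pos] === ' ') pos++;
  }

  function parseValue() {
    skipSpaces();
    if (text[pos] === '{') return parseStruct();
    if (text[pos] === '[') return parseArray();
    return parseScalar();
  }

  function parseScalar() {
    const start = pos;
    while (pos < text.length && !',}]'.includes(text[pos])) pos++;
    const raw = text.slice(start, pos).trim();
    // Assumption: the bare word "null" stands for a missing value.
    return raw === 'null' ? null : raw;
  }

  function parseStruct() {
    const result = {};
    pos++; // skip '{'
    skipSpaces();
    while (pos < text.length && text[pos] !== '}') {
      const eq = text.indexOf('=', pos);
      if (eq === -1) throw new Error('Expected "=" after position ' + pos);
      const key = text.slice(pos, eq).trim();
      pos = eq + 1;
      result[key] = parseValue();
      if (text[pos] === ',') pos++; // skip the field separator
      skipSpaces();
    }
    pos++; // skip '}'
    return result;
  }

  function parseArray() {
    const items = [];
    pos++; // skip '['
    skipSpaces();
    while (pos < text.length && text[pos] !== ']') {
      items.push(parseValue());
      if (text[pos] === ',') pos++; // skip the element separator
      skipSpaces();
    }
    pos++; // skip ']'
    return items;
  }

  return parseValue();
}

const parsedStruct = parseAthenaStruct('{a=aaa, b=bbb, c={d=ddd, e=eee}}');
console.log(JSON.stringify(parsedStruct));
// {"a":"aaa","b":"bbb","c":{"d":"ddd","e":"eee"}}
```

Because the textual format does not quote or escape values, no parser of this output can be fully reliable; a value containing a comma or an equals sign would be split incorrectly.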

Answer 3

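I answered a similar question: AWS Athena export array of structs to JSON

I used a simple approach to get around the struct -> JSON limitation in Athena. I created a second table in which the JSON column is stored as a raw string. Using Presto's JSON and array functions, I was able to query the data and return a valid JSON string to my program: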

--Array transform functions too
select 
  json_extract_scalar(dd, '$.timestamp') as timestamp,
  transform(cast(json_extract(json_parse(dd), '$.stats') as ARRAY<JSON>), x -> json_extract_scalar(x, '$.time')) as arr_stats_time,
  transform(cast(json_extract(json_parse(dd), '$.stats') as ARRAY<JSON>), x -> json_extract_scalar(x, '$.mean')) as arr_stats_mean,
  transform(cast(json_extract(json_parse(dd), '$.stats') as ARRAY<JSON>), x -> json_extract_scalar(x, '$.var')) as arr_stats_var
from 
(select '{"timestamp":1520640777.666096,"stats":[{"time":15,"mean":45.23,"var":0.31},{"time":19,"mean":17.315,"var":2.612}],"dets":[{"coords":[2.4,1.7,0.3], "header":{"frame":1,"seq":1,"name":"hello"}}],"pos": {"x":5,"y":1.4,"theta":0.04}}' as dd);

I know the query will take longer to execute, but there are ways to optimize it.
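
Once the column is available as a raw JSON string, the same extraction can also be done in the consuming program rather than in the query. The sketch below mirrors the SQL above on the client side using the same sample document; the output property names are simply copied from the query aliases. One difference to keep in mind: `json_extract_scalar` returns strings, whereas `JSON.parse` yields numbers here.

```javascript
// Same sample document as in the SQL query, held as a raw JSON string.
const dd = '{"timestamp":1520640777.666096,"stats":[{"time":15,"mean":45.23,"var":0.31},{"time":19,"mean":17.315,"var":2.612}],"dets":[{"coords":[2.4,1.7,0.3], "header":{"frame":1,"seq":1,"name":"hello"}}],"pos": {"x":5,"y":1.4,"theta":0.04}}';

const doc = JSON.parse(dd);

// Pull one field out of every element of the "stats" array,
// the client-side counterpart of transform(... x -> json_extract_scalar(x, ...)).
const extracted = {
  timestamp: doc.timestamp,
  arr_stats_time: doc.stats.map((s) => s.time),
  arr_stats_mean: doc.stats.map((s) => s.mean),
  arr_stats_var: doc.stats.map((s) => s.var),
};

console.log(extracted.arr_stats_time); // [ 15, 19 ]
console.log(extracted.arr_stats_mean); // [ 45.23, 17.315 ]
console.log(extracted.arr_stats_var);  // [ 0.31, 2.612 ]
```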

3 answers

Answer 1: 4, 2018-03-04 22:35:14
Answer 2: 0, 2021-06-09 07:52:11
Answer 3: 0, 2022-02-16 11:14:24