
In Athena, how do I query a member of a struct in an array in a struct?

I am trying to figure out how to write a query that checks the value of usage, given the following table definition:

CREATE EXTERNAL TABLE IF NOT EXISTS foo.test (
    `id` string,
    `foo` struct< usages:array< struct< usage:string,
        method_id:int,
        start_at:string,
        end_at:string,
        location:array<string> >>>
)
PARTITIONED BY (
    timestamp date
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
    'serialization.format' = '1'
)
LOCATION 's3://foo.bar/'
TBLPROPERTIES ('has_encrypted_data'='false');

I would like to run a query like:

SELECT * FROM "foo"."test" WHERE foo.usages.usage is null;

When I do that, I get:

SYNTAX_ERROR: line 1:53: Expression "foo"."usages" is not of type ROW

If I index the array directly, as in the following query, it works:

SELECT * FROM "foo"."test" WHERE foo.usages[1].usage is null;

My overall goal, though, is to query across all items in the usages array and find every row where at least one item in the array has a usage member that is null.

Athena is based on Presto. In Presto 318 you can use any_match:

SELECT * FROM "foo"."test"
WHERE any_match(foo.usages, element -> element.usage IS NULL);
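
To try any_match without touching the real table, the predicate can be run against inline rows. This is a minimal sketch, assuming an engine where any_match exists (Presto 318 or later, per the note above). The sample CTE and its two rows are made up for illustration, and the struct is reduced to the single usage field.

```sql
-- Hypothetical inline data: 'a' has one element with a null usage, 'b' has none.
WITH sample AS (
  SELECT *
  FROM (
    VALUES
      ('a', ARRAY[
              CAST(ROW('x') AS ROW(usage varchar)),
              CAST(ROW(CAST(NULL AS varchar)) AS ROW(usage varchar))
            ]),
      ('b', ARRAY[
              CAST(ROW('y') AS ROW(usage varchar))
            ])
  ) AS t (id, usages)
)
SELECT id
FROM sample
WHERE any_match(usages, element -> element.usage IS NULL);
```

Only the row with id 'a' satisfies the predicate, since it is the only one whose array contains an element with a null usage.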

I think the function is not available in Athena yet, but you can emulate it using reduce:

SELECT * FROM "foo"."test"
WHERE reduce(
  foo.usages, -- array to reduce
  false, -- initial state
  (state, element) -> state OR element.usage IS NULL, -- combining function
  state -> state); -- output function (identity in this case)
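
Another way to express the same check with lambda functions is to keep only the matching elements with filter and test whether any remain. This is a sketch of an alternative, not the reduce approach above, and it assumes the engine version in use supports filter and cardinality on arrays.

```sql
SELECT *
FROM "foo"."test"
WHERE cardinality(
  filter(foo.usages, element -> element.usage IS NULL) -- keep only elements whose usage is null
) > 0; -- true when at least one such element exists
```

As with the reduce version, rows whose usages array is itself NULL are not returned, because the comparison evaluates to NULL rather than true.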

You can achieve this by unnesting the array into rows and then checking those rows for null values. This produces one output row per array entry whose usage is null.

select * from test
CROSS JOIN UNNEST(foo.usages) AS t(i)
where i.usage is null

So if you only need the unique set of ids, run the query with SELECT DISTINCT:

select distinct id from test
CROSS JOIN UNNEST(foo.usages) AS t(i)
where i.usage is null
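
If the number of null entries per id is also of interest, grouping is an alternative to DISTINCT. This is a sketch using the same unnest pattern as above; null_usage_count is an illustrative alias, not a column of the table.

```sql
-- One output row per id that has at least one null usage,
-- together with how many array entries had a null usage.
SELECT id, count(*) AS null_usage_count
FROM test
CROSS JOIN UNNEST(foo.usages) AS t(i)
WHERE i.usage IS NULL
GROUP BY id
```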
