AWS Athena ALIAS in Group By does not get resolved

I have a very basic GROUP BY query in Athena where I would like to use an alias. One can make the example work by repeating the same expression in the GROUP BY, but that is not really handy when complex column modifications are going on and the logic has to be copied in two places. I also did that in the past, and now I have a statement that doesn't work even when the expression is copied over.

Problem:

SELECT 
    substr(accountDescriptor, 5) as account, 
    sum(revenue) as grossRevenue 
FROM sales 
GROUP BY account

This will throw an error:

alias Column 'account' cannot be resolved

The following works, so the problem is in the alias handling.

SELECT 
    substr(accountDescriptor, 5) as account, 
    sum(revenue) as grossRevenue 
FROM sales 
GROUP BY substr(accountDescriptor, 5)

That is because SQL is evaluated in a certain logical order: table scan, filter, aggregation, projection, sort. You tried to use the result of the projection as input to the aggregation. In many cases this could be possible (where the projection is trivial, as in your case), but such behaviour is not defined in ANSI SQL (which Presto, and therefore Athena, follows).

We see that in many cases it is very useful, so support for this might be added in the future (extending ANSI SQL).
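The evaluation order described above can be pictured with a toy model. The following is only a sketch in plain Python, not how Presto or Athena is implemented; the sample rows and the helper names `aggregate` and `project` are made up for illustration. It shows that the aggregation step only sees source columns, while the alias appears only once projection has run.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the `sales` table.
sales = [
    {"accountDescriptor": "ACC-alpha", "revenue": 10.0},
    {"accountDescriptor": "ACC-beta", "revenue": 5.0},
    {"accountDescriptor": "ACC-alpha", "revenue": 2.5},
]


def aggregate(rows, key_expr):
    """Aggregation step: it only sees source columns, so the grouping
    key has to be computed from them (here via key_expr)."""
    totals = defaultdict(float)
    for row in rows:
        totals[key_expr(row)] += row["revenue"]
    return totals


def project(totals):
    """Projection step: output names (aliases) are attached only here."""
    return [{"account": key, "grossRevenue": total} for key, total in totals.items()]


# GROUP BY substr(accountDescriptor, 5): expressed over source columns, so it works.
# SQL substr is 1-based, so start position 5 corresponds to the slice [4:].
result = project(aggregate(sales, lambda row: row["accountDescriptor"][4:]))
print(result)

# GROUP BY account: 'account' does not exist until projection has run.
try:
    aggregate(sales, lambda row: row["account"])
    alias_resolved = True
except KeyError as exc:
    alias_resolved = False
    print("cannot resolve column:", exc)
```

Engines that do accept aliases in GROUP BY effectively substitute the aliased expression back into the aggregation step, which is the extension mentioned above.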

Currently, there are several ways to overcome this:

SELECT account, sum(revenue) as grossRevenue 
FROM (SELECT substr(accountDescriptor, 5) as account, revenue FROM sales)
GROUP BY account

or

WITH better_sales AS (SELECT substr(accountDescriptor, 5) as account, revenue FROM sales)
SELECT account, sum(revenue) as grossRevenue 
FROM better_sales
GROUP BY account

or

SELECT account, sum(revenue) as grossRevenue 
FROM sales
CROSS JOIN LATERAL (SELECT substr(accountDescriptor, 5) as account)
GROUP BY account

or

SELECT substr(accountDescriptor, 5) as account, sum(revenue) as grossRevenue
FROM sales
GROUP BY 1;
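If you want to convince yourself that these rewrites return the same groups, a quick local check is possible. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in, with made-up sample data. Note the assumptions: SQLite is not Athena, it accepts aliases in GROUP BY itself (so it cannot reproduce the original error), and it has no LATERAL support, so that variant is left out.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (accountDescriptor TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("ACC-alpha", 10.0), ("ACC-beta", 5.0), ("ACC-alpha", 2.5)],
)

variants = {
    "repeated expression": """
        SELECT substr(accountDescriptor, 5) AS account, sum(revenue) AS grossRevenue
        FROM sales
        GROUP BY substr(accountDescriptor, 5)
    """,
    "subquery": """
        SELECT account, sum(revenue) AS grossRevenue
        FROM (SELECT substr(accountDescriptor, 5) AS account, revenue FROM sales)
        GROUP BY account
    """,
    "CTE": """
        WITH better_sales AS (
            SELECT substr(accountDescriptor, 5) AS account, revenue FROM sales
        )
        SELECT account, sum(revenue) AS grossRevenue
        FROM better_sales
        GROUP BY account
    """,
    "ordinal": """
        SELECT substr(accountDescriptor, 5) AS account, sum(revenue) AS grossRevenue
        FROM sales
        GROUP BY 1
    """,
}

# Sort each result so the comparison does not depend on row order.
results = {name: sorted(conn.execute(sql).fetchall()) for name, sql in variants.items()}
for name, rows in results.items():
    print(name, rows)
```

All four variants should print the same two rows, one per account suffix, which is what makes them interchangeable as workarounds.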

In addition to the answers from kokosing and Gordon Linoff, you can use numbers that represent the position of the grouped column in the SELECT statement. This approach can also give you better performance, as described in section 8 of this AWS Blog. For example:

SELECT
    substr(accountDescriptor, 5) as account,
    sum(revenue) as grossRevenue
FROM sales
GROUP BY 1

Note: numbering starts from one, not from zero.

Here 1 is effectively an alias for account. The main obvious downside is that if you change the ordering of your columns within SELECT, then you also need to account for that within GROUP BY:

SELECT
    sum(revenue) as grossRevenue,
    substr(accountDescriptor, 5) as account
FROM sales
GROUP BY 2

Hive does not allow column aliases in the GROUP BY, just as the SQL standard does not allow them. Some databases extend SQL to allow aliases, but this is an extension.

Just repeat the expression:

SELECT substr(accountDescriptor, 5) as account, sum(revenue) as grossRevenue
FROM sales
GROUP BY substr(accountDescriptor, 5);
