Unexpected behaviour in AWS Redshift
I have observed some unexpected behaviour in AWS Redshift. Below are the examples that illustrate it.
This query:
WITH var_table AS (
SELECT
8 AS x
)
SELECT * FROM real_table
WHERE real_table_col_1 = (select x from var_table)
AND real_table_col_2 = (select x from var_table)
fails with:
ERROR: Assert
Detail:
-----------------------------------------------
error: Assert
code: 1000
context: query->a_last_plan()->m_locus == LocusXNode -
query: 48508061
location: xen_execute.cpp:8916
process: padbmaster [pid=2659]
-----------------------------------------------
This query works normally:
create temporary table var_table AS (
SELECT
8 AS x
);
SELECT * FROM real_table
WHERE real_table_col_1 = (select x from var_table)
AND real_table_col_2 = (select x from var_table)
This query also works normally:
WITH var_table AS (
SELECT
8 AS x
)
SELECT * FROM real_table
WHERE real_table_col_1 = (select x from var_table)
The only difference between this one and the first one is that I extract x only once.
Is there an explanation for this (am I just missing some of the nitty-gritty of Redshift), or is it a legitimate bug?
EDIT:
The query format above is a simplified version of what I thought was a minimal reproducible example. I have since found that the issue does not occur in some simpler queries that use the same pattern.
I will keep trying to find a minimal reproducible example for a while; otherwise I will just stick with the create temporary table approach for now.
Answer:
Assert errors occur when some part of the query plan turns out not to make sense at execution time. Is it a bug? Possibly; at the very least it should be handled more gracefully.
I suspect the error arises from a combination of three things: comparing a column to a subquery value with '=', doing this more than once, and the subquery referencing a CTE (which has no metadata). The query planner sets up for the general case of this kind of comparison, which could be mapped to an IN clause or a JOIN. My suspicion is that it is trying to map your query to a pair of JOINs and things go sideways.
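Mapping these comparisons to an IN clause or a JOIN is only safe because the CTE returns exactly one row. The sketch below checks that the scalar-subquery, IN, and JOIN shapes return the same rows on a toy table. It uses Python's built-in sqlite3 module purely as a stand-in engine, so it checks logical equivalence only and cannot reproduce the Redshift assert. The table rows and the helper name `run` are hypothetical. `real_table`, `real_table_col_1`, `real_table_col_2`, and `var_table` come from the question.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in engine; it checks that the
# query shapes are logically equivalent, not how Redshift's planner handles them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE real_table (real_table_col_1 INTEGER, real_table_col_2 INTEGER)")
# Hypothetical toy rows: only (8, 8) satisfies both predicates.
conn.executemany(
    "INSERT INTO real_table VALUES (?, ?)",
    [(8, 8), (8, 1), (1, 8), (None, 8), (8, None)],
)

CTE = "WITH var_table AS (SELECT 8 AS x) "

# Shape from the question: two scalar subqueries against the CTE.
SCALAR = CTE + """
SELECT * FROM real_table
WHERE real_table_col_1 = (SELECT x FROM var_table)
  AND real_table_col_2 = (SELECT x FROM var_table)"""

# The same predicates written as IN clauses.
IN_FORM = CTE + """
SELECT * FROM real_table
WHERE real_table_col_1 IN (SELECT x FROM var_table)
  AND real_table_col_2 IN (SELECT x FROM var_table)"""

# The same predicates written as a JOIN (equivalent only because var_table has one row).
JOIN_FORM = CTE + """
SELECT rt.* FROM real_table rt
JOIN var_table vt
  ON rt.real_table_col_1 = vt.x
 AND rt.real_table_col_2 = vt.x"""


def run(sql):
    """Return the query's rows in a canonical order so row order doesn't matter."""
    return sorted(conn.execute(sql).fetchall(), key=repr)


results = {name: run(sql) for name, sql in
           [("scalar", SCALAR), ("in", IN_FORM), ("join", JOIN_FORM)]}
print(results)  # {'scalar': [(8, 8)], 'in': [(8, 8)], 'join': [(8, 8)]}
```

The equivalence depends on var_table having exactly one row. With several rows, the JOIN form can return duplicate or different rows. That is presumably why a planner has to carry the single-row guarantee through any such rewrite.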
A few things to try, to see whether this is the issue:
Remove the unneeded second scan of var_table (unnecessarily rescanning tables is not best practice):
WITH var_table AS (
SELECT
8 AS x
)
SELECT * FROM real_table
WHERE real_table_col_1 = (select x from var_table)
AND real_table_col_2 = real_table_col_1;
Tell the query planner that you only want one row from this table:
WITH var_table AS (
SELECT
8 AS x
)
SELECT * FROM real_table
WHERE real_table_col_1 = (select top 1 x from var_table)
AND real_table_col_2 = (select top 1 x from var_table)
Move to explicit JOIN syntax:
WITH var_table AS (
SELECT
8 AS x
)
SELECT * FROM real_table rt
JOIN var_table vt
ON rt.real_table_col_1 = vt.x
AND rt.real_table_col_2 = vt.x;
See whether you can recreate the issue by explicitly using redundant joins:
WITH var_table AS (
SELECT
8 AS x
)
SELECT * FROM real_table rt
JOIN var_table vt1 ON rt.real_table_col_1 = vt1.x
JOIN var_table vt2 ON rt.real_table_col_2 = vt2.x;
I'm not confident that this last query will fail, but I expect it is similar to what the query planner is trying to do with your original query.
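Before adopting any of these workarounds, it is worth confirming that it returns the same rows as the original query. The sketch below does that with Python's sqlite3 module as a stand-in engine on a hypothetical toy table. It checks logical equivalence only and says nothing about whether Redshift's planner avoids the assert. SQLite has no `TOP`, so the single-row rewrite is written with `LIMIT 1` instead. The CTE is assumed to be named var_table throughout. The toy rows and the names `WORKAROUNDS` and `rows` are hypothetical.

```python
import sqlite3

# Stand-in engine: verifies that each rewrite returns the same rows as the
# original query; it cannot show whether Redshift avoids the assert.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE real_table (real_table_col_1 INTEGER, real_table_col_2 INTEGER)")
# Hypothetical toy rows, including a duplicate match, NULLs and near misses.
conn.executemany(
    "INSERT INTO real_table VALUES (?, ?)",
    [(8, 8), (8, 8), (8, 1), (1, 8), (None, 8), (8, None), (None, None)],
)

CTE = "WITH var_table AS (SELECT 8 AS x) "

ORIGINAL = CTE + """
SELECT * FROM real_table
WHERE real_table_col_1 = (SELECT x FROM var_table)
  AND real_table_col_2 = (SELECT x FROM var_table)"""

WORKAROUNDS = {
    # Reuse col_1 instead of scanning the CTE a second time.
    "reuse_col_1": CTE + """
SELECT * FROM real_table
WHERE real_table_col_1 = (SELECT x FROM var_table)
  AND real_table_col_2 = real_table_col_1""",
    # Single-row hint: Redshift's TOP 1 written as SQLite's LIMIT 1.
    "limit_1": CTE + """
SELECT * FROM real_table
WHERE real_table_col_1 = (SELECT x FROM var_table LIMIT 1)
  AND real_table_col_2 = (SELECT x FROM var_table LIMIT 1)""",
    # Explicit JOIN syntax.
    "explicit_join": CTE + """
SELECT rt.* FROM real_table rt
JOIN var_table vt
  ON rt.real_table_col_1 = vt.x
 AND rt.real_table_col_2 = vt.x""",
    # Redundant joins, the shape the planner is suspected of producing.
    "redundant_joins": CTE + """
SELECT rt.* FROM real_table rt
JOIN var_table vt1 ON rt.real_table_col_1 = vt1.x
JOIN var_table vt2 ON rt.real_table_col_2 = vt2.x""",
}


def rows(sql):
    """Run a query and return its rows in a canonical order (multiset comparison)."""
    return sorted(conn.execute(sql).fetchall(), key=repr)


expected = rows(ORIGINAL)
mismatches = [name for name, sql in WORKAROUNDS.items() if rows(sql) != expected]
print(expected, mismatches)  # [(8, 8), (8, 8)] []
```

Comparing sorted row lists rather than sets keeps duplicate rows visible. That matters because a JOIN-based rewrite would silently multiply rows if var_table ever held more than one row.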