Unexpected behaviour in AWS Redshift

I have observed some unexpected behaviour in AWS Redshift. The examples below illustrate it.

This query:

WITH var_table AS (
SELECT
    8 AS x
)
SELECT * FROM real_table
WHERE real_table_col_1 = (select x from var_table)
AND real_table_col_2 = (select x from var_table)

fails with:

ERROR: Assert
  Detail: 
  -----------------------------------------------
  error:  Assert
  code:      1000
  context:   query->a_last_plan()->m_locus == LocusXNode - 
  query:     48508061
  location:  xen_execute.cpp:8916
  process:   padbmaster [pid=2659]
  -----------------------------------------------

This query works normally:

create temporary table var_table AS (
SELECT
    8 AS x
);

SELECT * FROM real_table
WHERE real_table_col_1 = (select x from var_table)
AND real_table_col_2 = (select x from var_table)

This query also works normally:

WITH var_table AS (
SELECT
    8 AS x
)
SELECT * FROM real_table
WHERE real_table_col_1 = (select x from var_table)

The only difference between this one and the first is that I extract x only once.

Is there an explanation for this that I am missing because I don't understand some nitty-gritty of Redshift, or is it a legitimate bug?

EDIT:

The query format above is a simplified version that I thought was a minimal reproducible example. I have since found that the issue does not occur in some simpler queries that use the same pattern.

I will keep trying to find a minimal reproducible example for some time; in the meantime I will keep using the create temporary table workaround.

Assert errors occur when some part of the query plan does not make sense at the time the query is executed. Is it a bug? Possibly; at the very least, the failure should be handled more gracefully.

I suspect the error arises from comparing a value to a column with '=', doing this more than once, and having the table reference be a CTE (which has no metadata). The query planner sets up for the general case of this type of comparison, which could be mapped to an IN clause or a JOIN. I suspect it is trying to map the two comparisons to a pair of JOINs, and that is where things go sideways.
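One way to check this guess is to compare the plans directly. The following is only a sketch: it reuses the table and column names from the question, and it assumes that EXPLAIN does not itself trip the assert (the error above is raised at execution time, so it may well not, but I have not verified this).

```sql
-- Plan for the failing form: two scalar subqueries against the CTE.
EXPLAIN
WITH var_table AS (
    SELECT 8 AS x
)
SELECT * FROM real_table
WHERE real_table_col_1 = (SELECT x FROM var_table)
AND real_table_col_2 = (SELECT x FROM var_table);

-- Plan for the working form: a single scalar subquery against the CTE.
EXPLAIN
WITH var_table AS (
    SELECT 8 AS x
)
SELECT * FROM real_table
WHERE real_table_col_1 = (SELECT x FROM var_table);
```

If the first plan shows extra join or subplan steps that the second does not, that would support the idea that the doubled comparison is being planned as a pair of joins.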

A few things to try, to see whether that is the issue:

Remove the unneeded second table scan (rescanning tables unnecessarily is not a best practice):

WITH var_table AS (
SELECT
    8 AS x
)
SELECT * FROM real_table
WHERE real_table_col_1 = (select x from var_table)
AND real_table_col_2 = real_table_col_1;

Tell the query planner that you only want 1 row from this table:

WITH var_table AS (
SELECT
    8 AS x
)
SELECT * FROM real_table
WHERE real_table_col_1 = (select top 1 x from var_table)
AND real_table_col_2 = (select top 1 x from var_table)

Move to an explicit JOIN syntax:

WITH var_table AS (
SELECT
    8 AS x
)
SELECT * FROM real_table rt
JOIN var_table vt
ON rt.real_table_col_1 = vt.x 
AND rt.real_table_col_2 = vt.x;

See whether you can recreate the issue by explicitly using redundant joins:

WITH var_table AS (
SELECT
    8 AS x
)
SELECT * FROM real_table rt
JOIN var_table vt1 ON rt.real_table_col_1 = vt1.x
JOIN var_table vt2 ON rt.real_table_col_2 = vt2.x;

I'm not confident that this last query will fail, but I expect it is similar to what the query planner is trying to do with your original query.
