
Does the Snowflake Query Optimiser respect CTEs?

If I write a SQL statement that includes CTEs, will the query optimiser always retain those CTEs as discrete statements to be optimised individually, or can it merge those CTEs with other parts of the overall SQL if it calculates that the resulting SQL will be quicker to execute?

This question was triggered by a question another user asked. They were using a sequence generator in a CTE; when the CTE SQL was run in isolation it always produced 12 consecutive numbers, as expected. However, when the CTE was run as part of a much larger SQL statement, numbers were missing, i.e. it wasn't producing consecutive values.

This is a known issue/behaviour with large datasets, but with only 12 values it shouldn't have been a problem. The fact that it was suggests that the CTE was not being run as written, with the 12-record result set then joined to the other tables. Instead, the query optimiser appears to have rewritten the overall query, merging the CTE logic with other parts of the SQL statement and so producing a much larger dataset.

Snowflake does not provide a lot of explanation of how it optimizes queries.

I can say that in general, there are two ways to handle CTEs:

  • Materialize the CTE so it is run once and then the materialized version is read.
  • Incorporate the CTE logic into the rest of the query and optimize it as part of the query.

I would actually expect Snowflake to be capable of both, choosing whichever gives the better execution plan, because it is a modern database that has learned from decades of optimization experience. Either method might be better under some circumstances.
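To make the difference between the two strategies concrete, here is a minimal Python sketch. It is a toy model only, not a description of how Snowflake actually executes anything: `make_nextval`, `cte_body`, `run_materialized` and `run_inlined` are hypothetical names invented for illustration, and the "sequence" is just an in-memory counter. It shows why a CTE containing a stateful or non-deterministic expression can reveal which strategy was chosen when the CTE is referenced more than once.

```python
from itertools import count


def make_nextval():
    """Hypothetical stand-in for a sequence: every call returns the next integer."""
    counter = count(1)
    return lambda: next(counter)


def cte_body(nextval, rows=12):
    """The 'CTE': produce `rows` rows, each holding one sequence value."""
    return [nextval() for _ in range(rows)]


def run_materialized(references=2):
    """Strategy 1: evaluate the CTE once; every reference reads that stored result."""
    nextval = make_nextval()
    result = cte_body(nextval)
    return [list(result) for _ in range(references)]


def run_inlined(references=2):
    """Strategy 2: fold the CTE body into each place that references it,
    so the body (and the sequence inside it) is evaluated once per reference."""
    nextval = make_nextval()
    return [cte_body(nextval) for _ in range(references)]


if __name__ == "__main__":
    print("materialized:", run_materialized())
    print("inlined:     ", run_inlined())
```

In this toy model, both references see the values 1 to 12 when the CTE is materialized, while in the inlined version the second reference sees 13 to 24 because the counter has already advanced. For a purely deterministic CTE the two strategies would return identical rows, which is why the choice is normally invisible.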

All that said, if the data returned by the code changes as you describe it, then there is a bug. The point of optimization is not to change the semantics (meaning) of the code. The point is to return the same results but using different underlying algorithms.
