What are the pros/cons of using SQL variables versus subqueries?

I'm wondering whether there is a difference between SQL variables and subqueries: whether one uses more processing power, or one is quicker, or even if one is merely more readable.

For (a very basic) example, I like to use variables to hold polygons and transformations in PostGIS:

WITH region_polygon AS (
    SELECT ST_Transform(wkb_geometry, %(fishnet_srid)d) geom
    FROM regions
    LIMIT 1
), raster_pixels AS (
    SELECT (ST_PixelAsPolygons(rast)).*
    FROM test_regions_raster
    LIMIT 1
)
SELECT x, y
FROM raster_pixels a, region_polygon b
WHERE ST_Within(a.geom, b.geom)

But would it be better in any way to use subqueries?

SELECT x, y
FROM (
    SELECT ST_Transform(wkb_geometry, %(fishnet_srid)d) geom
    FROM regions
    LIMIT 1
) a, (
    SELECT (ST_PixelAsPolygons(rast)).*
    FROM test_regions_raster
    LIMIT 1
) b
WHERE ST_Within(b.geom, a.geom)

Note that I'm using PostgreSQL.

There's an important syntactic advantage of common table expressions over derived tables when it comes to reuse. Consider the following equivalent examples using self-joins:

Using common table expressions

WITH a(v) AS (SELECT 1 UNION SELECT 2)
SELECT *
FROM a AS x, a AS y

Using derived tables

SELECT *
FROM (SELECT 1 UNION SELECT 2) x(v),
     (SELECT 1 UNION SELECT 2) y(v)

As you can see, using common table expressions, the view (SELECT 1 UNION SELECT 2) can be reused multiple times in your query. With derived tables, you have to repeat the view declaration. In my example, this is still OK. In your own example, it starts getting a bit more hairy.

It's all about scope

Views in SQL are all about scoping. There are essentially four levels of declaring views:

  • As derived tables. They can be consumed exactly once.
  • As common table expressions. They can be consumed several times, but only in one query.
  • As views. They can be consumed several times in several queries.
  • As materialized views. Same as views, but the data is pre-calculated.

Some databases (in particular PostgreSQL) also support table-valued functions. From a mere syntax perspective, they're just like views: parameterised views.
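
As a rough sketch of these scoping levels in PostgreSQL, the same aggregate can be declared at each level. The table `measurements`, the view names, and the function `region_stats_above` are hypothetical and exist only for illustration:

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE measurements (region_id int, reading numeric);

-- 1. Derived table: declared inline, consumed exactly once.
SELECT d.avg_reading
FROM (SELECT avg(reading) AS avg_reading FROM measurements) AS d;

-- 2. Common table expression: consumed several times within one query.
WITH stats AS (
    SELECT region_id, avg(reading) AS avg_reading
    FROM measurements
    GROUP BY region_id
)
SELECT a.region_id AS lower_region, b.region_id AS higher_region
FROM stats a
JOIN stats b ON a.avg_reading < b.avg_reading;

-- 3. View: consumed by any number of queries.
CREATE VIEW region_stats AS
SELECT region_id, avg(reading) AS avg_reading
FROM measurements
GROUP BY region_id;

-- 4. Materialized view: like a view, but the result is stored
--    and must be refreshed explicitly.
CREATE MATERIALIZED VIEW region_stats_mat AS
SELECT * FROM region_stats;
REFRESH MATERIALIZED VIEW region_stats_mat;

-- Table-valued function: effectively a parameterised view.
CREATE FUNCTION region_stats_above(min_reading numeric)
RETURNS TABLE (region_id int, avg_reading numeric)
LANGUAGE sql STABLE
AS $$
    SELECT m.region_id, avg(m.reading)
    FROM measurements m
    GROUP BY m.region_id
    HAVING avg(m.reading) > min_reading;
$$;

SELECT * FROM region_stats_above(10);
```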

Performance

Note that these thoughts only focus on syntax, not query planning. The different approaches may have very different performance implications, depending on the database vendor.

Those aren't variables, they're common table expressions (CTEs). In your query above, the execution plans are likely identical, because the optimizer should recognize that they are equivalent queries. I prefer to use CTEs because I think they're easier to read than subqueries, but that's it.

Edit: Upon further reading, it looks like PostgreSQL does treat common table expressions differently from other databases; you can't update a CTE in PostgreSQL, for instance. I'll leave my answer here because I believe there won't be a difference for your query, but I'm not terribly familiar with PostgreSQL.

As pointed out, this construct is called a Common Table Expression, not a variable.

I prefer to use a CTE rather than a subquery, because it is much easier for me to read and write, especially when there are several nested CTEs.

You can write a CTE once and refer to it several times in the rest of the query. With a subquery, you'll have to repeat the code several times.

An important difference between PostgreSQL and other databases (at least MS SQL Server) is that PostgreSQL evaluates each CTE only once.

A useful property of WITH queries is that they are evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or sibling WITH queries. Thus, expensive calculations that are needed in multiple places can be placed within a WITH query to avoid redundant work. Another possible application is to prevent unwanted multiple evaluations of functions with side-effects. However, the other side of this coin is that the optimizer is less able to push restrictions from the parent query down into a WITH query than an ordinary sub-query. The WITH query will generally be evaluated as written, without suppression of rows that the parent query might discard afterwards. (But, as mentioned above, evaluation might stop early if the reference(s) to the query demand only a limited number of rows.)

MS SQL Server would inline each reference to a CTE into the main query and optimize the whole result, but PostgreSQL doesn't. In some sense PostgreSQL is more flexible here. If you want the subquery to be evaluated only once, put it in a CTE. If you don't, put it in a subquery and repeat the code. In SQL Server you'd have to use a temporary table explicitly.

Your example in the question is too simple, and most likely both variants are equivalent; check the execution plan.
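
One way to check is to run EXPLAIN on both forms and compare the plans. The sketch below uses a hypothetical `orders` table rather than the tables from the question. On PostgreSQL versions that evaluate the CTE as written, the first plan would typically show a CTE Scan with the filter applied on top of it, while the second lets the planner push `id = 42` down to the table and use the primary-key index. The exact plans depend on the server version, table size, and statistics.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE orders (
    id          int PRIMARY KEY,
    customer_id int,
    total       numeric
);

-- CTE form: the restriction is written outside the WITH query.
EXPLAIN
WITH big AS (
    SELECT * FROM orders
)
SELECT * FROM big WHERE id = 42;

-- Subquery form: the planner can flatten the subquery and
-- apply the restriction directly to the orders table.
EXPLAIN
SELECT *
FROM (SELECT * FROM orders) AS big
WHERE id = 42;
```

Using EXPLAIN (ANALYZE) instead of plain EXPLAIN also executes the query and reports actual row counts and timings.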


The official docs mention it, as I quoted above, but Nick Barnes gave a link to a good article explaining it in more detail, and I thought it was worth putting in an answer rather than a comment.

When optimising queries in PostgreSQL (true at least in 9.4 and older), it's worth keeping in mind that, unlike newer versions of various other databases, PostgreSQL will always materialise a CTE term in a query.

This can have quite surprising effects for those used to working with DBs like MS SQL:

  • A query that should touch a small amount of data instead reads a whole table and possibly spills it to a tempfile;
  • and you cannot UPDATE or DELETE FROM a CTE term, because it's more like a read-only temp table rather than a dynamic view.
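
The "always materialise" behaviour described above applies to older releases. Since PostgreSQL 12, a CTE that is non-recursive, free of side effects, and referenced only once is inlined into the outer query by default, and the MATERIALIZED and NOT MATERIALIZED keywords let you choose explicitly. A minimal sketch, again using a hypothetical `orders` table:

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE orders (
    id          int PRIMARY KEY,
    customer_id int,
    total       numeric
);

-- PostgreSQL 12+: force the old behaviour. The CTE is computed once
-- as written and acts as an optimization fence.
WITH big AS MATERIALIZED (
    SELECT * FROM orders
)
SELECT * FROM big WHERE id = 42;

-- PostgreSQL 12+: ask for inlining, so the restriction can be
-- pushed down even when the CTE is referenced more than once.
WITH big AS NOT MATERIALIZED (
    SELECT * FROM orders
)
SELECT a.id, b.id
FROM big a
JOIN big b ON a.customer_id = b.customer_id
WHERE a.id = 42;
```

Neither keyword is accepted before version 12, so queries that must run on older servers have to rely on the CTE-versus-subquery choice instead.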

So, there is no definite answer as to whether a CTE is better than a subquery in PostgreSQL. In some cases it can be faster, in some cases it can be slower. But, IMHO, in most cases a CTE is easier to write, read and maintain.

And, obviously, there is a case where you have no other option but to use a so-called recursive CTE (recursive queries are typically used to deal with hierarchical or tree-structured data).
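
A minimal sketch of such a recursive CTE, walking a small reporting hierarchy from the root down. The `employees` data is made up inline with VALUES so the query runs on its own:

```sql
WITH RECURSIVE employees(id, manager_id, name) AS (
    -- Hypothetical sample data: Ada manages Bob, Bob manages Cy.
    VALUES (1, NULL::int, 'Ada'),
           (2, 1,         'Bob'),
           (3, 2,         'Cy')
), chain(id, name, depth) AS (
    -- Non-recursive term: start at the root (no manager).
    SELECT id, name, 0
    FROM employees
    WHERE manager_id IS NULL
    UNION ALL
    -- Recursive term: add everyone who reports to a row found so far.
    SELECT e.id, e.name, c.depth + 1
    FROM employees e
    JOIN chain c ON e.manager_id = c.id
)
SELECT id, name, depth
FROM chain
ORDER BY depth;
```

The recursive term refers to `chain` itself, which a derived table cannot do; that self-reference is what makes the CTE form necessary here.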
