How to prove that using subselect queries in SQL is killing server performance

One of my jobs is to maintain our database. We usually have performance problems when generating reports and working with that database.
When I started looking at the queries our ERP sends to the database, I saw a lot of totally needless subselect queries inside the main queries.
As I am not one of the developers who created the program we use, they do not much like it when I criticize their code and their work. Let's say they do not take my reviews seriously. So I am asking you a few questions about subselects in SQL:

Does a subselect take a lot more time than a left outer join?
Is there any blog, article, or anything else that recommends not using subselects?
How can I prove that a query will be faster if we avoid subselects in it?

Our database server is MSSQL 2005.

"Show, Don't Tell" - Examine and compare the query plans of the queries identified using SQL Profiler. “显示,不要告诉” - 检查并比较使用SQL事件探查器识别的查询的查询计划。 Particularly look out for table scans and bookmark lookups (you want to see index seeks as often as possible). 特别注意表扫描和书签查找(您希望尽可能多地查看索引查找)。 The 'goodness of fit' of query plans depends on up-to-date statistics, what indexes are defined, the holistic query workload. 查询计划的“适合度”取决于最新统计信息,定义了哪些索引,整体查询工作负载。

Run the queries in SQL Server Management Studio (SSMS) and turn on Query -> Include Actual Execution Plan (CTRL+M).
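For a scripted alternative to the SSMS menu option, the plan can also be requested in text form. The following is only a minimal sketch, not one of the ERP queries: it reads the catalog views sys.objects and sys.columns purely because they exist on every SQL Server 2005 instance. Note that SET SHOWPLAN_TEXT returns the estimated plan without running the statement; SET STATISTICS PROFILE ON is the variant that executes the statement and returns the actual plan.

```sql
-- SET SHOWPLAN_TEXT must be the only statement in its batch.
SET SHOWPLAN_TEXT ON;
GO

-- Illustrative subselect query; substitute the query captured from the ERP.
-- While SHOWPLAN_TEXT is ON the statement is compiled but not executed.
SELECT o.object_id, o.name
FROM   sys.objects o
WHERE  o.object_id IN (SELECT c.object_id FROM sys.columns c);
GO

SET SHOWPLAN_TEXT OFF;
GO
```

Running the rewritten version of the same query between the same two SET statements gives a second plan to put side by side with the first.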

Think yourself lucky they're only subselects (for which the optimiser will in some cases produce equivalent 'join plans') and not correlated sub-queries!

Identify a query that performs a high number of logical reads, rewrite it using your preferred technique, and then show how few logical reads it does by comparison.

Here's a tip. To get the total number of logical reads performed, wrap the query in question with:

SET STATISTICS IO ON
GO

-- Run your query here

SET STATISTICS IO OFF
GO

Run your query, and switch to the Messages tab in the results pane.
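As a sketch of how such a comparison might look end to end, the script below builds two throwaway temp tables and runs a subselect version and a join version of the same report under SET STATISTICS IO. The table names (#customers, #orders), columns, and row counts are hypothetical and chosen only for illustration; which version reads fewer pages depends on your real schema and indexes, so no result is claimed here.

```sql
SET NOCOUNT ON;

-- Hypothetical demo tables (assumption: not part of the ERP schema).
CREATE TABLE #orders (
    order_id    INT   NOT NULL PRIMARY KEY,
    customer_id INT   NOT NULL,
    amount      MONEY NOT NULL
);
CREATE TABLE #customers (customer_id INT NOT NULL PRIMARY KEY);

-- Up to 20000 generated orders spread over customer ids 1..1000.
INSERT INTO #orders (order_id, customer_id, amount)
SELECT n, (n % 1000) + 1, 10
FROM  (SELECT TOP (20000)
              ROW_NUMBER() OVER (ORDER BY a.object_id) AS n
       FROM   sys.all_objects a CROSS JOIN sys.all_objects b) AS numbers;

INSERT INTO #customers (customer_id)
SELECT DISTINCT customer_id FROM #orders;
GO

SET STATISTICS IO ON;
GO

-- Version 1: correlated subselect in the select list.
SELECT c.customer_id,
       (SELECT SUM(o.amount)
        FROM   #orders o
        WHERE  o.customer_id = c.customer_id) AS total
FROM   #customers c;

-- Version 2: the same result written as an outer join with GROUP BY.
SELECT c.customer_id, SUM(o.amount) AS total
FROM   #customers c
LEFT JOIN #orders o ON o.customer_id = c.customer_id
GROUP BY c.customer_id;
GO

SET STATISTICS IO OFF;
GO

DROP TABLE #orders;
DROP TABLE #customers;
GO
```

The Messages tab then shows one "logical reads" line per table for each of the two statements, which is the figure to compare.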

If you are interested in learning more, there is no better book than SQL Server 2008 Query Performance Tuning Distilled, which covers the essential techniques for monitoring, interpreting, and fixing performance issues.

One thing you can do is load SQL Profiler and show them the cost (in terms of CPU, reads, and writes) of the sub-queries. It's tough to argue with cold, hard statistics.

I would also check the query plans for these queries to make sure appropriate indexes are being used and that table/index scans are kept to a minimum.
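If running a Profiler trace is not convenient, similar cost figures can be pulled from the plan cache. This is an alternative I am suggesting, not something the answer above describes: the sketch below lists the ten cached statements with the most logical reads using the dynamic management views introduced in SQL Server 2005. It assumes you hold the VIEW SERVER STATE permission and that the current database runs at compatibility level 90 or higher (needed to pass a column to the function via CROSS APPLY); the numbers only cover plans still in cache since the last restart.

```sql
-- Top 10 cached statements by total logical reads.
SELECT TOP (10)
       qs.execution_count,
       qs.total_logical_reads,
       qs.total_logical_writes,
       qs.total_worker_time,    -- CPU time, in microseconds
       qs.total_elapsed_time,   -- wall-clock time, in microseconds
       st.text AS batch_text    -- text of the batch containing the statement
FROM   sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_logical_reads DESC;
```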

In general, I wouldn't say sub-queries are bad, provided they are used correctly and the appropriate indexes are in place.

I'm not very familiar with MSSQL, as we use PostgreSQL in most of our applications. However, there should be something like "EXPLAIN" that shows you the execution plan for a query. There you should be able to see the various steps the query will go through in order to retrieve the needed data.

If you see a lot of table scans or loop joins without any index usage, that is definitely a hint of slow query execution. With such a tool you should be able to compare the two queries (one with the join, the other without).

It is difficult to state which is the better way, because it depends heavily on the indexes the optimizer can use in each case, and, depending on the DBMS, the optimizer may be able to implicitly rewrite a subquery into a join and execute that.

If you really want to show which is better, you have to execute both and measure the time, CPU usage, and so on.
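On SQL Server, one way to take those measurements is SET STATISTICS TIME, which reports CPU time and elapsed time per statement in the Messages tab. The pair of queries below is a hypothetical stand-in for a real ERP query and its rewrite; it uses the catalog views sys.objects and sys.columns only so that the sketch runs on any instance. Running each version several times helps to even out caching effects.

```sql
SET STATISTICS TIME ON;
GO

-- Version 1: correlated subselect counting the columns of each object.
SELECT o.object_id,
       (SELECT COUNT(*)
        FROM   sys.columns c
        WHERE  c.object_id = o.object_id) AS column_count
FROM   sys.objects o;

-- Version 2: the same result as an outer join with GROUP BY.
-- COUNT(c.column_id) ignores the NULLs produced for objects without columns.
SELECT o.object_id, COUNT(c.column_id) AS column_count
FROM   sys.objects o
LEFT JOIN sys.columns c ON c.object_id = o.object_id
GROUP BY o.object_id;
GO

SET STATISTICS TIME OFF;
GO
```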

UPDATE: For MSSQL it is probably this one --> QueryPlan

From my own experience both methods can be valid; for example, an EXISTS subselect can avoid a lot of processing by breaking out early.
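To illustrate the EXISTS case, here is a small sketch against the catalog views (chosen only because they exist on every instance, not taken from the original answer). The EXISTS form only has to find one matching row per object, whereas the join form first produces one row per column and then removes the duplicates. The optimizer may still choose the same plan for both, so treat this as something to measure rather than a guaranteed win.

```sql
-- EXISTS: one matching row in sys.columns is enough to keep the object.
SELECT o.object_id, o.name
FROM   sys.objects o
WHERE  EXISTS (SELECT 1
               FROM   sys.columns c
               WHERE  c.object_id = o.object_id);

-- Join: yields one row per column, so DISTINCT is needed for the same result.
SELECT DISTINCT o.object_id, o.name
FROM   sys.objects o
JOIN   sys.columns c ON c.object_id = o.object_id;
```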

But most of the time, queries with a lot of subselects are written by devs who do not really understand SQL and apply their classic procedural-programmer way of thinking to queries. They don't even think about joins, and they write some awful queries. So I prefer joins, and I always check subqueries. To be completely honest, I track slow queries, and my first attempt on a slow query containing subselects is to rewrite it with joins. It works a lot of the time.

But there is no rule establishing that subselects are bad or slower than joins; it's just that bad SQL programmers often write subselects :-)

Does a subselect take a lot more time than a left outer join?

This depends on the subselect and on the left outer join.

Generally, this construct:

SELECT  *
FROM    mytable
WHERE   mycol NOT IN
        (
        SELECT  othercol
        FROM    othertable
        )

is more efficient than this:

SELECT  m.*
FROM    mytable m
LEFT JOIN
        othertable o
ON      o.othercol = m.mycol
WHERE   o.othercol IS NULL

See here:
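One caveat worth checking before swapping one construct for the other: the two queries above only return the same rows when othercol contains no NULLs. The sketch below uses temp-table stand-ins for mytable and othertable (hypothetical data, default ANSI_NULLS setting assumed) to show the difference, along with NOT EXISTS, which behaves like the LEFT JOIN form.

```sql
-- Temp-table stand-ins for mytable / othertable (illustrative data only).
CREATE TABLE #mytable    (mycol    INT NULL);
CREATE TABLE #othertable (othercol INT NULL);

INSERT INTO #mytable (mycol)
SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3;

INSERT INTO #othertable (othercol)
SELECT 1 UNION ALL SELECT NULL;

-- NOT IN: returns no rows, because comparing any value with the NULL
-- in othercol is UNKNOWN, so the predicate is never TRUE.
SELECT m.mycol
FROM   #mytable m
WHERE  m.mycol NOT IN (SELECT o.othercol FROM #othertable o);

-- LEFT JOIN / IS NULL: returns 2 and 3.
SELECT m.mycol
FROM   #mytable m
LEFT JOIN #othertable o ON o.othercol = m.mycol
WHERE  o.othercol IS NULL;

-- NOT EXISTS: also returns 2 and 3.
SELECT m.mycol
FROM   #mytable m
WHERE  NOT EXISTS (SELECT 1 FROM #othertable o WHERE o.othercol = m.mycol);

DROP TABLE #mytable;
DROP TABLE #othertable;
```

If othercol is declared NOT NULL, all three forms return the same rows and the comparison is purely about performance.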

Is there any blog, article, or anything else that recommends not using subselects?

I would steer clear of blogs that blindly recommend avoiding subselects.

They are implemented for a reason and, believe it or not, the developers have put some effort into optimizing them.

How can I prove that a query will be faster if we avoid subselects in it?

Write a query without the subselects that runs faster.

If you post your query here, we may be able to improve it. However, the version with the subselects may turn out to be faster.

Try rewriting some of the queries to eliminate the sub-select and compare runtimes.

Share and enjoy.
