How to improve SQL Query Performance

I have the following DB structure (simplified):

Payments
----------------------
Id        | int
InvoiceId | int
Active    | bit
Processed | bit


Invoices
----------------------
Id              | int
CustomerOrderId | int


CustomerOrders
------------------------------------
Id                       | int
ApprovalDate             | DateTime
ExternalStoreOrderNumber | nvarchar

Each Customer Order has an Invoice, and each Invoice can have multiple Payments. ExternalStoreOrderNumber is a reference to the order in the external partner store we imported the order from, and ApprovalDate is the timestamp of that import.

Now we have the problem that we had a wrong import and need to move some payments to other invoices (several hundred, so too many to do by hand) according to the following logic:
Find the Invoice of the Order which has the same external number as the current one but starts with 0 instead of the current digit.

To do that I created the following query:

UPDATE DB.dbo.Payments 
    SET InvoiceId=
        (SELECT TOP 1 I.Id FROM DB.dbo.Invoices AS I
            WHERE I.CustomerOrderId=
                (SELECT TOP 1 O.Id FROM DB.dbo.CustomerOrders AS O 
                    WHERE O.ExternalOrderNumber='0'+SUBSTRING(
                      (SELECT TOP 1 OO.ExternalOrderNumber FROM DB.dbo.CustomerOrders AS OO
                        WHERE OO.Id=I.CustomerOrderId), 1, 10000)))
    WHERE Id IN (
        SELECT P.Id
          FROM DB.dbo.Payments AS P
            JOIN DB.dbo.Invoices AS I ON I.Id=P.InvoiceId
            JOIN DB.dbo.CustomerOrders AS O ON O.Id=I.CustomerOrderId
         WHERE P.Active=0 AND P.Processed=0 AND O.ApprovalDate='2012-07-19 00:00:00')

Now I started that query on a test system using the live data (~250,000 rows in each table) and it has been running for 16 hours. Did I do something completely wrong in the query, or is there a way to speed it up a little?
It does not need to be really fast, as it is a one-time task, but several hours seems long to me, and as I want to learn for the (hopefully not happening) next time, I would like some feedback on how to improve it.

You might as well kill the query. Your update subquery is completely uncorrelated to the table being updated. From the looks of it, when it completes, EVERY SINGLE dbo.Payments record it touches will have the same value.
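
The effect can be reproduced on a tiny scale. The following is only a sketch with made-up sample rows in table variables (`@Invoices` and `@Payments` are hypothetical stand-ins, not the real tables): because the subquery never refers to the row being updated, every row ends up with the same InvoiceId.

```sql
-- Hypothetical sample data; table variables stand in for the real tables.
DECLARE @Invoices TABLE (Id int, CustomerOrderId int);
DECLARE @Payments TABLE (Id int, InvoiceId int);

INSERT INTO @Invoices (Id, CustomerOrderId) VALUES (10, 1), (20, 2), (30, 3);
INSERT INTO @Payments (Id, InvoiceId) VALUES (1, 10), (2, 20), (3, 30);

-- The subquery does not mention any column of @Payments,
-- so it produces one value that is assigned to every updated row.
UPDATE @Payments
   SET InvoiceId = (SELECT TOP 1 I.Id FROM @Invoices AS I ORDER BY I.Id);

SELECT Id, InvoiceId FROM @Payments;  -- all three rows now have InvoiceId = 10
```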

To break down your query, you might find that the subquery runs fine on its own.

SELECT TOP 1 I.Id FROM DB.dbo.Invoices AS I
            WHERE I.CustomerOrderId=
                (SELECT TOP 1 O.Id FROM DB.dbo.CustomerOrders AS O 
                    WHERE O.ExternalOrderNumber='0'+SUBSTRING(
                      (SELECT TOP 1 OO.ExternalOrderNumber FROM DB.dbo.CustomerOrders AS OO
                        WHERE OO.Id=I.CustomerOrderId), 1, 10000))

That is always a BIG worry.

The next thing is that it is running this row-by-row for every record in the table.

You are also double-dipping into Payments, by selecting from it where the Id comes from a join involving Payments itself. You can reference the table being updated in the JOIN clause using this pattern:

UPDATE P
....
  FROM DB.dbo.Payments AS P
    JOIN DB.dbo.Invoices AS I ON I.Id=P.InvoiceId
    JOIN DB.dbo.CustomerOrders AS O ON O.Id=I.CustomerOrderId
 WHERE P.Active=0 AND P.Processed=0 AND O.ApprovalDate='2012-07-19 00:00:00'

Moving on, another mistake is to use TOP without ORDER BY. That's asking for random results. If you know there's only one result, you wouldn't even need TOP. In this case, maybe you're OK with randomly choosing one from many possible matches. Since you have three levels of TOP(1) without ORDER BY, you might as well just mash them all up (join) and take a single TOP(1) across all of them. That would make it look like this:

SET InvoiceId=
    (SELECT TOP 1 I.Id
     FROM DB.dbo.Invoices AS I
     JOIN DB.dbo.CustomerOrders AS O
        ON I.CustomerOrderId=O.Id
     JOIN DB.dbo.CustomerOrders AS OO
        ON O.ExternalOrderNumber='0'+SUBSTRING(OO.ExternalOrderNumber,1,100)
           AND OO.Id=I.CustomerOrderId)

However, as I mentioned very early on, this is not being correlated to the main FROM clause at all. We move the entire search into the main query so that we can make use of JOIN-based set operations rather than row-by-row subqueries.

Before I show the final query (fully commented): I think your SUBSTRING is supposed to implement this part of the logic: "but starts with 0 instead of the current digit". However, if I read that correctly, it means that for an order number '5678' you're looking for '0678', which would also mean that SUBSTRING should be using 2,10000 instead of 1,10000.
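
A quick way to see the difference between the two start positions is to run both on the example value. This is just a sketch with a literal; `@n` is a hypothetical variable, not a column from your schema.

```sql
-- T-SQL string positions are 1-based, so start position 1 keeps the first digit.
DECLARE @n nvarchar(50) = N'5678';

SELECT '0' + SUBSTRING(@n, 1, 10000) AS FromPosition1,  -- '05678' (0 is prepended)
       '0' + SUBSTRING(@n, 2, 10000) AS FromPosition2;  -- '0678'  (first digit replaced)
```

If replacing the first character is really what is wanted, STUFF(@n, 1, 1, '0') gives the same '0678' and may read more clearly.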

UPDATE P
SET InvoiceId=II.Id
FROM DB.dbo.Payments AS P
-- invoices for payments
JOIN DB.dbo.Invoices AS I ON I.Id=P.InvoiceId
-- orders for invoices
JOIN DB.dbo.CustomerOrders AS O ON O.Id=I.CustomerOrderId
-- another order with '0' as leading digit
JOIN DB.dbo.CustomerOrders AS OO
  ON OO.ExternalOrderNumber='0'+substring(O.ExternalOrderNumber,2,1000)
-- invoices for this other order
JOIN DB.dbo.Invoices AS II ON OO.Id=II.CustomerOrderId

-- conditions for the Payments records
WHERE P.Active=0 AND P.Processed=0 AND O.ApprovalDate='2012-07-19 00:00:00'

It is worth noting that SQL Server allows UPDATE .. FROM .. JOIN, which is less widely supported by other DBMS, e.g. Oracle. This is because a single row in Payments (the update target) could have many candidate II.Id values to choose from across all the joined rows. You will get an arbitrary one of the possible II.Id values.
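
If you want to know whether that ambiguity actually occurs in your data, one option is to run the same joins as a SELECT first and count the candidates per payment. This is a sketch that assumes the join conditions from the final UPDATE above (including the SUBSTRING start position of 2, which is my reading of your requirement); the column aliases are made up for illustration.

```sql
-- Lists payments for which the joins offer more than one target invoice.
-- An empty result means the UPDATE has exactly one choice per payment.
SELECT P.Id AS PaymentId,
       COUNT(DISTINCT II.Id) AS CandidateInvoices
FROM DB.dbo.Payments AS P
JOIN DB.dbo.Invoices AS I ON I.Id = P.InvoiceId
JOIN DB.dbo.CustomerOrders AS O ON O.Id = I.CustomerOrderId
JOIN DB.dbo.CustomerOrders AS OO
  ON OO.ExternalOrderNumber = '0' + SUBSTRING(O.ExternalOrderNumber, 2, 1000)
JOIN DB.dbo.Invoices AS II ON II.CustomerOrderId = OO.Id
WHERE P.Active = 0 AND P.Processed = 0 AND O.ApprovalDate = '2012-07-19 00:00:00'
GROUP BY P.Id
HAVING COUNT(DISTINCT II.Id) > 1;
```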

I think something like this will be more efficient, if I understood your query right. As I wrote it by hand and didn't run it, it may have some syntax errors.

UPDATE DB.dbo.Payments 
set InvoiceId=(SELECT TOP 1 I.Id FROM DB.dbo.Invoices AS I
         inner join DB.dbo.CustomerOrders AS O ON I.CustomerOrderId=O.Id 
         inner join DB.dbo.CustomerOrders AS OO On OO.Id=I.CustomerOrderId 
         and O.ExternalOrderNumber='0'+SUBSTRING(OO.ExternalOrderNumber, 1, 10000))
FROM DB.dbo.Payments 
            JOIN DB.dbo.Invoices AS I ON I.Id=Payments.InvoiceId and 
             Payments.Active=0 
             AND Payments.Processed=0 
            JOIN DB.dbo.CustomerOrders AS O ON O.Id=I.CustomerOrderId
             AND O.ApprovalDate='2012-07-19 00:00:00'

Try to rewrite it using JOINs. This will highlight some of the problems. Would the following query do the same? (The queries are somewhat different, but I guess this is roughly what you're trying to do.)

UPDATE P
   SET InvoiceId= I.Id
FROM DB.dbo.Payments AS P
CROSS JOIN DB.dbo.Invoices AS I
INNER JOIN DB.dbo.CustomerOrders AS O
  ON I.CustomerOrderId = O.Id
INNER JOIN DB.dbo.CustomerOrders AS OO
  ON O.ExternalOrderNumber = '0' + SUBSTRING(OO.ExternalOrderNumber, 1, 10000)
  AND OO.Id = I.CustomerOrderId
WHERE P.Active=0 AND P.Processed=0 AND O.ApprovalDate='2012-07-19 00:00:00'

As you see, two problems stand out:

  1. The unconditional join between Payments and Invoices (of course, you've cut this off with a TOP 1, but set-wise it's still unconditional). I'm not really sure if this really is a problem in your query. It will be in mine though :).
  2. The join on a 10000-character column ( SUBSTRING ), embodied in a condition. This is highly inefficient.

If you need a one-time speedup, just take the queries on each table, try to store the intermediate results in temporary tables, create indices on those temporary tables, and use the temporary tables to perform the update.
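
As a rough sketch of that approach (not run against your data): the temporary table name `#PaymentMoves` and its column names are made up for illustration, and the lookup assumes that the first digit should be replaced by 0, i.e. a SUBSTRING start position of 2.

```sql
-- Step 1: materialize which payment should move to which invoice.
SELECT P.Id  AS PaymentId,
       II.Id AS NewInvoiceId
INTO #PaymentMoves
FROM DB.dbo.Payments AS P
JOIN DB.dbo.Invoices AS I ON I.Id = P.InvoiceId
JOIN DB.dbo.CustomerOrders AS O ON O.Id = I.CustomerOrderId
JOIN DB.dbo.CustomerOrders AS OO
  ON OO.ExternalOrderNumber = '0' + SUBSTRING(O.ExternalOrderNumber, 2, 1000)
JOIN DB.dbo.Invoices AS II ON II.CustomerOrderId = OO.Id
WHERE P.Active = 0 AND P.Processed = 0 AND O.ApprovalDate = '2012-07-19 00:00:00';

-- Step 2: index the intermediate result on the column used by the update join.
CREATE CLUSTERED INDEX IX_PaymentMoves_PaymentId ON #PaymentMoves (PaymentId);

-- Step 3: review the mapping before changing anything.
SELECT PaymentId, NewInvoiceId FROM #PaymentMoves ORDER BY PaymentId;

-- Step 4: apply the update from the small, indexed temporary table.
UPDATE P
   SET InvoiceId = M.NewInvoiceId
FROM DB.dbo.Payments AS P
JOIN #PaymentMoves AS M ON M.PaymentId = P.Id;

DROP TABLE #PaymentMoves;
```

With only several hundred affected payments the temporary table stays tiny, and the review step gives you a chance to spot duplicate PaymentId rows before the update runs.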
