
Improving performance of SQL Delete

We have a query to remove some rows from the table based on an id field (the primary key). It is a pretty straightforward query:

delete all from OUR_TABLE where ID in (123, 345, ...)

The problem is the number of IDs can be huge (e.g. 70k), so the query takes a long time. Is there any way to optimize this? (We are using Sybase, if that matters.)

There are two ways to make statements like this one perform:

  1. Create a new table and copy all but the rows to delete, then swap the tables afterwards (`alter table name ...`). I suggest giving it a try even if it sounds stupid: some databases are much faster at copying than at deleting.

  2. Partition your tables. Create N tables and use a view to join them into one. Sort the rows into different tables grouped by the delete criterion. The idea is to drop a whole table instead of deleting individual rows.
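The copy-and-swap idea from option 1 can be sketched roughly as follows for Sybase ASE. This is a sketch only: `NEW_TABLE` and `#IDsToDelete` are hypothetical names, `select into` requires the `select into/bulkcopy` database option, and indexes, constraints, and permissions on the original table must be recreated by hand afterwards.

```sql
-- Copy every row we want to KEEP into a new table.
select o.*
into NEW_TABLE
from OUR_TABLE o
where o.ID not in (select ID from #IDsToDelete)
go

-- Swap: drop the original and rename the copy into its place.
drop table OUR_TABLE
go
exec sp_rename 'NEW_TABLE', 'OUR_TABLE'
go
-- Recreate indexes, constraints and grants on OUR_TABLE here.
```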

I'm wondering if parsing an IN clause with 70K items in it is a problem. Have you tried a temp table with a join instead?

Consider running this in batches. A loop running 1000 records at a time may be much faster than one query that does everything, and in addition it will not keep the table locked out to other users for as long at a stretch.
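One common way to batch deletes on Sybase ASE is `set rowcount`, which caps how many rows each `delete` touches. A minimal sketch, assuming the keys to remove are already in a hypothetical `#IDsToDelete` temp table:

```sql
-- Delete in batches of 1000 so each transaction stays small
-- and locks are released between iterations.
set rowcount 1000
while 1 = 1
begin
    delete OUR_TABLE
    from OUR_TABLE, #IDsToDelete
    where OUR_TABLE.ID = #IDsToDelete.ID

    if @@rowcount = 0 break   -- nothing left to delete
end
set rowcount 0                -- reset the row limit for this session
```

Tune the batch size (1000 here) to your locking and log-space situation, as the next answer describes.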

If you have cascade delete (and lots of foreign key tables affected) or triggers involved, you may need to run in even smaller batches. You'll have to experiment to see which is the best number for your situation. I've had tables where I had to delete in batches of 100, and others where batches of 50,000 worked (fortunate in that case, as I was deleting a million records).

But in any event, I would put the key values I intend to delete into a temp table and delete from there.

Can Sybase handle 70K arguments in an IN clause? All databases I have worked with have some limit on the number of arguments in an IN clause. For example, Oracle has a limit of around 1000.

Can you create a subselect instead of the IN list? That will shorten the SQL, and it might help with such a big number of values. Something like this:

  DELETE FROM OUR_TABLE WHERE ID IN 
        (SELECT ID FROM somewhere WHERE some_condition)

Deleting a large number of records can be sped up with some interventions in the database, if the database model permits. Here are some strategies:

  1. You can speed things up by dropping indexes, deleting the records, and recreating the indexes again. This eliminates rebalancing the index trees while deleting records.

    • drop all indexes on the table
    • delete records
    • recreate indexes

  2. If you have lots of relations to this table, try disabling constraints, if you are absolutely sure that the delete will not break any integrity constraint. The delete will go much faster because the database won't be checking integrity. Enable the constraints after the delete.

    • disable integrity constraints, disable check constraints
    • delete records
    • enable constraints

  3. Disable triggers on the table, if you have any and if your business rules allow that. Delete the records, then enable the triggers.

  4. Last, do as others suggested: make a copy of the table that contains the rows that are not to be deleted, then drop the original, rename the copy, and recreate the integrity constraints, if there are any.
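The drop-index strategy above might look like this on Sybase ASE. The index and column names are hypothetical, and this sketch again assumes the keys to delete sit in a `#IDsToDelete` temp table; note you cannot drop an index that enforces a primary key constraint this way.

```sql
-- Drop a nonclustered index so the delete doesn't have to maintain it.
drop index OUR_TABLE.idx_our_table_status
go

delete OUR_TABLE
from OUR_TABLE, #IDsToDelete
where OUR_TABLE.ID = #IDsToDelete.ID
go

-- Rebuild the index once the rows are gone.
create nonclustered index idx_our_table_status on OUR_TABLE (status)
go
```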

I would try a combination of 1, 2, and 3. If that does not work, then 4. If everything is still slow, I would look for a bigger box: more memory, faster disks.

Find out what is using up the performance!

In many cases you might use one of the solutions provided. But there might be others (based on Oracle knowledge, so things will be different on other databases — edit: just saw that you mentioned Sybase):

  • Do you have foreign keys on that table? Make sure the referring IDs are indexed.
  • Do you have indexes on that table? It might be that dropping them before the delete and recreating them afterwards is faster.
  • Check the execution plan. Is it using an index where a full table scan might be faster? Or the other way round? Hints might help.
  • Instead of a select into new_table as suggested above, a create table as select might be even faster.
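On Sybase ASE you can check the execution plan without actually running the statement by combining `set showplan` with `set noexec`. A minimal sketch, again assuming a hypothetical `#IDsToDelete` temp table:

```sql
-- Show the query plan but do not execute the statement.
set showplan on
go
set noexec on
go

delete OUR_TABLE
from OUR_TABLE, #IDsToDelete
where OUR_TABLE.ID = #IDsToDelete.ID
go

-- set commands are still processed while noexec is on,
-- so this restores normal execution.
set noexec off
go
set showplan off
go
```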

But remember: find out what is using up the performance first.

When you are using DDL statements, make sure you understand and accept the consequences they might have on transactions and backups.

I also think that the temp table is likely the best solution.

If you were to do a "delete from .. where ID in (select id from ...)", it can still be slow with large queries, though. I thus suggest that you delete using a join; many people don't know about that functionality.

So, given this example table:

    -- set up tables for this example
    if exists (select id from sysobjects where name = 'OurTable' and type = 'U')
        drop table OurTable
    go

    create table OurTable (ID integer primary key not null)
    go
    insert into OurTable (ID) values (1)
    insert into OurTable (ID) values (2)
    insert into OurTable (ID) values (3)
    insert into OurTable (ID) values (4)
    go

We can then write our delete code as follows:

    create table #IDsToDelete (ID integer not null)
    go
    insert into #IDsToDelete (ID) values (2)
    insert into #IDsToDelete (ID) values (3)
    go
    -- ... etc ...
    -- Now do the delete - notice that we aren't using 'from'
    -- in the usual place for this delete
    delete OurTable from #IDsToDelete
       where OurTable.ID = #IDsToDelete.ID
    go
    drop table #IDsToDelete
    go
    -- This returns only items 1 and 4
    select * from OurTable order by ID
    go

Try sorting the IDs you are passing into "in" into the same order the table or index is stored in. You may then get more hits on the disk cache.

Putting the IDs to be deleted into a temp table that has them sorted in the same order as the main table may let the database do a simple scan over the main table.
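One way to sketch that on Sybase ASE is to give the temp table a clustered index, which keeps its keys physically sorted so the optimizer can walk both tables in order. Table and index names here are hypothetical:

```sql
-- Keys stored in sorted order via a clustered index.
create table #IDsToDelete (ID integer not null)
go
create clustered index ix_ids on #IDsToDelete (ID)
go
-- Load the 70k IDs here (e.g. via bcp or batched inserts),
-- then delete through a join against the main table.
delete OUR_TABLE
from OUR_TABLE, #IDsToDelete
where OUR_TABLE.ID = #IDsToDelete.ID
go
```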

You could try using more than one connection and splitting the work over the connections so as to use all the CPUs on the database server; however, think about which locks will be taken out, etc., first.

Does our_table have any references with on delete cascade?
