
Archiving a Large Table (SQL Server 2008)

I have a very large table that is filled with hundreds of millions of records each quarter.

I manually move data from the existing table to another database using this script, to minimize the backup size and to offload the production database when running queries.

Is there a better way, for example a scheduled script that moves data from the production database to another database and then efficiently deletes the records from the source database every day or week?

Note that my log file is growing rapidly due to the high volume of INSERTs into this table; the DELETEs will also be logged when I move data to the archive database.

Thanks

Let me recap the requirements:

  1. Reduce the backup size
  2. Reduce the number of records in the database by archiving
  3. Archive the data without excessive logging

In order to reduce the backup size, you'll need to move the data into a different database.

As far as logging goes, you'll want to review the rules of minimal logging and make sure that you follow them. Make sure the database you are inserting into uses the simple or bulk-logged recovery model.
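As a quick check, you can verify and switch the recovery model of the destination database. This is a minimal sketch; ArchiveDB is a hypothetical database name, so substitute your own:

```sql
-- Check the current recovery model of the archive database
-- (ArchiveDB is a hypothetical name)
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = N'ArchiveDB';

-- Switch to bulk-logged for the duration of the load
ALTER DATABASE ArchiveDB SET RECOVERY BULK_LOGGED;
```

If the archive database normally runs in the full recovery model, take a log backup after switching back so the log chain stays intact.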

For inserting the archived data, you want to disable the non-clustered indexes (and rebuild them after the insert has completed), enable trace flag 610 if there is a clustered index, and take a table lock on the destination table. There are many more rules in the linked article that you'll want to check off, but these are the basics.
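A minimal sketch of such a load, under the assumption of hypothetical object names (ArchiveDB.dbo.BigTable as the destination, IX_BigTable_Date as one of its non-clustered indexes, CreatedDate as the archiving column):

```sql
-- Disable non-clustered indexes on the destination before the load
ALTER INDEX IX_BigTable_Date ON ArchiveDB.dbo.BigTable DISABLE;

-- Trace flag 610 allows minimal logging into a non-empty b-tree (SQL Server 2008)
DBCC TRACEON (610);

-- The TABLOCK hint takes the table lock required for minimal logging
INSERT INTO ArchiveDB.dbo.BigTable WITH (TABLOCK)
SELECT *
FROM ProductionDB.dbo.BigTable
WHERE CreatedDate < '20200101';  -- hypothetical cutoff

DBCC TRACEOFF (610);

-- Rebuild the disabled indexes once the insert has completed
ALTER INDEX IX_BigTable_Date ON ArchiveDB.dbo.BigTable REBUILD;
```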

There is no minimal logging for deletes, but you can limit log file growth by deleting in chunks with the TOP clause. The basic idea is as follows (switch to the simple recovery model for the duration of the delete to limit file growth):

SELECT NULL;  -- sets @@ROWCOUNT to 1 so the loop body runs at least once

WHILE @@ROWCOUNT > 0

     DELETE TOP (50000) FROM dbo.YourTable  -- placeholder table name
     WHERE YourPredicate = 1;               -- placeholder archiving condition

Adjust the TOP number to control how much is logged per delete. You'll also want to make sure the predicate is correct so that you only delete what you intend to. This deletes 50000 rows, and as long as a non-zero rowcount is returned, it repeats until the rowcount is 0.

If you really want minimal logging for everything, you can partition the source table by week, create a clone of the source table (on the same partition function and with an identical indexing structure), switch each partition from the source table to the cloned table, insert from the cloned table into the archive table, then truncate the cloned table. The advantage of this is a truncate rather than a delete. The disadvantage is that it is much more complicated to set up, maintain, and query (you get one heap or b-tree per partition, so unless every query uses partition elimination, a clustered index/table scan has to scan multiple b-trees/heaps instead of just one).
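A sketch of that switch-and-truncate sequence, assuming a hypothetical setup where dbo.BigTable is partitioned by week and dbo.BigTable_Staging is its clone on the same partition scheme with identical indexes:

```sql
-- Switch the oldest week's partition out; this is a metadata-only operation
ALTER TABLE dbo.BigTable
    SWITCH PARTITION 1 TO dbo.BigTable_Staging PARTITION 1;

-- Archive the switched-out rows with a minimally logged insert
INSERT INTO ArchiveDB.dbo.BigTable WITH (TABLOCK)
SELECT * FROM dbo.BigTable_Staging;

-- Truncate instead of delete: minimal logging on the source side
TRUNCATE TABLE dbo.BigTable_Staging;
```

Note that SWITCH requires both tables to be in the same filegroup with matching schemas and constraints, which is part of the setup complexity mentioned above.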

Have you thought about using SSIS to do this? I use SSIS to do the archiving and then the backup, in that order. You can also put the same script in a T-SQL task and schedule it with SQL Server Agent. Or you can just use the Agent and paste the script into a job step.

You could use table partitioning instead of moving data:

http://technet.microsoft.com/en-us/library/dd578580(v=sql.100).aspx

http://msdn.microsoft.com/en-us/library/ms345146(v=sql.90).aspx

For moving data periodically, you could use SQL Server Agent job scheduling to run an SSIS package.

Maybe Data Transformation Services (DTS) could be used too.

Partitioning, definitely. It will remove the need for a new database. There is a good example here.

If you don't want to change your architecture, I suggest using SSIS to move the data rather than scripts.
