
MS SQL Server, multiple insert

Say I write the query:

INSERT INTO DestinationTable
(ColumnA, ColumnB, ColumnC, etc.)
SELECT ColumnA, ColumnB, ColumnC, etc.
FROM SourceTable

And my source table has 22 million rows.

SQL Server fills up my hard drive, and errors out.

Why can't SQL Server handle my query?

Should I use a cursor and insert a row at a time?

PS - it is SQL Express 2005, but I could try on the full version.

UPDATE: I also want to mention that my source table only takes up around 1GB of storage when I look at it in Management Studio. And yet my 25GB of free disk space somehow gets filled up? I am also using two different databases, Source.mdf -> Destination.mdf; I don't know if this makes any difference.

Batch update...

-- First batch: copy up to 100,000 source rows that are not yet in the destination
INSERT INTO DestinationTable
    (ColumnA, ColumnB, ColumnC, etc.)
SELECT TOP 100000 ColumnA, ColumnB, ColumnC, etc.
FROM SourceTable
WHERE NOT EXISTS (SELECT *
    FROM DestinationTable
    WHERE DestinationTable.KeyCols = SourceTable.KeyCols)

-- Keep copying batches until an iteration inserts no rows
WHILE @@ROWCOUNT <> 0
    INSERT INTO DestinationTable
        (ColumnA, ColumnB, ColumnC, etc.)
    SELECT TOP 100000 ColumnA, ColumnB, ColumnC, etc.
    FROM SourceTable
    WHERE NOT EXISTS (SELECT *
        FROM DestinationTable
        WHERE DestinationTable.KeyCols = SourceTable.KeyCols)

There are variations to deal with checkpointing, log file management, doing it all in one transaction if you need that, etc.
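For example, here is a minimal sketch of the same loop with an explicit checkpoint per batch; the table, column, and key names are the placeholders from the snippet above, and it assumes the database is in SIMPLE recovery so checkpoints free log space (under FULL or BULK_LOGGED you would take log backups instead):

    DECLARE @rows int;
    SET @rows = 1;

    WHILE @rows > 0
    BEGIN
        -- Copy the next batch of rows that are not yet in the destination
        INSERT INTO DestinationTable (ColumnA, ColumnB, ColumnC)
        SELECT TOP 100000 ColumnA, ColumnB, ColumnC
        FROM SourceTable
        WHERE NOT EXISTS (SELECT *
            FROM DestinationTable
            WHERE DestinationTable.KeyCols = SourceTable.KeyCols);

        SET @rows = @@ROWCOUNT;

        -- Under SIMPLE recovery this lets the inactive part of the log be reused
        CHECKPOINT;
    END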

You can bulk-copy the data to a file in native format (edit: changed from CSV to native) and import it back into the new table.

Read up on the BCP utility here.
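As a rough sketch of that round trip (the server name, file path, and database/table names are placeholders; -n exports in native format, -T uses Windows authentication, and -b sets the commit batch size on import):

    bcp SourceDb.dbo.SourceTable out C:\temp\SourceTable.dat -n -S .\SQLEXPRESS -T
    bcp DestDb.dbo.DestinationTable in C:\temp\SourceTable.dat -n -S .\SQLEXPRESS -T -b 100000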

You could try setting the database recovery model to "Simple" instead of "Full" (the default). This is done on the Options page of the database properties in Management Studio. That should keep your transaction log size down. After you're done with the insert you can always set the recovery model back to Full.
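If you prefer T-SQL over the GUI, a minimal equivalent would be something like the following (the database name is a placeholder):

    ALTER DATABASE MyDatabase SET RECOVERY SIMPLE;
    -- run the big INSERT ... SELECT here
    ALTER DATABASE MyDatabase SET RECOVERY FULL;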

This blog post has info about importing data into SQL Server.

As for the reason your table is filling up, I would look at the schema of the table, and make sure the column sizes are as small as they can possibly be.
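For instance, a quick way to review the declared column types and sizes (the table name is a placeholder):

    SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH, NUMERIC_PRECISION
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = 'DestinationTable'
    ORDER BY ORDINAL_POSITION;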

I would really analyze if all the data is necessary.

I would highly recommend that you set the database recovery model to BULK_LOGGED while carrying out such heavy bulk data operations.

By default, a database is set to the SIMPLE or FULL recovery model.

The full recovery model, which fully logs all transactions, is intended for normal use.

The bulk-logged recovery model is intended to be used temporarily during a large bulk operation, assuming that it is among the bulk operations that are affected by the bulk-logged recovery model (for more information, see Operations That Can Be Minimally Logged at msdn.microsoft.com/en-us/library/ms191244.aspx).

The BULK_LOGGED recovery model minimally logs such transactions.

You can do it by using the snippet below:

    --Determine the recovery model currently used for the database

    SELECT name AS [Database Name],
    recovery_model_desc AS [Recovery Model]
    FROM sys.databases 
    WHERE name=<database_name> ;

    --Remember this recovery model so that you can switch back to the same later

    --set the database recovery model to BULK_LOGGED

    ALTER DATABASE <database_name>  SET RECOVERY BULK_LOGGED;

    --Run your heavy data insert tasks
    INSERT INTO DestinationTable
        (ColumnA, ColumnB, ColumnC, etc.)
    SELECT ColumnA, ColumnB, ColumnC, etc.
    FROM SourceTable

    /*Again set the database recovery model to FULL or SIMPLE 
    (the result which we had got from first query)*/

    ALTER DATABASE <database_name>  SET RECOVERY FULL;   
    --OR 
    ALTER DATABASE <database_name>  SET RECOVERY SIMPLE;   

*Note - Please be patient while the bulk operation is being processed* [:P]

I have done this many times before. Do let me know whether this helped you.

You can refer to the MSDN article below for details on switching between recovery models: Considerations for Switching from the Full or Bulk-Logged Recovery Model at msdn.microsoft.com/en-us/library/ms190203.aspx

You are inserting data in a way that supports a transaction. There is no way to disable this through the method you're using; however, you could do this outside of the scope of a transaction through other methods. Read below:

http://support.microsoft.com/kb/59462

The key approach is this:

Set the DBOPTION 'SELECT INTO' to true.

http://www.mssqlcity.com/FAQ/Devel/select_into.htm
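As a hedged sketch of what those links describe, assuming the destination table does not yet exist and the database's settings allow minimal logging (for example SIMPLE or BULK_LOGGED recovery on SQL Server 2005), a SELECT ... INTO can create and load the table in one minimally logged operation:

    -- SELECT ... INTO creates DestinationTable from the source data;
    -- it can be minimally logged under SIMPLE or BULK_LOGGED recovery.
    SELECT ColumnA, ColumnB, ColumnC
    INTO DestinationTable
    FROM SourceTable;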

The problem with INSERT INTO ... SELECT (22 million rows) is that it all runs as one transaction. So you will probably fill up the transaction log drive even if the database is in simple recovery mode.

Inserting one row at a time is a horrible idea, it will take forever.

Exporting the data with BCP and importing it with BULK INSERT is probably the fastest method, but it requires learning how to use the BCP utility.
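For the import side, a minimal sketch of the BULK INSERT step, assuming the file was exported with bcp in native format and the path is a placeholder:

    BULK INSERT DestinationTable
    FROM 'C:\temp\SourceTable.dat'
    WITH (DATAFILETYPE = 'native', BATCHSIZE = 100000, TABLOCK);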

If you are determined to do this in T-SQL, you have to break it up into batches. The INSERT ... SELECT TOP (n) ... WHERE NOT EXISTS method in the previous answer works, but the execution time for the WHERE clause could add up. To make it a little more efficient and yet more complicated, I sometimes fill a temp table with the pk values for every n rows using ROW_NUMBER() OVER (ORDER BY pk) and WHERE rn % (n) = 0. Then you can use a loop with INSERT INTO ... SELECT ... WHERE pk > @a AND pk <= @b, with appropriate code to update the variables on each iteration from the temp table. Just make sure you don't miss any rows on the first or last iteration.
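Here is a minimal sketch of that approach, assuming SourceTable has a single integer primary key column named pk (all names and the batch size are placeholders); note how the first and last ranges are handled so no rows are missed:

    DECLARE @batch int, @a int, @b int;
    SET @batch = 100000;

    -- Boundary pk values: one for every @batch rows, in pk order
    SELECT pk
    INTO #boundaries
    FROM (SELECT pk, ROW_NUMBER() OVER (ORDER BY pk) AS rn
          FROM SourceTable) AS numbered
    WHERE rn % @batch = 0;

    -- Start just below the smallest key so the first range is not missed
    SELECT @a = MIN(pk) - 1 FROM SourceTable;

    WHILE 1 = 1
    BEGIN
        -- Next upper boundary; fall back to MAX(pk) for the final partial batch
        SELECT @b = MIN(pk) FROM #boundaries WHERE pk > @a;
        IF @b IS NULL
            SELECT @b = MAX(pk) FROM SourceTable;
        IF @b IS NULL OR @b <= @a
            BREAK;

        INSERT INTO DestinationTable (ColumnA, ColumnB, ColumnC)
        SELECT ColumnA, ColumnB, ColumnC
        FROM SourceTable
        WHERE pk > @a AND pk <= @b;

        SET @a = @b;
    END

    DROP TABLE #boundaries;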

You might want to do this in Integration Services, which can also do bulk inserts. There is a Microsoft white paper somewhere about loading terabytes of data in 30 minutes or so. They exported (BCP?) the source data into multiple files and created multiple tables with the same structure as the destination. Then they inserted each file into a separate empty table, so the inserts could all run minimally logged, and all these imports ran as separate parallel processes. Finally, they used table partitioning commands to merge each import table into the destination table.

Load a terabyte in 30 minutes: https://technet.microsoft.com/en-us/library/dd537533(v=sql.100).aspx
