
MS SQL Server, multiple insert

Say I write the query:

INSERT INTO DestinationTable
    (ColumnA, ColumnB, ColumnC, etc.)
SELECT ColumnA, ColumnB, ColumnC, etc.
FROM SourceTable

And my source table has 22 million rows.

SQL Server fills up my hard drive and errors out.

Why can't SQL Server handle my query?

Should I use a cursor and insert a row at a time?

PS - it is SQL Express 2005, but I could try on the full version.

UPDATE: I also want to mention that my source table only takes up around 1GB of storage when I look at it in Management Studio. And yet my 25GB of free disk space somehow gets filled up? I am also using two different databases, Source.mdf -> Destination.mdf; I don't know if this makes any difference.

Batch update...

INSERT INTO DestinationTable
    (ColumnA, ColumnB, ColumnC, etc.)
SELECT TOP 100000 ColumnA, ColumnB, ColumnC, etc.
FROM SourceTable
WHERE NOT EXISTS (SELECT *
    FROM DestinationTable
    WHERE DestinationTable.KeyCols = SourceTable.KeyCols)

WHILE @@ROWCOUNT <> 0
    INSERT INTO DestinationTable
        (ColumnA, ColumnB, ColumnC, etc.)
    SELECT TOP 100000 ColumnA, ColumnB, ColumnC, etc.
    FROM SourceTable
    WHERE NOT EXISTS (SELECT *
        FROM DestinationTable
        WHERE DestinationTable.KeyCols = SourceTable.KeyCols)

There are variations to deal with checkpointing, log file management, whether you need it all in one transaction, and so on.
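
For example, a minimal sketch of the checkpointing variation (the integer key column Id is a hypothetical stand-in for your real key, and this assumes the database is in SIMPLE recovery):

    DECLARE @BatchSize int;
    DECLARE @Rows int;

    SET @BatchSize = 100000;
    SET @Rows = 1;

    WHILE @Rows > 0
    BEGIN
        INSERT INTO DestinationTable (ColumnA, ColumnB, ColumnC)
        SELECT TOP (@BatchSize) s.ColumnA, s.ColumnB, s.ColumnC
        FROM SourceTable AS s
        WHERE NOT EXISTS (SELECT 1 FROM DestinationTable AS d
                          WHERE d.Id = s.Id);

        SET @Rows = @@ROWCOUNT;

        -- In SIMPLE recovery, a CHECKPOINT after each committed batch lets
        -- the log space be reused instead of forcing the log file to grow.
        CHECKPOINT;
    END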

You can bulk-copy the data to a file in native format (edit: changed from CSV to native) and import it back into the new table.

Read up on the BCP utility here.
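
For instance, a hypothetical round trip could look like this (server name, path, and table names are placeholders; the bcp command runs from a command prompt, not inside T-SQL):

    -- Export in native format from a command prompt:
    --   bcp SourceDb.dbo.SourceTable out C:\dump\source.dat -n -S .\SQLEXPRESS -T
    -- Then load the native-format file into the destination table:
    BULK INSERT DestinationTable
    FROM 'C:\dump\source.dat'
    WITH (DATAFILETYPE = 'native', TABLOCK); -- TABLOCK helps the load qualify for minimal logging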

You could try setting the database recovery model to "Simple" instead of "Full" (the default). This is done on the Options page of the database properties in Management Studio. That should keep your transaction log size down. After you're done with the insert you can always set the recovery model back to Full.

This blog post has info about importing data into SQL Server.

As for the reason your table is filling up, I would look at the schema of the table and make sure the column sizes are as small as they can possibly be.

I would really analyze if all the data is necessary.

I would highly recommend setting the database recovery model to BULK_LOGGED while carrying out such heavy bulk data operations.

By default, a database is set to the SIMPLE or FULL recovery model.

The full recovery model, which fully logs all transactions, is intended for normal use.

The bulk-logged recovery model is intended to be used temporarily during a large bulk operation, assuming that it is among the bulk operations that are affected by the bulk-logged recovery model (for more information, see Operations That Can Be Minimally Logged at msdn.microsoft.com/en-us/library/ms191244.aspx).

The BULK_LOGGED recovery model minimally logs these bulk transactions.

You can switch models with the snippet below:

    --Determine the recovery model currently used for the database

    SELECT name AS [Database Name],
    recovery_model_desc AS [Recovery Model]
    FROM sys.databases 
    WHERE name = '<database_name>';

    --Remember this recovery model so that you can switch back to the same later

    --set the database recovery model to BULK_LOGGED

    ALTER DATABASE <database_name>  SET RECOVERY BULK_LOGGED;

    --Run your heavy data insert tasks
    INSERT INTO DestinationTable
        (ColumnA, ColumnB, ColumnC, etc.)
    SELECT ColumnA, ColumnB, ColumnC, etc.
    FROM SourceTable

    /*Again set the database recovery model to FULL or SIMPLE 
    (the result which we had got from first query)*/

    ALTER DATABASE <database_name>  SET RECOVERY FULL;   
    --OR 
    ALTER DATABASE <database_name>  SET RECOVERY SIMPLE;   

Note: please be patient while the bulk operation is being processed. [:P]

I have done this many times before. Do let me know whether this helped you.

You can refer to the MSDN article below for details on switching between recovery models: Considerations for Switching from the Full or Bulk-Logged Recovery Model at msdn.microsoft.com/en-us/library/ms190203.aspx

You are inserting data in a way that supports a transaction. There is no way to disable this through the method you're using; however, you could do it outside the scope of a transaction through other methods. Read below:

http://support.microsoft.com/kb/59462

The key approach is this:

Set the 'select into/bulkcopy' ("SELECT INTO") database option to true

http://www.mssqlcity.com/FAQ/Devel/select_into.htm
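
For reference, that option was set with sp_dboption, which was already deprecated by SQL Server 2005 (the database name below is a placeholder; on 2005 and later, the ALTER DATABASE ... SET RECOVERY statements shown in other answers replace it):

    -- Legacy (SQL Server 2000-era) switch; 'YourDatabase' is a placeholder.
    EXEC sp_dboption 'YourDatabase', 'select into/bulkcopy', 'TRUE';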

The problem with INSERT INTO ... SELECT (22 million rows) is that it all runs as one transaction. So you will probably fill up the transaction log drive even if the database is in simple recovery mode.

Inserting one row at a time is a horrible idea, it will take forever.

Exporting the data with BCP and importing it with BULK INSERT is probably the fastest method, but it requires learning how to use the BCP utility.

If you are determined to do this in T-SQL, you have to break it up into batches. The INSERT ... SELECT TOP (n) ... WHERE NOT EXISTS method in the previous answer works, but the execution time for the WHERE clause could add up. To make it a little more efficient and yet more complicated, I sometimes fill a temp table with the pk values for every n rows using ROW_NUMBER() OVER (ORDER BY pk) and WHERE rn % (n) = 0. Then you can use a loop with INSERT INTO ... SELECT ... WHERE pk > @a AND pk <= @b, with appropriate code to update the variables on each iteration from the temp table. Just make sure you don't miss any rows on the first or last iteration.
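
A sketch of that approach, assuming a hypothetical integer primary key pk and batches of 100,000 rows:

    -- Collect every 100,000th pk value as a batch boundary.
    SELECT pk
    INTO #Boundaries
    FROM (SELECT pk, ROW_NUMBER() OVER (ORDER BY pk) AS rn
          FROM SourceTable) AS t
    WHERE rn % 100000 = 0;

    DECLARE @a int, @b int;
    SELECT @a = MIN(pk) - 1 FROM SourceTable; -- start just below the smallest key

    WHILE EXISTS (SELECT * FROM #Boundaries WHERE pk > @a)
    BEGIN
        SELECT @b = MIN(pk) FROM #Boundaries WHERE pk > @a;

        INSERT INTO DestinationTable (ColumnA, ColumnB, ColumnC)
        SELECT ColumnA, ColumnB, ColumnC
        FROM SourceTable
        WHERE pk > @a AND pk <= @b;

        SET @a = @b;
    END

    -- Final partial batch above the last boundary, so the last rows aren't missed.
    INSERT INTO DestinationTable (ColumnA, ColumnB, ColumnC)
    SELECT ColumnA, ColumnB, ColumnC
    FROM SourceTable
    WHERE pk > @a;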

You might want to do this in Integration Services, which can also do bulk inserts. There is a Microsoft white paper about loading terabytes of data in 30 minutes or so. They exported (with BCP?) the source data into multiple files and created multiple tables with the same structure as the destination. They then inserted each file into a separate empty table, so each load could run minimally logged, and all these imports ran as separate parallel processes. Finally they used table partitioning commands to merge each import table into the destination table.

Load a terabyte in 30 minutes: https://technet.microsoft.com/en-us/library/dd537533(v=sql.100).aspx
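
The final merge step in that approach might look like the following (table names and partition numbers are hypothetical, and SWITCH requires the staging table's structure and constraints to match the target partition):

    -- SWITCH is a metadata-only operation, so merging each load table into
    -- the destination is nearly instant regardless of row counts.
    ALTER TABLE ImportTable_1 SWITCH TO DestinationTable PARTITION 1;
    ALTER TABLE ImportTable_2 SWITCH TO DestinationTable PARTITION 2;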
