We have an SSIS project to load data from CSV to a staging area on SQL Server (DB_Stage).
The main purpose of Staging is to prepare the data ready to move to the Production database(DB_Prod) and in the process flag any errors with data or files.
DB_Stage is created by taking create table scripts from DB_Prod, so the table structure in both databases is same. Once the load to DB_Stage is successful then data need to be moved to DB_Prod.
I'm thinking to create a stored procedure for each table in DB_Stage to push data to DB_Prod as there is no transformation required and also thinking SQL to SQL is faster. However I read some articles which say SSIS has the capacity to parallel processing and load will be much faster. But I didn't understand it completely.
I can create another set of SSIS packages to move data from Stage to Prod using Biml with in no time. But I need some advice on which is the best approach. Stored procedures or SSIS packages in my scenario.
One advantage if I use SSIS package is I can configure the destination database, so Stage data can be loaded to any server/database (this is a requirement for us).
If I use stored procedures I don't find a way to parametarize the target database. It seems I must hard code this way...
insert into Prod_DB.dbo.Table1(col list) select (col list) from DB_Stage.dbo.table1.
Any help would be greatly appreciated.
As you have mentioned that you don't have any transformation to use in between dev and prod environment. I would recommend you to use SSIS instead of stored procedure. SSIS will consider this as a synchronous task and will start transferring record as soon as it can. SSIS can take advantage of buffer pipe line which you can control and achieve parallelism.
I would recommend few setting to be done when using SSIS in this case:
I am pretty sure you will see performance gain by using SSIS instead of TSQL here.
As you have mentioned, there are 2 ways to copy the data from 1 server to another. Let's go through them one by one.
Stored Procedure: You will have to first create a linked server connection between the Prod server and the Staging server by using sp_addlinkedserver . This would allow you to use 4 part naming to reference tables from Staging Server like [ProdServer].[ProdDB].[dbo].[Table1] Here you can make use of what one calls a ' dynamic query '. In this particular kind of query, we can specify certain values of a SQL Query as varchar variables and then execute the query.
What you will be essentially writing is a query as follows:
'INSERT INTO [' + @ProdServer + '].[' + @ProdDB +'].[dbo].[Table1] (col list)
SELECT (col list) FROM [DB_Stage].[dbo].[table1]'
In here you have the condition of 4,000 characters in case of nvarchar strings, or 8,000 characters in case of varchar strings.
SSIS:
As you have mentioned SSIS allows you to parallelize the data flow from your staging server to Prod Server. The methods are quite straight forward as explained here . However, if the table is too large, I would suggest you to use Balanced Data Distributor , which is an optimization on the parallel data flow.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.