
How to insert millions of rows from different RDBMSs into a SQL Server database with an INSERT statement?

I have two databases in my SQL Server, each currently containing a single table.

I have two databases, as follows:

1) Db1 (MySQL)

2) Db2 (Oracle)

Now I want to fill the table in SQL Server Db1 with the data from Db1 in MySQL, like this:

Insert into Table1 select * from Table1

SELECT * FROM Table1 (MySQL Db1): the data coming from the MySQL database

INSERT INTO Table1 (SQL Server Db1): insert the data coming from the MySQL database, assuming the same schema

I don't want to use SqlBulkCopy because I don't want to insert the data chunk by chunk. Given that there are millions of rows, I want to insert all the data in one go, because my operation is not limited to inserting records into the database. Otherwise the user has to sit and wait a long time, first while millions of rows are inserted chunk by chunk, and then again for my further operation, which is also long running.

So if I can speed up this process, I can also speed up my second operation, since all the records will then be in my one local SQL Server instance.

Is this possible to achieve in a C# application?

Update: I researched linked servers, since @Gordon Linoff suggested that a linked server could be used for this scenario, but based on my research it seems that I cannot create a linked server through code.

I want to do this with ADO.NET.

This is exactly what I am trying to do:

Suppose I have two different client RDBMSs, each with a database and some tables, on the client's premises.

So the databases look like this:

SQL Server:

Db1

Order
Id      Amount
1       100
2       200
3       300
4       400


MySQL or Oracle:

Db1:

Order
Id      Amount
1       1000
2       2000
3       3000
4       400

Now I want to compare the Amount column in the source (SQL Server) with the destination database (MySQL or Oracle).

I want to join these two tables from different RDBMSs to compare the Amount columns.

In C#, I could fetch the records chunk by chunk into a DataTable (in memory) and compare them in code, but with millions of records this would take a long time.

So I want to do something better than this.

Hence my idea is to bring the records from both RDBMSs into two databases on my local SQL Server instance, write a query that joins the two tables on Id, and let the DBMS compare the millions of records efficiently.

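A query like this compares millions of records efficiently:

-- [Order] is bracketed because ORDER is a reserved word in T-SQL.
SELECT SqlServer.Id, Mysql.Id, SqlServer.Amount, Mysql.Amount
FROM SqlServerDb.dbo.[Order] AS SqlServer
LEFT JOIN MysqlDb.dbo.[Order] AS Mysql ON SqlServer.Id = Mysql.Id
WHERE Mysql.Id IS NULL                   -- Id missing on the MySQL side
   OR SqlServer.Amount <> Mysql.Amount;  -- Id present but Amount differs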

The query above works when I have the data from both RDBMSs in my local server instance, in the databases SqlServerDb and MysqlDb, and it fetches the records below, whose amounts do not match.

So I am trying to get the records whose Amount value in the source (SQL Server DB) does not match the value in MySQL.

Expected output:

Id      Amount
1       1000
2       2000
3       3000
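A minimal, self-contained sketch of the set-based comparison, assuming both tables have already been copied into one database. Python's built-in sqlite3 stands in for the local SQL Server instance, and the table names sqlserver_order and mysql_order are hypothetical (the sketch avoids the reserved word ORDER). With the sample data above it returns the three mismatched rows from the expected output.

```python
import sqlite3

# Two in-memory tables stand in for the local copies of the SQL Server and
# MySQL "Order" tables (sample data taken from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sqlserver_order (id INTEGER PRIMARY KEY, amount INTEGER);
    CREATE TABLE mysql_order     (id INTEGER PRIMARY KEY, amount INTEGER);
    INSERT INTO sqlserver_order VALUES (1, 100), (2, 200), (3, 300), (4, 400);
    INSERT INTO mysql_order     VALUES (1, 1000), (2, 2000), (3, 3000), (4, 400);
""")

# One set-based join instead of row-by-row checks in application code.
# "m.id IS NULL" keeps source rows missing on the MySQL side; without it the
# WHERE filter would drop them, because NULL <> x is never true.
mismatches = conn.execute("""
    SELECT s.id, m.amount
    FROM sqlserver_order AS s
    LEFT JOIN mysql_order AS m ON s.id = m.id
    WHERE m.id IS NULL OR s.amount <> m.amount
    ORDER BY s.id
""").fetchall()

for row_id, amount in mismatches:
    print(row_id, amount)
```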

So is there any way to achieve this scenario?

On the SELECT side, create a tab-delimited .csv file using SELECT ... INTO OUTFILE ...

On the INSERT side, use LOAD DATA INFILE ... (or whatever the target machine's syntax is).

Doing it all at once may be easier to code than chunking, and may (or may not) run faster.
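A rough sketch of the dump-then-load idea, with sqlite3 and the csv module standing in for MySQL's SELECT ... INTO OUTFILE and the target's bulk loader (on SQL Server that would typically be BULK INSERT or the bcp utility). The orders table and the in-memory "file" are assumptions for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical source and target databases.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
source.executemany("INSERT INTO orders VALUES (?, ?)",
                   [(i, i * 100) for i in range(1, 1001)])

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")

# Step 1: dump the whole table to a tab-delimited file in one pass.
# io.StringIO stands in for a real file on disk.
dump = io.StringIO()
writer = csv.writer(dump, delimiter="\t", lineterminator="\n")
writer.writerows(source.execute("SELECT id, amount FROM orders"))

# Step 2: load the file into the target in one go, as a single transaction.
# The text fields are converted back to integers by the INTEGER columns.
dump.seek(0)
reader = csv.reader(dump, delimiter="\t")
with target:
    target.executemany("INSERT INTO orders VALUES (?, ?)", reader)

print(target.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone())
```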

SqlBulkCopy can accept either a DataTable or a System.Data.IDataReader as its input.

Using your query to read the source DB, open an ADO.NET DataReader on the source MySQL or Oracle DB and pass the reader to the WriteToServer() method of SqlBulkCopy.

This can copy almost any number of rows. I have copied hundreds of millions of rows using the data reader approach.
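The key point is that the reader is consumed as a stream rather than loaded into memory first. A sketch of the same pattern in Python, where a sqlite3 cursor (an iterator over rows) is handed directly to executemany() on the target, loosely analogous to passing an IDataReader to SqlBulkCopy.WriteToServer(). The function name stream_copy and the orders table are hypothetical; with a real SQL Server target you would still use SqlBulkCopy.

```python
import sqlite3


def stream_copy(source, target, select_sql, insert_sql):
    """Copy rows by handing the source cursor straight to executemany().

    A sqlite3 cursor is an iterator, so rows are fetched from the source as the
    target consumes them instead of being collected into a list (or DataTable)
    first. Loosely analogous to SqlBulkCopy.WriteToServer(IDataReader).
    """
    rows = source.execute(select_sql)
    with target:  # commit the whole copy as one transaction
        target.executemany(insert_sql, rows)


# Usage with hypothetical in-memory source and target databases.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
source.executemany("INSERT INTO orders VALUES (?, ?)",
                   ((i, i % 500) for i in range(1, 100_001)))

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")

stream_copy(source, target,
            "SELECT id, amount FROM orders",
            "INSERT INTO orders VALUES (?, ?)")
copied = target.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(copied)
```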

What about adding a changed-date column to the remote database?

Then you could fetch only the rows that have changed since the last sync and compare just those.
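A minimal sketch of that idea, assuming a hypothetical modified_at column that is set on every insert and update (for example by a trigger or by the application) and a last_sync value saved after each run. Fixed-format ISO-8601 strings are used so that text comparison matches chronological order.

```python
import sqlite3

# Hypothetical remote table with a "modified_at" column maintained on every write.
remote = sqlite3.connect(":memory:")
remote.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER, modified_at TEXT);
    INSERT INTO orders VALUES
        (1, 1000, '2024-01-01T09:00:00'),
        (2, 2000, '2024-01-03T12:30:00'),
        (3, 3000, '2024-01-02T08:15:00'),
        (4,  400, '2024-01-04T17:45:00');
""")


def changed_since(conn, last_sync):
    """Return only the rows touched after the previous sync; only these need comparing."""
    return conn.execute(
        "SELECT id, amount FROM orders WHERE modified_at > ? ORDER BY id",
        (last_sync,),
    ).fetchall()


last_sync = "2024-01-02T23:59:59"  # hypothetical value stored after the previous run
print(changed_since(remote, last_sync))
```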

First of all, do not use a linked server. It is tempting, but it brings more trouble than benefit. For example, updates and inserts will pull all of the target DB's data to the source DB, perform the insert/update there, and then send all the data back to the target.

As far as I understand, you are trying to copy changed data to the target system for further processing.

I recommend using a timestamp (rowversion) column on the source table. Whenever a row in the source table changes, SQL Server updates its timestamp column automatically.

On the target, get the max ID and the max timestamp: two queries at most.

On the source, rows where source.ID <= target.MaxID && source.timestamp >= target.MaxTimestamp is true are the rows that changed after the last sync (they need an update). Rows where source.ID > target.MaxID is true are the rows inserted after the last sync.

Now you do not have to compare the two worlds; you already have all the updates and inserts.
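The classification rule can be sketched in a few lines. This is illustrative only: Row and classify are hypothetical names, the integer version field stands in for SQL Server's rowversion value, and the sketch assumes the target keeps a copy of each row's source version so that its maximum can be queried. Note that with >= the most recently synced row is sent again, which is harmless for an update.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    id: int
    amount: int
    version: int  # stand-in for the row's SQL Server rowversion (timestamp) value


def classify(source_rows, target_max_id, target_max_version):
    """Split source rows into updates and inserts using two maxima read from the target.

    update: id <= target_max_id and version >= target_max_version
    insert: id >  target_max_id
    """
    updates = [r for r in source_rows
               if r.id <= target_max_id and r.version >= target_max_version]
    inserts = [r for r in source_rows if r.id > target_max_id]
    return updates, inserts


# Hypothetical state: the target holds ids 1..4 and was last synced at version 10.
source_rows = [
    Row(1, 100, 3),
    Row(2, 250, 11),  # updated after the last sync
    Row(3, 300, 7),
    Row(4, 400, 10),  # the last synced row; re-sent because of >=
    Row(5, 500, 12),  # inserted after the last sync
]
updates, inserts = classify(source_rows, target_max_id=4, target_max_version=10)
print([r.id for r in updates], [r.id for r in inserts])
```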

You need to create a linked server connection using ODBC and the proper driver; after that you can execute queries using OPENQUERY.

Take a look at OPENQUERY:

https://msdn.microsoft.com/en-us/library/ms188427(v=sql.120).aspx

Yes, SQL Server is very efficient when it's working with sets, so let's keep that in play.

In a nutshell, here is what I'm pitching:

  1. Load data from the source into a staging table on the target database (staging table = a table that temporarily holds the raw data from the source table, with the same structure as the source table; add tracking columns to taste). Your C# code does this: select from source_table into a DataTable, then SqlBulkCopy it into the staging table.

  2. Have a stored procedure on the target database reconcile the data between your target table and the staging table. Your C# code calls the stored procedure.
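A compact sketch of both steps, with sqlite3 standing in for the target database. load_staging plays the part of the SqlBulkCopy load, and reconcile runs the kind of set-based UPDATE/INSERT a stored procedure could contain (on SQL Server a MERGE statement is another option). Table and function names are hypothetical.

```python
import sqlite3

target = sqlite3.connect(":memory:")
target.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER);
    -- Staging table: same shape as the source table, holds raw rows temporarily.
    CREATE TABLE orders_staging (id INTEGER, amount INTEGER);
    INSERT INTO orders VALUES (1, 100), (2, 200), (3, 300), (4, 400);
""")


def load_staging(conn, source_rows):
    """Step 1: bulk-load the source rows into a freshly emptied staging table."""
    with conn:
        conn.execute("DELETE FROM orders_staging")
        conn.executemany("INSERT INTO orders_staging VALUES (?, ?)", source_rows)


def reconcile(conn):
    """Step 2: set-based reconciliation between the target and staging tables."""
    with conn:
        # Update target rows whose amount differs from the staged value.
        conn.execute("""
            UPDATE orders
            SET amount = (SELECT s.amount FROM orders_staging s WHERE s.id = orders.id)
            WHERE EXISTS (SELECT 1 FROM orders_staging s
                          WHERE s.id = orders.id AND s.amount <> orders.amount)
        """)
        # Insert rows that exist only in staging.
        conn.execute("""
            INSERT INTO orders (id, amount)
            SELECT s.id, s.amount FROM orders_staging s
            WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.id = s.id)
        """)


load_staging(target, [(1, 1000), (2, 2000), (3, 3000), (4, 400), (5, 500)])
reconcile(target)
print(target.execute("SELECT id, amount FROM orders ORDER BY id").fetchall())
```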

Given that you're talking about millions of rows, another thing that can speed things up is dropping the indexes on the staging table before inserting into it, and recreating them after the inserts and before any SELECT is performed.
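A sketch of the drop, load, rebuild sequence using sqlite3. The index name ix_staging_id is hypothetical, and whether this is actually faster depends on the engine and the data volume, so it is worth measuring on the real system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders_staging (id INTEGER, amount INTEGER)")
conn.execute("CREATE INDEX ix_staging_id ON orders_staging (id)")


def bulk_load_without_index(conn, rows):
    """Drop the staging index, bulk insert, then rebuild the index once.

    Building the index once after the load avoids maintaining it row by row
    during a very large insert.
    """
    with conn:
        conn.execute("DROP INDEX IF EXISTS ix_staging_id")
        conn.executemany("INSERT INTO orders_staging VALUES (?, ?)", rows)
        # Recreate before any SELECT/JOIN that relies on the index runs.
        conn.execute("CREATE INDEX ix_staging_id ON orders_staging (id)")


bulk_load_without_index(conn, ((i, i * 10) for i in range(1, 50_001)))
print(conn.execute("SELECT COUNT(*) FROM orders_staging").fetchone()[0])
```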

Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.