SqlBulkCopy and DataTables with a parent/child relationship on an identity column

Question

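We need to update several tables that have parent/child relationships based on an Identity primary key in the parent table. That key is referenced as a foreign key by one or more child tables.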

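Because of the high volume of data, we would like to build these tables in memory. We would then use SqlBulkCopy from C# to update the database from either the DataSet or the individual DataTables.
We would also like to do this in parallel, from multiple threads, processes, and possibly clients.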

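Our prototype in F# shows a lot of promise, with a 34x performance increase, but this code forces known Identity values in the parent table. Without forcing them, SQL Server still generates the Identity column correctly when SqlBulkCopy inserts the rows. However, the Identity values are not updated in the in-memory DataTable. Even if they were, it is not clear whether the DataSet would correctly fix up the parent/child relationships so that the child tables could then be written with the correct foreign key values.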

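Can anyone explain how to make SqlBulkCopy update the Identity values? And how should a DataSet be configured to retain and update the parent/child relationships, if this is not done automatically when a DataAdapter is called to FillSchema on the individual DataTables?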

Answers I am not looking for:

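Read the database to find the current highest Identity value, then manually increment it when creating each parent row. This does not work with multiple processes/clients. As I understand it, failed transactions may also cause some Identity values to be skipped, so this method could break the relationships.
Write the parent rows one at a time and ask for each Identity value back. This gives up at least some of the gains from using SqlBulkCopy (yes, there are many more child rows than parent rows, but there are still a lot of parent rows).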

Similar to the following unanswered question:

How to update a DataSet's parent and child tables with an auto-generated Identity key?

Answer 1

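First of all: SqlBulkCopy cannot do what you want. As the name suggests, it is a "one-way street" that moves data into SQL Server as fast as possible. It is the .NET version of the old bulk copy command, which imports raw text files into tables. So there is no way to get the identity values back when you use SqlBulkCopy.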

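I have done a lot of bulk data processing and have run into this problem many times. The solution depends on your architecture and data distribution. Here are some ideas: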

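Create a set of target tables for each thread and import into those. At the end, merge these tables. Most of this can be implemented in a very generic way: you automatically generate tables named TABLENAME_THREAD_ID from each table named TABLENAME.
Move ID generation completely out of the database. For example, implement a central web service that generates the IDs. In that case, do not generate one ID per call; generate ID ranges instead. Otherwise the network overhead usually becomes the bottleneck.
Try to derive the IDs from your data. If that is possible, your problem disappears. Don't say "impossible" too quickly. Perhaps you can use string IDs that are cleaned up in a post-processing step?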
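The second idea (a central service that hands out ID ranges) can be sketched in Python as follows. `IdRangeAllocator`, `LocalIds` and `build_batch` are hypothetical names. A lock-protected counter stands in for the web service or database sequence, and threads stand in for the parallel loaders. Each worker reserves a block of parent IDs at a time and assigns final parent keys in memory, so the child rows carry the correct foreign key before anything is sent to the server. Child rows get no ID here, on the assumption that the database can still generate it because nothing references the child table.

```python
import threading


class IdRangeAllocator:
    """Hypothetical stand-in for a central ID service: hands out disjoint blocks."""

    def __init__(self, start: int = 1, block_size: int = 1000) -> None:
        self._next = start
        self._block_size = block_size
        self._lock = threading.Lock()

    def reserve(self) -> range:
        # One call per block of IDs instead of one call per row.
        with self._lock:
            first = self._next
            self._next += self._block_size
        return range(first, first + self._block_size)


class LocalIds:
    """Per-worker cursor over reserved blocks; fetches a new block when empty."""

    def __init__(self, allocator: IdRangeAllocator) -> None:
        self._allocator = allocator
        self._ids = iter(())

    def next_id(self) -> int:
        try:
            return next(self._ids)
        except StopIteration:
            self._ids = iter(self._allocator.reserve())
            return next(self._ids)


def build_batch(ids: LocalIds, batch: dict) -> tuple:
    """Build parent and child rows in memory with the parent keys already final."""
    parent_rows, child_rows = [], []
    for parent_data, children in batch.items():
        parent_id = ids.next_id()
        parent_rows.append((parent_id, parent_data))
        for child_data in children:
            child_rows.append((child_data, parent_id))  # FK known up front
    return parent_rows, child_rows


# Demo: four workers share one allocator with a deliberately tiny block size.
allocator = IdRangeAllocator(start=1, block_size=3)
all_parents, all_children = [], []
results_lock = threading.Lock()


def worker(n: int) -> None:
    ids = LocalIds(allocator)
    batch = {f"p{n}-{i}": [f"c{n}-{i}-a", f"c{n}-{i}-b"] for i in range(5)}
    parents, children = build_batch(ids, batch)
    with results_lock:
        all_parents.extend(parents)
        all_children.extend(children)


threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Unused values in a partly consumed block simply become gaps. Gaps are harmless here because each key is fixed before insertion and the parent and child rows use the same value. When such rows are bulk copied into a table whose key column is an identity column, the pre-assigned values have to be preserved, for example with `SqlBulkCopyOptions.KeepIdentity`. Alternatively, the key column can be a plain int.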

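One more remark: a speedup of only a factor of 34 with BulkCopy sounds small to me. If you want to insert data fast, make sure your database is configured correctly.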

Answer 2

Read this article. I think it is exactly what you are looking for, and more. A very nice, elegant solution.

http://www.codinghelmet.com/?path=howto/bulk-insert

Answer 3

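The only way to do what you want with SqlBulkCopy is to first insert the data into a staging table, then use a stored procedure to distribute the data to the destination tables. Yes, this causes a slowdown, but it will still be fast.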
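Below is a minimal sketch of the staging-table approach. It uses Python's stdlib `sqlite3` as a stand-in for SQL Server: `executemany` plays the role of SqlBulkCopy, and `AUTOINCREMENT` plays the role of IDENTITY. The distribution step is one way such a stored procedure could work, under an assumption that is not part of the answer: each loader tags its rows with its own `batch_id` and numbers its parents with batch-local keys, and the destination `parent` table keeps `(batch_id, local_id)` so that children can be joined back to the newly generated parent IDs in a single set-based statement. All table and function names are hypothetical. In SQL Server, `MERGE` with an `OUTPUT` clause that references source columns is another common way to capture the old-to-new key mapping without storing the source key.

```python
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE stage_parent (batch_id TEXT, local_id INTEGER, data TEXT);
CREATE TABLE stage_child  (batch_id TEXT, local_id INTEGER, data TEXT,
                           parent_local_id INTEGER);
CREATE TABLE parent (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,  -- plays the role of IDENTITY
    batch_id TEXT    NOT NULL,
    local_id INTEGER NOT NULL,
    data     TEXT    NOT NULL,
    UNIQUE (batch_id, local_id)
);
CREATE TABLE child (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    data      TEXT    NOT NULL,
    parent_id INTEGER NOT NULL REFERENCES parent(id)
);
"""


def load_stage(conn, batch_id, parents, children):
    """Bulk-load one batch into staging (executemany stands in for SqlBulkCopy)."""
    conn.executemany("INSERT INTO stage_parent VALUES (?, ?, ?)",
                     [(batch_id, lid, data) for lid, data in parents])
    conn.executemany("INSERT INTO stage_child VALUES (?, ?, ?, ?)",
                     [(batch_id, lid, data, plid) for lid, data, plid in children])


def distribute(conn, batch_id):
    """Set-based move from staging to the real tables (the 'stored procedure')."""
    with conn:  # one transaction per batch
        conn.execute("""INSERT INTO parent (batch_id, local_id, data)
                        SELECT batch_id, local_id, data FROM stage_parent
                        WHERE batch_id = ?""", (batch_id,))
        # Resolve each child's generated parent key via the batch-local key.
        conn.execute("""INSERT INTO child (data, parent_id)
                        SELECT sc.data, p.id
                        FROM stage_child sc
                        JOIN parent p ON p.batch_id = sc.batch_id
                                     AND p.local_id = sc.parent_local_id
                        WHERE sc.batch_id = ?""", (batch_id,))
        conn.execute("DELETE FROM stage_child WHERE batch_id = ?", (batch_id,))
        conn.execute("DELETE FROM stage_parent WHERE batch_id = ?", (batch_id,))


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Two independent loaders; each numbers its parents from 1 without coordination.
for name in ("A", "B"):
    batch = str(uuid.uuid4())
    load_stage(conn, batch,
               parents=[(1, f"{name}-p1"), (2, f"{name}-p2")],
               children=[(1, f"{name}-c1", 1), (2, f"{name}-c2", 2),
                         (3, f"{name}-c3", 2)])
    distribute(conn, batch)

rows = conn.execute("""SELECT c.data, p.data FROM child c
                       JOIN parent p ON p.id = c.parent_id
                       ORDER BY c.data""").fetchall()
```

Because every statement in `distribute` filters on its own `batch_id`, several loaders can use the same staging tables without colliding. The cost is two extra columns and a unique index on the parent table.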

You could also consider redesigning your data, i.e. splitting it up, denormalizing it, and so on.

Answer 4

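set identity_insert <table> on and dbcc checkident are your friends. The code sample below is roughly what I have done in the past. The only real caveat is that the update process must be the only process inserting data: everybody else has to stay out of the pool while the update runs. You could, of course, do this kind of mapping programmatically before loading the production tables. The same restriction on inserts still applies, though: the update process is the only one allowed to touch the tables.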

--
-- start with a source schema -- doesn't actually need to be SQL tables
-- but from the standpoint of demonstration, it makes it easier
--
create table source.parent
(
  id   int         not null primary key ,
  data varchar(32) not null ,
)
create table source.child
(
  id        int         not null primary key ,
  data      varchar(32) not null ,
  parent_id int         not null foreign key references source.parent(id) ,
)

--
-- On the receiving end, you need to create staging tables.
-- You'll notice that while there are primary keys defined,
-- there are no foreign key constraints. Depending on the
-- cleanliness of your data, you might even get rid of the
-- primary key definitions (though you'll need to add
-- some sort of processing to clean the data one way or
-- another, obviously).
--
-- and, depending context, these could even be temp tables
--
create table stage.parent
(
  id   int         not null primary key ,
  data varchar(32) not null ,
)

create table stage.child
(
  id        int         not null primary key ,
  data      varchar(32) not null ,
  parent_id int         not null ,
)

--
-- and of course, the final destination tables already exist,
-- complete with identity properties, etc.
--
create table dbo.parent
(
  id int not null identity(1,1) primary key ,
  data varchar(32) not null ,
)
create table dbo.child
(
  id int not null identity(1,1) primary key ,
  data varchar(32) not null ,
  parent_id int not null foreign key references dbo.parent(id) ,
)

-----------------------------------------------------------------------
-- so, you BCP or otherwise load your staging tables with the new data
-- from the source tables. How this happens is left as an exercise for
-- the reader. We'll just assume that some sort of magic happens to
-- make it so. Don't forget to truncate the staging tables prior to
-- loading them with data.
-----------------------------------------------------------------------

-------------------------------------------------------------------------
-- Now we get to work to populate the production tables with the new data
--
-- First we need a map to let us create the new identity values.
-------------------------------------------------------------------------
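if object_id('tempdb..#parent_map') is not null drop table #parent_map
if object_id('tempdb..#child_map')  is not null drop table #child_map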
create table #parent_map
(
  old_id int not null primary key nonclustered       ,
  offset int not null identity(1,1) unique clustered ,
  new_id int     null ,  
)
create table #child_map
(
  old_id int not null primary key nonclustered ,
  offset int not null identity(1,1) unique clustered ,
  new_id int     null ,
)

insert #parent_map ( old_id ) select id from stage.parent
insert #child_map  ( old_id ) select id from stage.child

-------------------------------------------------------------------------------
-- now that we've got the map, we can blast the data into the production tables
-------------------------------------------------------------------------------

--
-- compute the new ID values
--
update #parent_map set new_id = offset + isnull(( select max(id) from dbo.parent ), 0)

--
-- blast it into the parent table, turning on identity_insert
--
set identity_insert dbo.parent on

insert dbo.parent (id,data)
select id   = map.new_id   ,
       data = staging.data
from stage.parent staging
join #parent_map  map     on map.old_id = staging.id

set identity_insert dbo.parent off

--
-- reseed the identity properties high water mark
--
dbcc checkident ('dbo.parent', reseed)


--
-- compute the new ID values
--
update #child_map set new_id = offset + isnull(( select max(id) from dbo.child ), 0)

--
-- blast it into the child table, turning on identity_insert
--
set identity_insert dbo.child on

insert dbo.child ( id , data , parent_id )
select id        = map.new_id      ,
       data      = staging.data    ,
       parent_id = parent.new_id

from stage.child staging
join #child_map  map      on map.old_id    = staging.id
join #parent_map parent   on parent.old_id = staging.parent_id

set identity_insert dbo.child off

--
-- reseed the identity properties high water mark
--
dbcc checkident ('dbo.child', reseed)

------------------------------------
-- That's about all there is to it.
------------------------------------
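The key arithmetic behind the two map tables can be checked outside the database with a short Python sketch. The function names `build_map` and `remap` are hypothetical, and plain lists stand in for the staging tables. It mirrors the script: offsets 1, 2, ... in the order rows enter the map, and new key = offset + current max(id) of the target table, with a NULL max on an empty table treated as 0. Each child takes its own new key from the child map and only its `parent_id` from the parent map. The result is only valid if nobody else inserts between reading max(id) and inserting the rows, which is why the answer insists that the update process run alone.

```python
def build_map(old_ids, current_max):
    """Mirror #parent_map / #child_map: offsets 1, 2, ... in the order rows
    are added, and new_id = offset + max(id) of the target table."""
    base = current_max if current_max is not None else 0  # isnull(max(id), 0)
    return {old: base + offset for offset, old in enumerate(old_ids, start=1)}


def remap(stage_parents, stage_children, max_parent_id, max_child_id):
    """Return the rows as they would be inserted with identity_insert on."""
    parent_map = build_map([pid for pid, _ in stage_parents], max_parent_id)
    child_map = build_map([cid for cid, _, _ in stage_children], max_child_id)
    parents = [(parent_map[pid], data) for pid, data in stage_parents]
    # The child's own key comes from the child map; parent_id uses the parent map.
    children = [(child_map[cid], data, parent_map[ppid])
                for cid, data, ppid in stage_children]
    return parents, children


parents, children = remap(
    stage_parents=[(7, "x"), (3, "y")],
    stage_children=[(1, "a", 7), (2, "b", 3), (9, "c", 3)],
    max_parent_id=100,  # select max(id) from dbo.parent
    max_child_id=500,   # select max(id) from dbo.child
)
```

With these inputs, parent 7 becomes 101 and parent 3 becomes 102. The three children become 501, 502 and 503, pointing at 101, 102 and 102 respectively.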

Answer 5

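I think the trade-off you are facing is BulkInsert performance versus Identity reliability.

Can you temporarily put the database into SingleUserMode to perform the inserts?

I faced a very similar problem in a conversion project, where I was adding an Identity column to very large tables that had children. Fortunately, I was able to set the identity values in the parent and child sources (I used a TextDataReader) to perform the BulkInsert, and I generated the parent and child files at the same time.

I also got the kind of performance gains you are talking about: OleDBDataReader source -> StreamWriter ... then TextDataReader -> SQLBulk.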
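The file-generation step of that pipeline might look roughly like the following Python sketch. The `csv` module stands in for StreamWriter and TextDataReader, an in-memory list stands in for the OleDbDataReader source, and `write_parent_child_files` is a hypothetical name. The sketch assumes the starting identity values are known up front because the conversion has the tables to itself, as the single-user suggestion implies. The resulting files would then be bulk loaded with their identity values kept (for example via `SqlBulkCopyOptions.KeepIdentity`).

```python
import csv
import io


def write_parent_child_files(source_rows, parent_out, child_out,
                             first_parent_id=1, first_child_id=1):
    """Stream (parent_key, parent_data, child_data) rows into two delimited
    files, assigning identity values while writing. Returns the next free
    (parent_id, child_id) pair so a later batch can continue from there."""
    parent_writer = csv.writer(parent_out, lineterminator="\n")
    child_writer = csv.writer(child_out, lineterminator="\n")
    parent_ids = {}  # source parent key -> assigned parent id
    next_child = first_child_id
    for parent_key, parent_data, child_data in source_rows:
        if parent_key not in parent_ids:
            parent_ids[parent_key] = first_parent_id + len(parent_ids)
            parent_writer.writerow([parent_ids[parent_key], parent_data])
        child_writer.writerow([next_child, child_data, parent_ids[parent_key]])
        next_child += 1
    return first_parent_id + len(parent_ids), next_child


# In-memory stand-ins for the source reader and the two output files.
source = [("ord-1", "Alice", "widget"),
          ("ord-1", "Alice", "gadget"),
          ("ord-2", "Bob", "gizmo")]
parent_file, child_file = io.StringIO(), io.StringIO()
next_ids = write_parent_child_files(source, parent_file, child_file,
                                    first_parent_id=1001)
parent_rows = list(csv.reader(io.StringIO(parent_file.getvalue())))
child_rows = list(csv.reader(io.StringIO(child_file.getvalue())))
```

The `parent_ids` dictionary keeps one entry per distinct parent. If the source is ordered by parent key, it could be replaced by tracking only the current key to keep memory flat on very large tables.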

Answer scores and posting times (5 answers):

Answer 1: score 9, 2009-07-31 19:54:52

Answer 2: score 4, 2013-04-11 14:06:28

Answer 3: score 1, 2011-02-25 20:18:26

Answer 4: score 1, 2011-02-25 21:03:32

Answer 5: score 0, 2009-07-29 07:44:18