BULK INSERT across multiple related tables?

I need to do a BULK INSERT of several hundred thousand records across 3 tables. A simple breakdown of the tables would be:

TableA
--------
TableAID (PK)
TableBID (FK)
TableCID (FK)
Other Columns

TableB
--------
TableBID (PK)
Other Columns

TableC
--------
TableCID (PK)
Other Columns

The problem with a bulk insert, of course, is that it only works with one table, so FKs become a problem.

I've been looking around for ways to work around this, and from what I've gleaned from various sources, using a SEQUENCE column might be the best bet. I just want to make sure I have correctly cobbled together the logic from the various threads and posts I've read on this. Let me know if I have the right idea.

First, I would modify the tables to look like this:

TableA
--------
TableAID (PK)
TableBSequence
TableCSequence
Other Columns

TableB
--------
TableBID (PK)
TableBSequence
Other Columns

TableC
--------
TableCID (PK)
TableCSequence
Other Columns

Then, from within the application code, I would make five calls to the database with the following logic:

  • Request X Sequence numbers from TableC, where X is the known number of records to be inserted into TableC. (1st DB call)

  • Request Y Sequence numbers from TableB, where Y is the known number of records to be inserted into TableB. (2nd DB call)

  • Modify the existing objects for A, B and C (which are models generated to mirror the tables) with the now known Sequence numbers.

  • Bulk insert to TableA. (3rd DB call)

  • Bulk insert to TableB. (4th DB call)

  • Bulk insert to TableC. (5th DB call)
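
The in-memory step of the list above (stamping the reserved Sequence numbers onto the model objects before any bulk insert) could be sketched roughly as follows. This is only an illustration: the `RowA`/`RowB`/`RowC` classes, the `name` field, and the `assign_sequences` helper are hypothetical names, and the starting values 101 and 501 stand in for whatever the two range requests actually return.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical models mirroring the three tables; "name" stands in for "Other Columns".
@dataclass
class RowB:
    name: str
    sequence: Optional[int] = None


@dataclass
class RowC:
    name: str
    sequence: Optional[int] = None


@dataclass
class RowA:
    name: str
    b: RowB  # in-memory reference to the related B object
    c: RowC  # in-memory reference to the related C object
    b_sequence: Optional[int] = None
    c_sequence: Optional[int] = None


def assign_sequences(rows_a: List[RowA], rows_b: List[RowB], rows_c: List[RowC],
                     first_b: int, first_c: int) -> None:
    """Stamp reserved sequence values onto B and C, then copy them onto A."""
    for offset, row in enumerate(rows_b):
        row.sequence = first_b + offset
    for offset, row in enumerate(rows_c):
        row.sequence = first_c + offset
    for row in rows_a:
        # A carries the sequence values of the B and C rows it points at.
        row.b_sequence = row.b.sequence
        row.c_sequence = row.c.sequence


rows_b = [RowB("b0"), RowB("b1")]
rows_c = [RowC("c0"), RowC("c1"), RowC("c2")]
rows_a = [RowA("a0", rows_b[0], rows_c[2]), RowA("a1", rows_b[1], rows_c[0])]

# Assumed starting values of the two reserved ranges (from the first two DB calls).
assign_sequences(rows_a, rows_b, rows_c, first_b=101, first_c=501)
```

Once every object carries its numbers, the three bulk inserts no longer depend on each other, which is why the insert order in the list does not matter as long as the joins are on the Sequence columns.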

And then, of course, we would always join on the Sequence.

I have three questions:

  1. Do I have the basic logic correct?

  2. In Tables B and C, would I remove the clustered index from the PK and put it on the Sequence instead?

  3. Once the Sequence numbers are requested from Tables B and C, are they then somehow locked between the request and the bulk insert? I just need to make sure that between the request and the insert, some other process doesn't request and use the same numbers.

Thanks!

EDIT:

After typing this up and posting it, I've been reading deeper into the SEQUENCE documentation. I think I misunderstood it at first. SEQUENCE is not a column type. For the actual column in the table, I would just use an INT (or maybe a BIGINT, depending on the number of records I expect to have). The actual SEQUENCE object is an entirely separate entity whose job is to generate numeric values on request and keep track of which ones have already been generated. So, if I understand correctly, I would create two SEQUENCE objects, one to be used in conjunction with Table B and one with Table C.

So that answers my third question.

Do I have the basic logic correct?

Yes. The other common approach here is to bulk load your data into a staging table, and do something similar on the server side.

From the client you can request ranges of sequence values using the sp_sequence_get_range stored procedure.
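
To illustrate why a reserved range cannot be handed to another process, here is a small in-memory model of range reservation. It is not a call to SQL Server: `SimulatedSequence` and `get_range` are hypothetical stand-ins that only mimic the idea of a sequence returning the first value of a block and advancing past the whole block in one atomic step.

```python
import threading


class SimulatedSequence:
    """In-memory stand-in for a sequence object (hypothetical, for illustration only)."""

    def __init__(self, start: int = 1) -> None:
        self._next = start
        self._lock = threading.Lock()

    def get_range(self, size: int) -> int:
        """Reserve `size` consecutive values and return the first one."""
        if size < 1:
            raise ValueError("size must be at least 1")
        with self._lock:
            # Reading and advancing happen under one lock, so no two
            # callers can ever be handed overlapping blocks.
            first = self._next
            self._next += size
        return first


seq_b = SimulatedSequence(start=1)
reserved = []  # one range object per simulated client


def worker(size: int) -> None:
    first = seq_b.get_range(size)
    reserved.append(range(first, first + size))


# Eight concurrent "clients", each reserving 1000 values.
threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

all_values = [value for block in reserved for value in block]
```

In this model every value in `all_values` is distinct, because a block is consumed the moment it is handed out rather than when the rows are eventually inserted.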

In Tables B and C, would I remove the clustered index from the PK?

No, as you later noted, the sequence just supplies the PK values for you.

Sorry, I read your question wrong at first. I see now that you are trying to generate your own PKs rather than allowing MS SQL to generate them for you. Scratch my above comment.

As David Browne mentioned, you might want to use a staging table to avoid the strain you'll put on your app's heap. Use tempdb and do the modifications directly on the table, using a single transaction for each table. Then, copy the staging tables over to their target, or use a MERGE if appending. If you are enforcing FKs, you can temporarily remove those constraints if you choose to insert in reverse order (C=>B=>A). You also may want to consider temporarily removing indexes if you experience performance issues during the insert. Last, consider using SSIS instead of a custom app.
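
A rough sketch of the staging-table idea follows, using Python's built-in sqlite3 as a stand-in for SQL Server so it can run anywhere. The `Staging` table and the `AName`/`BName`/`CName` columns are assumptions made up for the example (the question only says "Other Columns"), and it assumes B and C rows can be identified by a natural key. Parents are loaded before the child (C, then B, then A), so the FK constraints can stay enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default

# Hypothetical schema: the *Name columns stand in for "Other Columns".
conn.executescript("""
CREATE TABLE TableB (TableBID INTEGER PRIMARY KEY, BName TEXT UNIQUE);
CREATE TABLE TableC (TableCID INTEGER PRIMARY KEY, CName TEXT UNIQUE);
CREATE TABLE TableA (
    TableAID INTEGER PRIMARY KEY,
    TableBID INTEGER NOT NULL REFERENCES TableB(TableBID),
    TableCID INTEGER NOT NULL REFERENCES TableC(TableCID),
    AName    TEXT
);
CREATE TABLE Staging (AName TEXT, BName TEXT, CName TEXT);
""")

rows = [("a0", "b0", "c0"), ("a1", "b0", "c1"), ("a2", "b1", "c1")]

with conn:  # one transaction: commit on success, roll back on error
    # 1. Bulk load the flat rows into the staging table.
    conn.executemany("INSERT INTO Staging VALUES (?, ?, ?)", rows)
    # 2. Insert the parent rows first; the database generates their keys.
    conn.execute("INSERT INTO TableC (CName) SELECT DISTINCT CName FROM Staging")
    conn.execute("INSERT INTO TableB (BName) SELECT DISTINCT BName FROM Staging")
    # 3. Insert the child rows, looking up the generated parent keys by join.
    conn.execute("""
        INSERT INTO TableA (TableBID, TableCID, AName)
        SELECT b.TableBID, c.TableCID, s.AName
        FROM Staging AS s
        JOIN TableB AS b ON b.BName = s.BName
        JOIN TableC AS c ON c.CName = s.CName
    """)
    # 4. Clear the staging table for the next batch.
    conn.execute("DELETE FROM Staging")

loaded = conn.execute("""
    SELECT a.AName, b.BName, c.CName
    FROM TableA AS a
    JOIN TableB AS b ON b.TableBID = a.TableBID
    JOIN TableC AS c ON c.TableCID = a.TableCID
    ORDER BY a.AName
""").fetchall()
```

With this approach the application never has to hold key values at all; the trade-off is that the parent rows need some column (or set of columns) the join can match on.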
