
Loading large amounts of data to an Oracle SQL Database

I was wondering if anyone had any experience with what I am about to embark on. I have several csv files, each around a GB in size, that I need to load into an Oracle database. While most of my work after loading will be read-only, I will need to load updates from time to time. Basically I just need a good tool for loading several rows of data at a time into my db.

Here is what I have found so far:

  1. I could use SQL*Loader to do a lot of the work

  2. I could use Bulk-Insert commands

  3. Some sort of batch insert.

Using prepared statements somehow might be a good idea. I guess I was wondering what everyone thinks is the fastest way to get this insert done. Any tips?
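For reference, a batch insert along the lines of options 2 and 3 might look like the following PL/SQL sketch. The target table MY_TABLE(id, val) is hypothetical, and in practice the collections would be filled from the parsed CSV rows rather than the dummy loop shown here.

    DECLARE
      TYPE t_ids  IS TABLE OF my_table.id%TYPE  INDEX BY PLS_INTEGER;
      TYPE t_vals IS TABLE OF my_table.val%TYPE INDEX BY PLS_INTEGER;
      l_ids  t_ids;
      l_vals t_vals;
    BEGIN
      -- Dummy data; in practice these would come from the parsed CSV rows.
      FOR i IN 1 .. 1000 LOOP
        l_ids(i)  := i;
        l_vals(i) := 'row ' || i;
      END LOOP;

      -- FORALL sends the whole collection to the SQL engine in one round trip,
      -- which is much faster than inserting one row at a time.
      FORALL i IN 1 .. l_ids.COUNT
        INSERT INTO my_table (id, val) VALUES (l_ids(i), l_vals(i));

      COMMIT;
    END;
    /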

I would be very surprised if you could roll your own utility that will outperform SQL*Loader Direct Path Loads. Oracle built this utility for exactly this purpose - the likelihood of building something more efficient is practically nil. There is also the Parallel Direct Path Load, which allows you to have multiple direct path load processes running concurrently.
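As a rough illustration (the table, column, and file names below are placeholders, and scott/tiger is just a placeholder connection), a direct path load is driven by a control file and enabled with DIRECT=TRUE on the sqlldr command line:

    -- load_data.ctl
    LOAD DATA
    INFILE 'data1.csv'
    APPEND
    INTO TABLE my_table
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    (id, val)

    sqlldr userid=scott/tiger control=load_data.ctl log=load_data.log direct=true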

From the manual:

Instead of filling a bind array buffer and passing it to the Oracle database with a SQL INSERT statement, a direct path load uses the direct path API to pass the data to be loaded to the load engine in the server. The load engine builds a column array structure from the data passed to it.

The direct path load engine uses the column array structure to format Oracle data blocks and build index keys. The newly formatted database blocks are written directly to the database (multiple blocks per I/O request using asynchronous writes if the host platform supports asynchronous I/O).

Internally, multiple buffers are used for the formatted blocks. While one buffer is being filled, one or more buffers are being written if asynchronous I/O is available on the host platform. Overlapping computation with I/O increases load performance.

There are cases where Direct Path Load cannot be used.

With that amount of data, you'd better be sure of your backing store - the dbf disks' free space.

sqlldr is script driven and very efficient, generally more efficient than a SQL script. The only thing I wonder about is the magnitude of the data. I personally would consider several to many sqlldr processes, assign each one a subset of the data, and let the processes run in parallel.
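For example, a sketch of that approach, assuming the CSV data has already been split into chunks and reusing a control file like the one above, where each process does a parallel direct path load of its own slice:

    # run the loads concurrently, then wait for all of them to finish
    sqlldr userid=scott/tiger control=load_data.ctl data=chunk1.csv direct=true parallel=true &
    sqlldr userid=scott/tiger control=load_data.ctl data=chunk2.csv direct=true parallel=true &
    sqlldr userid=scott/tiger control=load_data.ctl data=chunk3.csv direct=true parallel=true &
    wait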

You said you wanted to load a few records at a time? That may take a lot longer than you think. Did you mean a few files at a time?

You may be able to create an external table on the CSV files and load them in by SELECTing from the external table into another table. I'm not sure whether this method will be quicker overall, but it might be quicker in terms of avoiding the hassle of getting sql*loader to work, especially when you have criteria for UPDATEs.
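A minimal sketch of that approach (directory path, file, table, and column names are placeholders), including a MERGE for the periodic updates mentioned in the question:

    -- Directory object pointing at where the CSV files live
    CREATE DIRECTORY csv_dir AS '/data/csv';

    -- External table over one of the CSV files
    CREATE TABLE my_table_ext (
      id  NUMBER,
      val VARCHAR2(100)
    )
    ORGANIZATION EXTERNAL (
      TYPE ORACLE_LOADER
      DEFAULT DIRECTORY csv_dir
      ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
      )
      LOCATION ('data1.csv')
    );

    -- Initial load: direct-path insert from the external table
    INSERT /*+ APPEND */ INTO my_table
    SELECT id, val FROM my_table_ext;
    COMMIT;

    -- Later updates: MERGE handles both new and changed rows
    MERGE INTO my_table t
    USING my_table_ext s
    ON (t.id = s.id)
    WHEN MATCHED THEN UPDATE SET t.val = s.val
    WHEN NOT MATCHED THEN INSERT (id, val) VALUES (s.id, s.val);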
