What's the most efficient way to bulk-copy to SQL Server from Java?
I have data that is streamed from disk and processed in memory by a Java application, and that finally needs to be copied into SQL Server. The data can be fairly large (hence the streaming), and a single load can require inserting up to several hundred thousand rows. The fastest solution seems to be SQL Server's bulk-copy feature. However, I haven't found any way for a Java program to do this easily or anywhere near fast enough.
Here are some ways that I've already investigated:
Using the SqlBulkCopy class in .NET. This is very efficient, since you can stream data from a data source straight to SQL Server. The problem with this approach is that you need to be running .NET. Perhaps it could be used through a Java-to-.NET bridge, although I wonder about the cost of marshalling data between runtimes.
Using the BULK INSERT T-SQL statement. The problem here is that you need to create a properly formatted file on disk. I've seen some small performance gains over JDBC batch insert with this. Also, this is only useful locally, because the file has to be readable by the SQL Server machine itself.
Writing files to disk and using the bcp command-line utility. This is still a little faster than JDBC batch insert, but not by much. I also lose the ability to use a transaction with this method.
Using the C API. Again, very efficient, but you need to be using C. There might be a way to call it through JNI; if there's a free Java library out there that does this, I'd like to know about it.
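For the BULK INSERT route, the Java side mostly has to stream rows into a delimited file. The following is a minimal sketch under several assumptions: tab-separated fields, LF-only row terminators (hence `ROWTERMINATOR = '0x0a'`, since `'\n'` is documented to expect CR+LF), UTF-8 output loaded with `CODEPAGE = '65001'` (SQL Server 2016 and later), and values whose `toString()` form never contains tabs or line breaks, because character-format files have no escaping. `DelimitedFileWriter` is a hypothetical name, and the path given to `bulkInsertSql` must be one the SQL Server machine can read.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;

/** Hypothetical helper: writes rows as a tab-delimited UTF-8 file for BULK INSERT. */
final class DelimitedFileWriter {
    private DelimitedFileWriter() {}

    /** Writes each row as tab-separated values ending in LF; returns the number of rows. */
    static long write(Iterator<Object[]> rows, Path file) throws IOException {
        long count = 0;
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            while (rows.hasNext()) {
                Object[] row = rows.next();
                for (int i = 0; i < row.length; i++) {
                    if (i > 0) {
                        out.write('\t'); // field terminator
                    }
                    if (row[i] != null) { // null is written as an empty field
                        out.write(checked(row[i].toString()));
                    }
                }
                out.write('\n'); // LF-only row terminator
                count++;
            }
        }
        return count;
    }

    /**
     * Builds the statement. serverPath is resolved on the SQL Server machine, not the client;
     * table is assumed to be a trusted identifier, not user input.
     */
    static String bulkInsertSql(String table, String serverPath) {
        String quotedPath = serverPath.replace("'", "''"); // escape quotes in the T-SQL literal
        return "BULK INSERT " + table + " FROM '" + quotedPath + "'"
                + " WITH (FIELDTERMINATOR = '\\t', ROWTERMINATOR = '0x0a', CODEPAGE = '65001', TABLOCK)";
    }

    // Character-format files cannot escape terminators, so reject values that contain them.
    private static String checked(String value) {
        if (value.indexOf('\t') >= 0 || value.indexOf('\n') >= 0 || value.indexOf('\r') >= 0) {
            throw new IllegalArgumentException("value contains a tab or line break: " + value);
        }
        return value;
    }
}
```

Dates and decimals go through `toString()` here, so they may need explicit formatting to match what SQL Server expects for the target column types.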
I'm looking for the fastest solution. Memory is not an issue.
Thanks!
The best option for me was the commercial SQL Server JDBC driver from DataDirect, used with the standard JDBC addBatch/executeBatch calls; it runs on both Linux and Windows: https://blogs.datadirect.com/2012/05/how-to-bulk-insert-jdbc-batches-into-microsoft-sql-server-oracle-sybase.html
I've seen load times improve from 7 hours to under 30 minutes.
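The addBatch/executeBatch pattern in this answer is plain JDBC, so the same code runs on any driver. A minimal sketch, assuming a parameterized INSERT with one `?` per column and rows supplied as `Object[]`; `JdbcBatchLoader` is a hypothetical name, and the batch size is a tuning knob rather than a recommended value. It sends a batch every `batchSize` rows so client memory stays bounded while streaming, and it wraps the whole load in one transaction.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Iterator;

/** Hypothetical helper: streams rows into a table with standard JDBC batching. */
final class JdbcBatchLoader {
    private JdbcBatchLoader() {}

    /**
     * Binds each row to insertSql (one ? per column), sends a batch every batchSize rows,
     * and commits once at the end. Returns the number of rows added.
     */
    static long load(Connection conn, String insertSql, Iterator<Object[]> rows, int batchSize)
            throws SQLException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // run the whole load as one transaction
        long count = 0;
        try (PreparedStatement ps = conn.prepareStatement(insertSql)) {
            while (rows.hasNext()) {
                Object[] row = rows.next();
                for (int i = 0; i < row.length; i++) {
                    // JDBC parameters are 1-based; some drivers want setNull(...) for nulls.
                    ps.setObject(i + 1, row[i]);
                }
                ps.addBatch();
                if (++count % batchSize == 0) {
                    ps.executeBatch(); // send a full batch, keeping client memory bounded
                }
            }
            if (count % batchSize != 0) {
                ps.executeBatch(); // send the final partial batch
            }
            conn.commit();
        } catch (SQLException | RuntimeException e) {
            try {
                conn.rollback();
            } catch (SQLException rollbackFailure) {
                e.addSuppressed(rollbackFailure);
            }
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
        return count;
    }
}
```

A call might look like `JdbcBatchLoader.load(conn, "INSERT INTO dbo.Target (id, name) VALUES (?, ?)", rows, 1000)`, where the table and columns are placeholders. Committing once keeps the load atomic; committing per batch instead trades atomicity for a smaller transaction log.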
Starting with version 4.2 of the Microsoft JDBC Driver for SQL Server, there is a class called com.microsoft.sqlserver.jdbc.SQLServerBulkCopy that is the equivalent of .NET's SqlBulkCopy class.
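For data that is already in memory, `SQLServerBulkCopy` can read rows from an object that implements the driver's bulk-data interface, so nothing has to be written to disk. The following is a sketch, not a definitive implementation: it assumes the `com.microsoft.sqlserver:mssql-jdbc` driver is on the classpath and is a recent release that provides `ISQLServerBulkData` (older releases, including 4.2, use `ISQLServerBulkRecord` instead). `IteratorBulkData`, `BulkCopyLoader`, the batch size, and the column metadata are hypothetical; source columns map to destination columns by position.

```java
import com.microsoft.sqlserver.jdbc.ISQLServerBulkData;
import com.microsoft.sqlserver.jdbc.SQLServerBulkCopy;
import com.microsoft.sqlserver.jdbc.SQLServerBulkCopyOptions;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical adapter: exposes an in-memory row iterator to SQLServerBulkCopy. */
final class IteratorBulkData implements ISQLServerBulkData {
    private final String[] names;
    private final int[] types;      // java.sql.Types constants, one per column
    private final int[] precisions; // e.g. max length for NVARCHAR, 0 for INTEGER
    private final transient Iterator<Object[]> rows;
    private Object[] current;

    IteratorBulkData(String[] names, int[] types, int[] precisions, Iterator<Object[]> rows) {
        this.names = names.clone();
        this.types = types.clone();
        this.precisions = precisions.clone();
        this.rows = rows;
    }

    @Override public Set<Integer> getColumnOrdinals() {
        Set<Integer> ordinals = new LinkedHashSet<>();
        for (int i = 1; i <= names.length; i++) {
            ordinals.add(i); // ordinals are 1-based; row arrays are 0-based
        }
        return ordinals;
    }
    @Override public String getColumnName(int column) { return names[column - 1]; }
    @Override public int getColumnType(int column) { return types[column - 1]; }
    @Override public int getPrecision(int column) { return precisions[column - 1]; }
    @Override public int getScale(int column) { return 0; } // no DECIMAL columns in this sketch
    @Override public Object[] getRowData() { return current; }
    @Override public boolean next() {
        if (!rows.hasNext()) {
            return false;
        }
        current = rows.next(); // pulled lazily, so rows can be streamed
        return true;
    }
}

/** Hypothetical helper: streams an IteratorBulkData into a destination table. */
final class BulkCopyLoader {
    private BulkCopyLoader() {}

    static void load(Connection conn, String destinationTable, IteratorBulkData data) throws SQLException {
        SQLServerBulkCopyOptions options = new SQLServerBulkCopyOptions();
        options.setBatchSize(10_000); // rows per batch sent to the server; a tuning knob
        options.setTableLock(true);   // bulk update table lock, often faster for large loads
        SQLServerBulkCopy bulkCopy = new SQLServerBulkCopy(conn);
        try {
            bulkCopy.setBulkCopyOptions(options);
            bulkCopy.setDestinationTableName(destinationTable);
            bulkCopy.writeToServer(data);
        } finally {
            bulkCopy.close();
        }
    }
}
```

If `conn.setAutoCommit(false)` is called before the load and `conn.commit()` afterwards, the copy should run inside that transaction, which would address the transaction limitation raised about bcp in the question.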