
Copy table having millions of records from one database to another - Spring Boot + Spring JDBC

Working on a small example where I have to copy millions of records from a Teradata database to an Oracle DB.

Environment: Spring Boot + Spring JDBC (jdbcTemplate) + Spring REST + Spring Scheduler + Maven + Oracle + Teradata

Using Spring JDBC's batchUpdate to insert data into the target Oracle database.

Using Teradata's "top 1000" in the SQL query against the source database:

fetchDataResults = repository.queryForList(
                "select top 1000 A, B, C, D, E from " + schemaName + ".V_X");

The query reads from a view, "V_X".

This view has 40 million records, and the Spring Boot application will choke if it tries to load them all at once.
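One way to keep memory bounded is to fetch in fixed-size chunks. Below is a minimal sketch of keyset pagination, assuming (hypothetically) that column A is a unique, monotonically increasing key on V_X - note that a bare "top 1000" with no predicate would return the same first 1000 rows on every call:

// Sketch only: page through the view with a keyset predicate so each
// query returns the *next* 1000 rows instead of the same first 1000.
// Assumes column A is a unique, ordered key (a hypothetical assumption).
long lastKey = 0L;
List<Map<String, Object>> chunk;
do {
    chunk = repository.queryForList(
            "select top 1000 A, B, C, D, E from " + schemaName + ".V_X"
                    + " where A > ? order by A", lastKey);
    if (!chunk.isEmpty()) {
        lastKey = ((Number) chunk.get(chunk.size() - 1).get("A")).longValue();
        // hand the chunk to the Oracle writer here
    }
} while (!chunk.isEmpty());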

The copy also inserts into two tables (primary and backup) in the target Oracle DB.
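For the dual insert, here is a hedged sketch (the table names X_PRIMARY and X_BACKUP are placeholders, and the JdbcTemplate is assumed to be bound to the Oracle DataSource). Wrapping both batchUpdate calls in one transaction means a failure in either insert rolls back both, so the primary and backup tables cannot drift apart:

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class CopyWriter {

    private final JdbcTemplate oracleJdbcTemplate; // bound to the Oracle DataSource

    public CopyWriter(JdbcTemplate oracleJdbcTemplate) {
        this.oracleJdbcTemplate = oracleJdbcTemplate;
    }

    // Writes one fetched chunk into both tables; if either insert fails,
    // the transaction rolls back and neither table keeps the chunk.
    @Transactional
    public void writeChunk(List<Map<String, Object>> rows) {
        for (String table : Arrays.asList("X_PRIMARY", "X_BACKUP")) { // hypothetical names
            oracleJdbcTemplate.batchUpdate(
                    "insert into " + table + " (A, B, C, D, E) values (?, ?, ?, ?, ?)",
                    rows, 1000,
                    (ps, row) -> {
                        ps.setObject(1, row.get("A"));
                        ps.setObject(2, row.get("B"));
                        ps.setObject(3, row.get("C"));
                        ps.setObject(4, row.get("D"));
                        ps.setObject(5, row.get("E"));
                    });
        }
    }
}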

What's the best way to fetch and load/copy 40 million records while making sure the copy into both tables completes successfully?

Spring Scheduler schedules the batch copy at a specified time/interval, and Spring REST invokes the copy manually - both of which are already working.
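For reference, both triggers can delegate to the same copy routine. A minimal sketch, where CopyService and its copyAll method are hypothetical names and @EnableScheduling is assumed to be set on the application class:

import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class CopyTrigger {

    private final CopyService copyService; // hypothetical service running the chunked copy

    public CopyTrigger(CopyService copyService) {
        this.copyService = copyService;
    }

    // Scheduled run, e.g. every night at 01:00 (requires @EnableScheduling).
    @Scheduled(cron = "0 0 1 * * *")
    public void scheduledCopy() {
        copyService.copyAll();
    }

    // Manual trigger over REST.
    @PostMapping("/copy")
    public String manualCopy() {
        copyService.copyAll();
        return "copy started";
    }
}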

Any suggestions would be appreciated.

Thanks.

There are different ways you can solve this:

  1. Logstash approach - specify your source and destination data and load the data into both destination DBs. It has cron support, so Logstash can run on a schedule. It is quite fast, and you can specify how many rows to fetch on each run.

  2. Use an ETL tool. You can go with any of the open-source versions if you have the ecosystem in place. Talend is a good candidate: you can design your job and export it as a runnable JAR, then schedule it with any component of your choice.

  3. Spring Batch. Please refer to this question: Spring RESTful web services - High volume data processing. A minimal reader/writer sketch follows after this list.

  4. Spring Cloud Data Flow, or Spring Boot with an MQ as an intermediate store between your data sources. You may have to introduce message queues to handle failover and fallback mechanisms. Highly reliable, and it can be implemented in an asynchronous manner.
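For option 3, here is a minimal Spring Batch sketch under stated assumptions: two DataSource beans (teradataDataSource, oracleDataSource) are configured elsewhere, the schema name is a placeholder, and only the primary table is written - the backup table could reuse the same pattern, e.g. via a CompositeItemWriter. The cursor reader streams rows instead of holding 40 million in memory, and the chunk size controls how often the step commits:

import java.util.Map;
import javax.sql.DataSource;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.batch.item.database.builder.JdbcCursorItemReaderBuilder;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.ColumnMapRowMapper;

@Configuration
@EnableBatchProcessing // note: Spring Batch's metadata tables also need a DataSource
public class CopyJobConfig {

    // Streams rows from Teradata with a cursor instead of loading the whole view.
    @Bean
    public JdbcCursorItemReader<Map<String, Object>> reader(
            @Qualifier("teradataDataSource") DataSource teradataDataSource) {
        return new JdbcCursorItemReaderBuilder<Map<String, Object>>()
                .name("viewReader")
                .dataSource(teradataDataSource)
                .sql("select A, B, C, D, E from MYSCHEMA.V_X") // schema name is a placeholder
                .rowMapper(new ColumnMapRowMapper())
                .fetchSize(1000)
                .build();
    }

    @Bean
    public JdbcBatchItemWriter<Map<String, Object>> writer(
            @Qualifier("oracleDataSource") DataSource oracleDataSource) {
        return new JdbcBatchItemWriterBuilder<Map<String, Object>>()
                .dataSource(oracleDataSource)
                .sql("insert into X_PRIMARY (A, B, C, D, E) values (:A, :B, :C, :D, :E)")
                .columnMapped() // items are Maps keyed by column name
                .build();
    }

    // Commits every 1000 rows; a failed chunk can be retried or restarted.
    @Bean
    public Step copyStep(StepBuilderFactory steps,
                         JdbcCursorItemReader<Map<String, Object>> reader,
                         JdbcBatchItemWriter<Map<String, Object>> writer) {
        return steps.get("copyStep")
                .<Map<String, Object>, Map<String, Object>>chunk(1000)
                .reader(reader)
                .writer(writer)
                .build();
    }

    @Bean
    public Job copyJob(JobBuilderFactory jobs, Step copyStep) {
        return jobs.get("copyJob").start(copyStep).build();
    }
}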

My personal opinion is to go with Logstash. If any of the above solutions make sense to you, I can elaborate on them if you want.

Well, based on the information you give and the chosen stack, in my opinion you have two possibilities. First, create a project with Spring Batch, together with Spring Batch Admin or Spring Integration to handle the REST side. The second in fact applies the first solution in a big-data setting using Spring XD. I recommend you use a profiler to speed up performance as much as you can.
