
Oracle to SQL Server data transfer using Spring Boot

I am looking for a technical solution to query data from one database and load it into a SQL Server database using Java Spring Boot.

Mock query to get the productNames updated within a given 20-hour window:

SELECT productName, updatedtime
FROM products
WHERE updatedtime BETWEEN '2018-03-26 00:00:01' AND '2018-03-26 19:59:59';

Here is the approach we followed:

1) It is a long-running Oracle query: during business hours it runs for approximately 1 hour, and it returns ~1 million records.

2) We have to insert (dump) this result set into a SQL Server table using JDBC.

3) As far as I know, the Oracle JDBC driver supports a kind of streaming: when we iterate over a ResultSet, it loads only fetchSize rows into memory at a time.

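rs.setFetchSize(BATCH_SIZE); // hint: rows the Oracle driver fetches per round trip
int currentRow = 1;
while (rs.next()) {
  // read the current row from Oracle and add it to the batch
  // (insert: a hypothetical PreparedStatement on the SQL Server connection)
  insert.setString(1, rs.getString("productName"));
  insert.setTimestamp(2, rs.getTimestamp("updatedtime"));
  insert.addBatch();
  if (currentRow++ % BATCH_SIZE == 0) {
    insert.executeBatch(); // send the accumulated batch to SQL Server
  }
}
insert.executeBatch(); // flush the last, partial batch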

This way we do not need to hold the entire huge dataset from Oracle in memory, and we insert into SQL Server in batches of BATCH_SIZE. The only open question is where to commit in the SQL Server database.

4) The bottleneck here is the query execution time when fetching the data from the Oracle database, so I am planning to split the query into 10 equal parts by updatedtime range, e.g. one hour per query, so that the execution time drops to ~10 minutes per query. For example:

SELECT productName, updatedtime
FROM products
WHERE updatedtime BETWEEN '2018-03-26 01:00:01' AND '2018-03-26 01:59:59';

5) For that I need 5 Oracle JDBC connections and 5 SQL Server connections (to query the data and insert it into the database) so that each part can do its job independently. I am new to JDBC connection pooling: how do I set up a connection pool, and how do I close connections when they are not in use?
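A minimal sketch of the split in step 4, assuming the 20-hour window is treated as [00:00, 20:00) on 2018-03-26. `TimeSlices.split` is a hypothetical helper (it uses a Java 16+ record); with 10 parts each slice spans 2 hours. It produces half-open ranges intended for `updatedtime >= ? AND updatedtime < ?` rather than `BETWEEN 'hh:00:01' AND 'hh:59:59'`, which would skip rows stamped exactly at hh:00:00 (and, for a TIMESTAMP column, fractional seconds after hh:59:59). Half-open slices cover the whole window with no gaps or overlaps.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

class TimeSlices {

    /** One half-open slice [start, end) of the overall window. */
    record Slice(LocalDateTime start, LocalDateTime end) {}

    /** Splits [from, to) into `parts` contiguous slices; the last one absorbs any rounding remainder. */
    static List<Slice> split(LocalDateTime from, LocalDateTime to, int parts) {
        if (parts <= 0 || !from.isBefore(to)) {
            throw new IllegalArgumentException("need parts > 0 and from < to");
        }
        Duration step = Duration.between(from, to).dividedBy(parts);
        List<Slice> slices = new ArrayList<>(parts);
        LocalDateTime start = from;
        for (int i = 0; i < parts; i++) {
            LocalDateTime end = (i == parts - 1) ? to : start.plus(step);
            slices.add(new Slice(start, end));
            start = end; // next slice begins exactly where this one ends
        }
        return slices;
    }

    public static void main(String[] args) {
        // Each slice would be bound to: SELECT productName, updatedtime FROM products
        //                               WHERE updatedtime >= ? AND updatedtime < ?
        List<Slice> slices = split(LocalDateTime.of(2018, 3, 26, 0, 0),
                                   LocalDateTime.of(2018, 3, 26, 20, 0), 10);
        for (Slice s : slices) {
            System.out.println(s.start() + " -> " + s.end());
        }
    }
}
```

Each slice can then be handed to its own worker; the slice boundaries only need to be computed once, up front.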

Please suggest if you have a better approach for getting the data from the data source quickly, close to real time. Thanks in advance.

This is a typical use case for Spring Batch.

There you have the concepts of an ItemReader (reading from your source database) and an ItemWriter (writing into your destination database).
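Spring Batch's chunk-oriented steps have the same shape as the loop in the question: items are read one at a time, then written and committed once per chunk. The plain-Java sketch below (not Spring Batch's API) shows that pattern without a database: `copyInBatches` is a hypothetical helper, the `Iterator` stands in for the Oracle `ResultSet` (or an ItemReader), and the `Consumer` stands in for a writer that would call `executeBatch()` and then `commit()` on the SQL Server connection. Unlike a bare `% BATCH_SIZE` check, it also flushes the final partial batch. Committing per batch is one reasonable answer to where to commit: a failure loses at most the current batch, at the cost of the target table holding a partial load until the run finishes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

class BatchCopy {

    /**
     * Streams items from source and hands them to sink in batches of batchSize.
     * Returns the number of batches written. (Hypothetical wiring: in a real job
     * sink would run executeBatch() + commit() on the SQL Server connection.)
     */
    static <T> int copyInBatches(Iterator<T> source, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        int batches = 0;
        while (source.hasNext()) {
            batch.add(source.next());
            if (batch.size() == batchSize) {
                sink.accept(new ArrayList<>(batch)); // pass a copy, then reuse the buffer
                batch.clear();
                batches++;
            }
        }
        if (!batch.isEmpty()) { // flush the last, partial batch
            sink.accept(new ArrayList<>(batch));
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            rows.add(i);
        }
        List<List<Integer>> written = new ArrayList<>();
        int n = copyInBatches(rows.iterator(), 4, written::add);
        System.out.println(n + " batches: " + written);
    }
}
```

With 10 rows and a batch size of 4 this produces two full batches and one partial batch of 2.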

You can define multiple data sources, and you get the ability to read with a fixed fetch size (JdbcCursorItemReader, for instance) as well as to split the work up for parallel execution.
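For the parallel part of the question, here is a plain-Java sketch of bounded parallelism: a fixed pool of 5 threads runs 10 slices, so at most 5 slices (and therefore 5 connections per database) are in use at once. `copySlice` is a hypothetical stub that only simulates work and records concurrency, so the sketch runs without any database. On pooling: Spring Boot uses HikariCP by default, and calling `close()` on a connection obtained from a pooled `DataSource` hands it back to the pool instead of closing the physical connection, so try-with-resources inside each task is enough. Sizing each pool to 5 (Hikari's `maximumPoolSize`) would match the thread count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class ParallelSlices {

    static final AtomicInteger running = new AtomicInteger();    // slices in flight right now
    static final AtomicInteger maxRunning = new AtomicInteger(); // peak observed concurrency

    /**
     * Hypothetical stand-in for copying one time slice. Real code would roughly do
     * try (Connection src = oracleDs.getConnection();
     *      Connection dst = sqlServerDs.getConnection()) { ... }
     * so both pooled connections return to their pools when the slice finishes.
     */
    static int copySlice(int slice) throws InterruptedException {
        int now = running.incrementAndGet();
        maxRunning.accumulateAndGet(now, Math::max);
        try {
            Thread.sleep(20); // simulate query + insert time
            return slice;
        } finally {
            running.decrementAndGet();
        }
    }

    /** Runs slices 0..slices-1 with at most `threads` in flight; returns their ids in order. */
    static List<Integer> runAll(int slices, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < slices; i++) {
                final int slice = i;
                futures.add(pool.submit(() -> copySlice(slice)));
            }
            List<Integer> done = new ArrayList<>();
            for (Future<Integer> f : futures) {
                done.add(f.get()); // waits; a failed slice surfaces as ExecutionException
            }
            return done;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for slices", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a slice failed", e.getCause());
        } finally {
            pool.shutdownNow(); // no-op once all tasks finished; cancels the rest after a failure
        }
    }

    public static void main(String[] args) {
        List<Integer> done = runAll(10, 5);
        System.out.println("slices done: " + done + ", peak concurrency: " + maxRunning.get());
    }
}
```

Spring Batch's partitioned and multi-threaded steps provide this kind of parallelism inside the framework, so the hand-rolled executor is mainly useful if you stay with plain JDBC.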

A quick search turns up many examples online for this kind of task.

I know I'm not posting code for the concept, but it would take me some time to prepare a decent example.

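Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.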

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM