
Processing huge data volume

I am working on an interface that handles DB interaction for a system. As part of this work, I need to query the source db, invoke a procedure, get the data in a reference cursor, and populate the destination db.

Because the data volume can be huge, I use multithreading on the destination db to invoke the procedure. For example, if 1 million entries need to be loaded, the procedure is invoked on the destination db, say, 10 times with 100K records each.
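The split described above can be sketched in plain Java. This is only an illustration under assumptions: the class name `ChunkedLoader`, the `Chunk` type, and the `loadChunk` placeholder are hypothetical, and the real procedure call on the destination db is not shown in the question, so it is stubbed out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class ChunkedLoader {

    /** Half-open range [start, end) of record offsets handled by one procedure call. */
    static final class Chunk {
        final long start;
        final long end;

        Chunk(long start, long end) {
            this.start = start;
            this.end = end;
        }

        long size() {
            return end - start;
        }
    }

    /** Splits totalRecords into consecutive chunks of at most chunkSize records. */
    static List<Chunk> split(long totalRecords, long chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Chunk> chunks = new ArrayList<>();
        for (long start = 0; start < totalRecords; start += chunkSize) {
            chunks.add(new Chunk(start, Math.min(start + chunkSize, totalRecords)));
        }
        return chunks;
    }

    /** Placeholder (assumption): the real code would invoke the destination procedure here. */
    static long loadChunk(Chunk chunk) {
        return chunk.size();
    }

    /** Submits one task per chunk to a fixed pool and returns the total number of records loaded. */
    static long loadAll(long totalRecords, long chunkSize, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (Chunk chunk : split(totalRecords, chunkSize)) {
                Callable<Long> task = () -> loadChunk(chunk);
                results.add(pool.submit(task));
            }
            long loaded = 0;
            for (Future<Long> result : results) {
                loaded += result.get(); // rethrows any failure from the worker thread
            }
            return loaded;
        } finally {
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(split(1_000_000, 100_000).size() + " chunks");
        System.out.println(loadAll(1_000_000, 100_000, 10) + " records loaded");
    }
}
```

Note that this only bounds the work per destination call. It does not by itself reduce how much of the source result is held in memory at once.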

This arrangement works fine except when the data volume at the source db is very large (for example, more than 2 million entries). I have set around 20 GB of heap space for processing the records, but my program fails with a heap memory error.

I want to know if there is a way to query the data from the source db in parallel mode. For example, assuming a total of 2 million records is returned by the source stored procedure, my program should first fetch a subset of these records and then move on to the next subset, or something like that.

One solution I have proposed is to have the db side send the records in this manner, but I want to know if there is a better alternative. Please suggest.

I found a solution to this. Extend the BeanPropertyRowMapper class in the Spring API and override its mapRow method. The mapRow method is called for each row as the data is fetched, so you can apply some kind of batching mechanism at that stage. Note that I posted the question because the data is fetched using a stored procedure and the output comes in the form of a reference cursor.
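A minimal sketch of that batching idea, written without Spring so it runs standalone. `BatchingRowCollector`, `accept`, and `flush` are hypothetical names, not part of any Spring API. In the setup described above, `accept` would presumably be called from the overridden mapRow, and the sink would write each batch to the destination db.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class BatchingRowCollector<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private List<T> buffer;

    /**
     * @param batchSize maximum number of rows held before a batch is handed to the sink
     * @param sink      receives each full batch (assumption: writes it to the destination db)
     */
    BatchingRowCollector(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
        this.buffer = new ArrayList<>(batchSize);
    }

    /** Buffers one mapped row and hands the batch to the sink once it is full. */
    void accept(T row) {
        buffer.add(row);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Hands any buffered rows to the sink; call once after the last row. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<T> batch = buffer;
        buffer = new ArrayList<>(batchSize); // start a fresh buffer so the old batch can be released
        sink.accept(batch);
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        BatchingRowCollector<String> collector =
                new BatchingRowCollector<>(3, batch -> batchSizes.add(batch.size()));
        for (int i = 0; i < 7; i++) {
            collector.accept("row-" + i);
        }
        collector.flush(); // push the final partial batch
        System.out.println(batchSizes); // prints [3, 3, 1]
    }
}
```

Memory stays bounded only if nothing else keeps a reference to the rows after the sink has handled a batch.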
