
Using multi-threading in JDBC batch job

We have a JDBC batch job. There are two tables:

  • BUSINESS_CONTRACT
  • CLASSIFY_RECORD

The table BUSINESS_CONTRACT stores information about business contracts. We classify the contracts every month and store the classification results in the table CLASSIFY_RECORD.

The batch job runs once per month: it queries BUSINESS_CONTRACT for the contracts that need to be classified, classifies them, and inserts the classification results into CLASSIFY_RECORD.

The batch job currently runs in a single thread, and I want to make it run with multiple threads.

How should I write the basic code structure using the dispatcher-worker pattern?

I have been learning Java multi-threading, but have found mostly theoretical resources. Now I want to use multi-threading to solve a real problem, but I don't know how to write the first line of code.

First, do you need the added complexity of multi-threading? How long does your current process take to run? Do you have multiple CPUs or CPU cores available on the server you would be running this on, so that multi-threading would actually be beneficial?

I'm not going to write your code for you, but can give you a few pointers...

How would you do this work manually? Assume you had these as paper records, and had to split the task with a co-worker. How would you divide up the work? Between 2 people or 20 people? (That's how many threads you could potentially split this into.)

Once you have these details figured out, you can create multiple threads (your workers, started by the parent "dispatcher" code), each configured to select only a portion of the results from your query. You should keep a reference to each of your threads and, once they have all been started, call .join() on each of them in order to wait for the entire batch to complete. If there is a large amount of data that is difficult to split into equal units of work (1,000 records divided into 500 and 500 may require 75% and 25% of the resources for whatever reason), you may want to consider splitting the work into much smaller units (more units than threads), then have the dispatcher keep feeding units of work to the workers until all work has been assigned.
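As a rough illustration of the "more units than threads" idea above, here is a minimal sketch. It swaps raw threads and .join() for an ExecutorService, whose invokeAll() likewise blocks until every unit has finished. The class name DispatcherSketch, the methods partition, processChunk and dispatch, and the use of contract IDs as the unit of work are all assumptions made for the example; the worker body is a placeholder for whatever query, classification and insert logic the real job has.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DispatcherSketch {

    // Split the ids into chunks of at most chunkSize; each chunk is one unit of work.
    static List<List<Long>> partition(List<Long> ids, int chunkSize) {
        List<List<Long>> chunks = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, ids.size());
            chunks.add(new ArrayList<>(ids.subList(i, end)));
        }
        return chunks;
    }

    // Hypothetical worker body. A real job would use its own JDBC connection here,
    // classify each contract, and insert the results into CLASSIFY_RECORD.
    // This placeholder only reports how many contracts it was given.
    static int processChunk(List<Long> chunk) {
        return chunk.size();
    }

    // Dispatcher: queue every chunk on a fixed-size pool and wait for all of them.
    // Idle workers pick up the next queued chunk, so uneven chunks balance out.
    static int dispatch(List<Long> ids, int chunkSize, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (List<Long> chunk : partition(ids, chunkSize)) {
                tasks.add(() -> processChunk(chunk));
            }
            int total = 0;
            // invokeAll blocks until every task has completed (the equivalent of join()).
            for (Future<Integer> result : pool.invokeAll(tasks)) {
                total += result.get(); // rethrows a worker failure as ExecutionException
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Long> ids = new ArrayList<>();
        for (long i = 1; i <= 1000; i++) {
            ids.add(i);
        }
        // 1,000 contracts in chunks of 50, handled by 4 worker threads.
        System.out.println(dispatch(ids, 50, 4)); // prints 1000
    }
}
```

The chunk size and thread count are tuning knobs rather than recommendations; the point is only that the pool's queue plays the role of the dispatcher handing out the next unit to whichever worker is free.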

Also consider: are these units of work truly independent of each other? If one unit of work fails for some reason and needs to be rolled back in the database, does this mean that all of the other units of work need to be stopped and any existing inserts rolled back as well?

Are you using batch updates? It will probably make more of a difference than multiple threads doing single updates.
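For reference, a JDBC batch update could look roughly like the sketch below. It uses the standard PreparedStatement.addBatch() and executeBatch() calls. The column names CONTRACT_ID and CLASSIFY_RESULT, the class name BatchInsertSketch, the method insertAll and the Map of contract ID to classification result are assumptions for the example, since the question does not show the real schema.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;

public class BatchInsertSketch {

    // Hypothetical column names; adjust to the real CLASSIFY_RECORD schema.
    static final String INSERT_SQL =
            "INSERT INTO CLASSIFY_RECORD (CONTRACT_ID, CLASSIFY_RESULT) VALUES (?, ?)";

    // Inserts all results in one transaction, sending them to the database
    // in groups of batchSize instead of one round trip per row.
    // Returns the number of rows queued for insertion.
    static int insertAll(Connection conn, Map<Long, String> results, int batchSize)
            throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            int queued = 0;
            int pending = 0;
            for (Map.Entry<Long, String> entry : results.entrySet()) {
                ps.setLong(1, entry.getKey());
                ps.setString(2, entry.getValue());
                ps.addBatch();
                queued++;
                if (++pending == batchSize) {
                    ps.executeBatch(); // flush a full batch
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch(); // flush the remainder
            }
            conn.commit();
            return queued;
        } catch (SQLException e) {
            conn.rollback(); // undo everything from this unit of work
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

Here one call is one transaction, so a failure rolls back only that call's rows. If several worker threads each call insertAll, giving each worker its own Connection is the usual approach; whether a failure in one worker should also undo the others is a separate decision for the job.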
