Suggested Architecture for a batch with multi-threading and common resources

I need to write a batch job in Java that uses multiple threads to perform various operations on a set of data.
I have almost 60k rows of data and need to run different operations on them. Some of the operations work on the same data but write to different outputs.
So the question is: is it right to create one big 60k-element ArrayList and pass it through the various operators so that each one can add its output, or is there a better architecture that someone can suggest?

EDIT:
I need to create these objects:

MyObject, with an ArrayList of MyObject2, 3 different Integers, and 2 Strings.
MyObject2, with 12 floats.
MyBigObject, with an ArrayList of MyObject (usually 60k elements) and some Strings.

My different operators work on the same ArrayList of MyObject2 but write to the Integers. For example, Operators1 reads from the ArrayList of MyObject2, performs some calculation, and writes its result to MyObject.Integer1; Operators2 reads from the same ArrayList of MyObject2, performs a different calculation, and writes its result to MyObject.Integer2; and so on.

Is this architecture "safe"? The ArrayList of MyObject2 has to be read-only and must never be modified by any operator.

EDIT: Actually I don't have any code yet, because I'm studying the architecture first and will start writing something afterwards.
Trying to rephrase my question:

Is it OK, in a batch written in pure Java (without any framework; I'm not using Spring Batch, for example, because for my project that would be like shooting a fly with a shotgun), to create a macro object and pass it around so that every thread can read from the same data but write its results to different data? Can it be dangerous if different threads read from the same data at the same time?

It depends on your operations.

Generally, it's possible to partition work on a dataset horizontally or vertically.

Horizontally means splitting your dataset into several smaller sets and letting each individual thread handle one such set. This approach is the safest but usually slower, because each individual thread performs several different operations. For the same reason, it is also a bit more complex to reason about.
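As a rough illustration of the horizontal split, here is a minimal sketch in plain Java. The `Row` class, its `sum` and `max` fields, and the two calculations are hypothetical stand-ins for the asker's objects and operators, not anything taken from the question. Each task gets its own slice of the list and runs every operation on it, so no two threads ever touch the same row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class HorizontalPartitionSketch {

    // Hypothetical stand-in for MyObject: read-only input plus two output slots.
    static class Row {
        final float[] values; // input, never modified by the workers
        int sum;              // output of the first (made-up) operation
        int max;              // output of the second (made-up) operation

        Row(float[] values) { this.values = values; }
    }

    // All operations for one row run in the thread that owns the row's chunk.
    static void processRow(Row row) {
        float sum = 0f;
        float max = Float.NEGATIVE_INFINITY;
        for (float v : row.values) {
            sum += v;
            max = Math.max(max, v);
        }
        row.sum = Math.round(sum);
        row.max = Math.round(max);
    }

    // Splits the list into contiguous chunks and processes each chunk in its own task.
    static void processInChunks(List<Row> rows, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunkSize = (rows.size() + threads - 1) / threads; // ceiling division
            List<Future<?>> pending = new ArrayList<>();
            for (int start = 0; start < rows.size(); start += chunkSize) {
                List<Row> chunk = rows.subList(start, Math.min(start + chunkSize, rows.size()));
                pending.add(pool.submit(() -> chunk.forEach(HorizontalPartitionSketch::processRow)));
            }
            for (Future<?> f : pending) {
                f.get(); // waits for the task and makes its writes visible to this thread
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        for (int i = 0; i < 60_000; i++) {
            rows.add(new Row(new float[] {1f, 2f, 3f}));
        }
        processInChunks(rows, 4);
        System.out.println(rows.get(0).sum + " " + rows.get(0).max); // prints "6 3"
    }
}
```

The list itself is only read here (`subList` creates views and nothing is added or removed), which is why sharing it between the tasks is harmless in this sketch.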

Vertically means each thread performs some operation on a specific "field" or "column", or whatever the individual data units in your dataset are. This is generally easier to implement (each thread does one thing on the whole set) and can be faster. However, each operation on the dataset needs to be independent of your other operations. If you are unsure about multi-threading in general, I recommend doing the work horizontally in parallel.
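The vertical split maps fairly directly onto the Operators1/Operators2 setup from the question. The sketch below is only one possible shape, with assumed names: `Row`, `integer1`, `integer2`, and both calculations are invented for illustration. Each operator walks the whole list in its own thread, reads the shared input, and writes to a field that no other operator touches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

class VerticalPartitionSketch {

    // Hypothetical stand-in for MyObject: shared read-only input, one output field per operator.
    static class Row {
        final float[] values; // read by every operator, written by none
        int integer1;         // written only by operator1
        int integer2;         // written only by operator2

        Row(float[] values) { this.values = values; }
    }

    // Made-up calculation: rounded sum of the input values.
    static void operator1(Row row) {
        float sum = 0f;
        for (float v : row.values) sum += v;
        row.integer1 = Math.round(sum);
    }

    // Made-up calculation: number of strictly positive input values.
    static void operator2(Row row) {
        int positives = 0;
        for (float v : row.values) if (v > 0f) positives++;
        row.integer2 = positives;
    }

    // Runs each operator over the whole list, one thread per operator.
    static void runOperators(List<Row> rows, List<Consumer<Row>> operators) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, operators.size()));
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (Consumer<Row> op : operators) {
                tasks.add(() -> { rows.forEach(op); return null; });
            }
            for (Future<Void> f : pool.invokeAll(tasks)) {
                f.get(); // rethrows any failure from an operator
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        for (int i = 0; i < 60_000; i++) {
            rows.add(new Row(new float[] {1f, -2f, 3f}));
        }
        List<Consumer<Row>> operators = new ArrayList<>();
        operators.add(VerticalPartitionSketch::operator1);
        operators.add(VerticalPartitionSketch::operator2);
        runOperators(rows, operators);
        System.out.println(rows.get(0).integer1 + " " + rows.get(0).integer2); // prints "2 2"
    }
}
```

This stays safe only while the independence condition holds: if one operator needed another operator's output field, the two tasks would have to be ordered or synchronized.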

Now to the question of whether it is OK to pass your full dataset around (some ArrayList): sure it is! It's just a reference, so passing it around won't really matter. What matters are the operations you perform on the dataset.
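If you want the "read-only" requirement enforced rather than just agreed upon, one option is to hand the threads an unmodifiable view of a private copy. This is a sketch under assumptions: `Sample` is a hypothetical immutable stand-in for MyObject2, and `freeze` is a helper name made up here. `Collections.unmodifiableList` only blocks changes to the list itself, so the elements need to be immutable too.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ReadOnlyInputSketch {

    // Hypothetical immutable stand-in for MyObject2 (described in the question as 12 floats).
    static final class Sample {
        private final float[] values;

        Sample(float[] values) {
            this.values = values.clone(); // defensive copy, so callers cannot change it later
        }

        float get(int index) { return values[index]; }

        int size() { return values.length; }
    }

    // Copies the list and wraps the copy, so neither the caller nor the workers can modify it.
    static List<Sample> freeze(List<Sample> samples) {
        return Collections.unmodifiableList(new ArrayList<>(samples));
    }

    public static void main(String[] args) {
        List<Sample> source = new ArrayList<>();
        source.add(new Sample(new float[] {1f, 2f}));

        List<Sample> shared = freeze(source);
        try {
            shared.add(new Sample(new float[] {3f}));
        } catch (UnsupportedOperationException e) {
            System.out.println("writes are rejected"); // this line is reached
        }
    }
}
```

Any number of threads can then read `shared` at the same time, as long as it is created before the threads are started or the tasks are submitted.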
