
Java: multithreading on massive data: sharing data between threads?

I want to run a multithreaded program on massive data. I usually create a class that implements Callable (or Runnable) and pass it the data the computation needs.

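import java.util.concurrent.Callable;

// dataType and PredictorResult are placeholders for the asker's own types.
public class CallableTrainer implements Callable<PredictorResult> {

   private final dataType data;

   CallableTrainer(dataType massiveData) {
       this.data = massiveData;
   }

   @Override
   public PredictorResult call() throws Exception {
        // do something with data and return the result ...
        return null; // placeholder until the real training logic is written
   }
}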

Based on the above implementation, I assume that the 'massiveData' is always copied for each thread (right?). If this is true, I am wasting a lot of memory by copying this data for every thread. Is there any way to share the data between threads?

I assume that the 'massiveData' is always copied for each thread (right?) If this is true ...

Nope, false. Only the reference to massiveData is copied, so every CallableTrainer instance points to the same object.
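To make this concrete, here is a minimal sketch, assuming an `int[]` stands in for the question's `dataType` and a hypothetical `SliceSumTask` stands in for `CallableTrainer`. Every task stores a reference to the same array and sums its own slice; since the threads only read the array, no synchronization is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SharedReferenceDemo {

    // Hypothetical stand-in for CallableTrainer: sums one slice of the shared array.
    static final class SliceSumTask implements Callable<Long> {
        final int[] data;          // a reference: the same array object for every task
        private final int from, to;

        SliceSumTask(int[] data, int from, int to) {
            this.data = data;      // copies the reference, not the array contents
            this.from = from;
            this.to = to;
        }

        @Override
        public Long call() {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];    // read-only access, so no locking is required
            }
            return sum;
        }
    }

    public static long parallelSum(int[] massiveData, int nThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<SliceSumTask> tasks = new ArrayList<>();
            int chunk = (massiveData.length + nThreads - 1) / nThreads;
            for (int from = 0; from < massiveData.length; from += chunk) {
                int to = Math.min(from + chunk, massiveData.length);
                tasks.add(new SliceSumTask(massiveData, from, to));
            }
            // Identity check: every task holds the very same array, no per-thread copy.
            for (SliceSumTask t : tasks) {
                if (t.data != massiveData) {
                    throw new IllegalStateException("task holds a different array");
                }
            }
            long total = 0;
            for (Future<Long> f : pool.invokeAll(tasks)) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] massiveData = new int[1_000_000];
        for (int i = 0; i < massiveData.length; i++) massiveData[i] = i % 10;
        System.out.println(parallelSum(massiveData, 4));
    }
}
```

Running `main` prints 4500000. Only one array exists in memory no matter how many tasks or threads are created.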

Java doesn't make magic copies of non-primitive types: passing an object to a constructor or method copies only the reference to it. If you want a copy, you have to make one explicitly.
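A small sketch of the difference, using a hypothetical `Holder` class: passing the array stores a reference to it, while `Arrays.copyOf` makes an explicit copy. Note that `Arrays.copyOf` is a shallow copy; for arrays of objects it copies the element references, not the elements themselves.

```java
import java.util.Arrays;

public class ExplicitCopyDemo {

    // Hypothetical holder type, used only to contrast reference vs. copy.
    static final class Holder {
        final int[] values;
        Holder(int[] values) { this.values = values; } // stores the reference only
    }

    public static void main(String[] args) {
        int[] original = {1, 2, 3};

        Holder shared = new Holder(original);                                 // no copy made
        Holder copied = new Holder(Arrays.copyOf(original, original.length)); // explicit copy

        original[0] = 99; // mutate through the original reference

        System.out.println(shared.values[0]);          // 99: same array as 'original'
        System.out.println(copied.values[0]);          // 1: independent copy
        System.out.println(shared.values == original); // true
        System.out.println(copied.values == original); // false
    }
}
```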

If you didn't already know that, I'm guessing you're going to run into all sorts of other problems when you write this multithreaded code. For example, unless these threads only read massiveData, you need some form of synchronization or atomicity guarantee for every update you make; otherwise you're going to end up with garbage (corrupted or inconsistent data).
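One common way to handle shared updates, sketched here under the assumption that the threads build a shared histogram of the values in `massiveData` (the names `SharedUpdateDemo` and `histogram` are hypothetical): the input stays shared and read-only, while the output counters live in an `AtomicIntegerArray`, so each increment is an atomic read-modify-write. A plain `counts[x]++` on an ordinary `int[]` would silently lose updates under contention.

```java
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicIntegerArray;

public class SharedUpdateDemo {

    // Counts how often each value 0..(buckets-1) occurs in 'data'.
    // 'data' is shared read-only; 'counts' is shared AND written by all threads.
    public static int[] histogram(int[] data, int buckets, int nThreads)
            throws InterruptedException {
        AtomicIntegerArray counts = new AtomicIntegerArray(buckets);
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        int chunk = (data.length + nThreads - 1) / nThreads;
        for (int from = 0; from < data.length; from += chunk) {
            final int start = from;
            final int end = Math.min(from + chunk, data.length);
            pool.execute(() -> {
                for (int i = start; i < end; i++) {
                    // Atomic increment: concurrent updates to one bucket are never lost.
                    counts.incrementAndGet(data[i]);
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        int[] result = new int[buckets];
        for (int b = 0; b < buckets; b++) result[b] = counts.get(b);
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] massiveData = new int[1_000_000];
        for (int i = 0; i < massiveData.length; i++) massiveData[i] = i % 10;
        System.out.println(Arrays.toString(histogram(massiveData, 10, 4)));
    }
}
```

`synchronized` blocks would also be correct. Another option is to give each thread its own partial counts and merge them at the end, which often scales better because threads then don't contend on the same counters.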

Here's a good book on the topic (with Java examples): The Art of Multiprocessor Programming
