Confused on how Java shares variables during multiprocessing

I just started using Java, so sorry if the answer to this question is obvious. I can't really figure out how to share variables in Java. I have been playing around with Python and wanted to port some code over to Java to learn the language a bit better. A lot of my code is ported, but I'm unsure how exactly multiprocessing and sharing of variables work in Java (my process is not disk bound; it uses a lot of CPU and searches a list).

In Python, I can do this:

from multiprocessing import Pool, Manager
manager = Manager()
shared_list = manager.list()
pool = Pool(processes=4)
for variables_to_send in list_of_data_to_process:
    pool.apply_async(function_or_class, (variables_to_send, shared_list))
pool.close()
pool.join()

I've been having a bit of trouble figuring out how to do multiprocessing and sharing like this in Java. This question helped me understand a bit (via the code) how implementing Runnable can help, and I'm starting to think Java might automatically spread threads across processors (correct me if I'm wrong on this; I read that once threads exceed the capacity of a CPU they are moved to another CPU. The Oracle docs seem to be more focused on threads than multiprocessing). But it doesn't explain how to share lists or other variables between processes (and keep them in close enough sync).

Any suggestions or resources? I am hoping I'm just searching for the wrong thing (multiprocessing java) and that this is as easy (or similarly straightforward) as it is in my code above.

Thanks!

There is an important difference between a thread and a process, and you are running into it now: with some exceptions, threads share memory, but processes do not.

Note that real operating systems have ways around just about everything I'm about to say, but these features aren't used in the typical case. So, to fire up a new process, you must clone the current process in some way with a system call (on *nix, this is fork()), and then replace the code, stack, command-line arguments, etc. of the child process with another system call (on *nix, this is the exec() family of system calls). Windows has rough equivalents of both these system calls, so everything I'm saying is cross-platform. Also, the Java Runtime Environment takes care of all these system calls under the covers, and without JNI or some other interop technology you can't really execute them yourself.

There are two important things to note about this model: the child process doesn't share the address space of the parent process, and the entire address space of the child process gets replaced on the exec() call. So, variables in the parent process are unavailable to the child process, and vice versa.

The thread model is quite different. Threads are kind of like lightweight processes, in that each thread has its own instruction pointer, and (on most systems) threads are scheduled by the operating system scheduler. However, a thread is a part of a process. Each process has at least one thread, and all the threads in the process share memory.

Now to your problem:

The Python multiprocessing module spawns processes with very little effort, as your code example shows. In Java, spawning a new process takes a little more work. It involves creating a new Process object using ProcessBuilder.start() or Runtime.exec(). Then you can pipe strings to the child process, get back its output, wait for it to exit, and use a few other communication primitives. I would recommend writing one program to act as the coordinator and fire up each of the child processes, and writing a worker program that roughly corresponds to function_or_class in your example. The coordinator can open multiple copies of the worker program, give each a task, and wait for all the workers to finish.
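To make the "threads share memory" point concrete, here is a minimal sketch. The class name SharedMemoryDemo and the helper countWithThreads are made up for illustration. Several threads increment one counter object that lives in the memory of the single process they all belong to.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SharedMemoryDemo {
    // Hypothetical helper: start threadCount threads that all increment the same counter.
    static int countWithThreads(int threadCount, int incrementsPerThread) throws InterruptedException {
        // One object on the heap, reachable from every thread in this process.
        AtomicInteger counter = new AtomicInteger();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.incrementAndGet(); // atomic, so no update is lost
                }
            });
            threads[i].start();
        }
        for (Thread thread : threads) {
            thread.join(); // wait for every thread to finish
        }
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(countWithThreads(4, 1000)); // prints 4000
    }
}
```

No copying or message passing is involved: each thread works on the very same AtomicInteger. Separate processes could not do this without some explicit communication channel.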
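A minimal sketch of the coordinator side, under some assumptions. The class ProcessCoordinator and the helpers runWorkers and placeholderCommand are hypothetical names. Since no worker program exists here, each child simply runs the current JVM's launcher with -version as a stand-in; a real coordinator would substitute the worker's own command line and write each task to child.getOutputStream() before closing it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ProcessCoordinator {
    // Hypothetical helper: start one child process per command, then collect each child's output.
    static List<String> runWorkers(List<List<String>> commands) throws IOException, InterruptedException {
        List<Process> children = new ArrayList<>();
        for (List<String> command : commands) {
            ProcessBuilder builder = new ProcessBuilder(command);
            builder.redirectErrorStream(true); // read stderr and stdout as one stream
            children.add(builder.start());     // children run concurrently once started
        }
        List<String> outputs = new ArrayList<>();
        for (Process child : children) {
            child.getOutputStream().close(); // this sketch sends no task, so close the child's stdin
            StringBuilder output = new StringBuilder();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(child.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    output.append(line).append('\n');
                }
            }
            int exitCode = child.waitFor(); // block until this child has exited
            if (exitCode != 0) {
                throw new IOException("worker exited with code " + exitCode);
            }
            outputs.add(output.toString());
        }
        return outputs;
    }

    // Assumption: stand-in for a real worker command; it only asks the JVM launcher for its version.
    static List<String> placeholderCommand() {
        String launcher = Paths.get(System.getProperty("java.home"), "bin", "java").toString();
        return Arrays.asList(launcher, "-version");
    }

    public static void main(String[] args) throws Exception {
        List<String> outputs = runWorkers(Arrays.asList(placeholderCommand(), placeholderCommand()));
        System.out.println(outputs.size() + " workers finished");
    }
}
```

Note that everything crossing the process boundary here is text on a pipe. Nothing like a shared list exists between the coordinator and its children, which is why threads are usually the simpler fit for CPU-bound work in Java.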

You can use Java threads for this purpose. Create a user-defined class that implements the Runnable interface and does its processing in the run() method. Give the class a setter method through which you can set the shared_list object. You can find good examples on the internet. If you are sharing the same instance of shared_list between threads, you need to make sure that access to it is synchronized.

This is not the easiest way to work with threads in Java, but it's the closest to the Python code you posted. The Task class implements the Callable interface, so it has a call method. When we create each of the 10000 Task instances, we pass it a reference to the same list. So when the call method of each of those objects runs, they all use the same list.

We are using a fixed-size thread pool of 4 threads here, so all the tasks we submit get queued and wait for a thread to become available.
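The Runnable-plus-setter approach described above might look roughly like this. It is only a sketch: the class name SquareWorker, the helper squareAll, and the squaring task are placeholders invented for illustration, and synchronizing on the list itself is one of several reasonable ways to guard it.

```java
import java.util.ArrayList;
import java.util.List;

public class SquareWorker implements Runnable {
    private final int input;
    private List<Integer> sharedList;

    public SquareWorker(int input) {
        this.input = input;
    }

    // Setter through which the shared list is handed to the worker.
    public void setSharedList(List<Integer> sharedList) {
        this.sharedList = sharedList;
    }

    @Override
    public void run() {
        int result = input * input;     // the actual work happens outside the lock
        synchronized (sharedList) {     // only one thread may modify the list at a time
            sharedList.add(result);
        }
    }

    // Hypothetical driver: one thread per input value, all writing to the same list.
    static List<Integer> squareAll(int count) throws InterruptedException {
        List<Integer> sharedList = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            SquareWorker worker = new SquareWorker(i);
            worker.setSharedList(sharedList);
            Thread thread = new Thread(worker);
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            thread.join(); // after join, this thread's writes are visible here
        }
        return sharedList;
    }

    public static void main(String[] args) throws InterruptedException {
        // The order of the results depends on thread scheduling.
        System.out.println(squareAll(4));
    }
}
```

Starting one thread per item is fine for a handful of items. For thousands of items a thread pool is the better fit.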

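import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SharedListRunner {
    public void RunList() {
        ExecutorService executerService = Executors.newFixedThreadPool(4);
        // List is an interface; wrap an ArrayList so individual operations are thread-safe.
        List<String> sharedList = Collections.synchronizedList(new ArrayList<>());
        sharedList.add("Hello");
        for (int i = 0; i < 10000; i++)
            executerService.submit(new Task(sharedList));
        // Accept no new tasks; queued tasks still run, then the pool threads exit.
        executerService.shutdown();
    }
}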

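import java.util.List;
import java.util.concurrent.Callable;

public class Task implements Callable<String> {

    // Every Task holds a reference to the same list object.
    private final List<String> sharedList;

    public Task(List<String> sharedList) {
        this.sharedList = sharedList;
    }

    @Override
    public String call() throws Exception {
        // Do something to the shared list
        sharedList.size();
        return "World";
    }
}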
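The Python code in the question ends with pool.close() and pool.join(), while the listing above submits tasks without waiting for them or reading their results. A rough Java counterpart is sketched below, under assumptions. The class PoolJoinSketch, the helper processAll, and the upper-casing task are all invented for illustration. ExecutorService.invokeAll blocks until every task has completed, and Future.get returns each task's result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolJoinSketch {
    // Hypothetical helper: process every item on a pool of 4 threads and wait for all of them.
    static List<String> processAll(List<String> items) throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        // The synchronized wrapper makes each individual add() thread-safe.
        List<String> sharedList = Collections.synchronizedList(new ArrayList<>());
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (String item : items) {
                tasks.add(() -> {
                    sharedList.add(item.toUpperCase()); // placeholder for the real work
                    return item.length();
                });
            }
            // invokeAll blocks until every task has completed (roughly pool.join()).
            for (Future<Integer> future : pool.invokeAll(tasks)) {
                future.get(); // rethrows, wrapped, any exception thrown inside a task
            }
        } finally {
            pool.shutdown(); // roughly pool.close(): no new tasks are accepted
        }
        return sharedList;
    }

    public static void main(String[] args) throws Exception {
        // The order of the results depends on thread scheduling.
        System.out.println(processAll(Arrays.asList("alpha", "beta", "gamma")));
    }
}
```

The synchronized wrapper only protects single calls. A compound action such as "check whether the list contains x, then add x" would still need an explicit synchronized block around both steps.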

At any one time, up to 4 threads may be accessing the list. If you want to dig further: these are 4 Java threads. On mainstream JVMs each thread in such a pool is typically backed by one OS thread, and the OS schedules those threads onto the processor's hardware threads, of which there are usually 1 or 2 per core of your CPU.

Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address.