
Java + Threads: processing lines in parallel

I want to process a large number of independent lines in parallel. In the following code I create a pool of NUM_THREAD threads, each holding POOL_SIZE lines. Each thread is started, and I then wait for each one using 'join'.

I guess this is bad practice, because a finished thread has to wait for its siblings in the pool.

What would be the correct way to implement this code? Which classes should I use?

Thanks!

class FasterBin extends Thread
    {
    private List<String> dataRows=new ArrayList<String>();
    private Object result=null;
    @Override
    public void run()
        {
        for(String s:dataRows)
            {
            //Process item here (....)
            }
        }
    }


(...)

List<FasterBin> threads=new Vector<FasterBin>();
String line;
Iterator<String> iter=(...);
for(;;)
    {
    while(threads.size()< NUM_THREAD)
        {
        FasterBin bin=new FasterBin();
        while(
            bin.dataRows.size() < POOL_SIZE &&
            iter.hasNext()
            )
            {
            nRow++;
            bin.dataRows.add(iter.next());
            }
        if(bin.dataRows.isEmpty()) break;
        threads.add(bin);
        }
    if(threads.isEmpty()) break;


    for(FasterBin t:threads)
        {
        t.start();
        }
    for(FasterBin t:threads)
        {
        t.join();
        }
    for(FasterBin t:threads)
        {
        save(t.result);// ## do something with the result (save into a db etc...)
        }

    threads.clear();
    }

finally
    {
    while(!threads.isEmpty())
        {

        FasterBin b=threads.remove(threads.size()-1);
        try     {
            b.interrupt();
            }
        catch (Exception e)
            {
            }
        }
    }

Do NOT do all this by yourself! It is extremely hard to get this 1) robust and 2) right.

Instead, rewrite your code to create a lot of Runnables or Callables, and use a suitable ExecutorService to process them with the behaviour you want.
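As a rough sketch of what that could look like (not the asker's actual processing): one Callable per line is handed to a fixed-size pool, and the results come back in input order. The class name LineProcessorDemo, the helper processLine (which just returns the line length) and the numThreads parameter are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class LineProcessorDemo {

    // Hypothetical per-line work; stands in for the real processing.
    static Integer processLine(String line) {
        return line.length();
    }

    // Submits one Callable per line and returns the results in input order.
    static List<Integer> processAll(List<String> lines, int numThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (String line : lines) {
                tasks.add(() -> processLine(line));
            }
            List<Integer> results = new ArrayList<>();
            // invokeAll blocks until every task has completed; idle pool
            // threads pick up the next pending task on their own.
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // prints [1, 2, 3]
        System.out.println(processAll(Arrays.asList("a", "bb", "ccc"), 2));
    }
}
```

If a single line is too cheap to justify a task of its own, the same structure works with one Callable per small batch of lines.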

Note that this stays inside the current JVM. If you have more than one JVM available (on multiple machines), I would recommend opening a new question.

Use java.util.concurrent.ThreadPoolExecutor:

        ThreadPoolExecutor  x=new ScheduledThreadPoolExecutor(10);
        x.execute(runnable);

See this for an overview: Java API for util.concurrent

Direct use of Threads is actually discouraged. Look at the package java.util.concurrent; there you will find thread pools and Futures, which should be used instead.

Thread.join doesn't mean that the thread waits for the others; it means your main thread waits for that thread in the list to die. In this case your main thread ends up waiting for the slowest worker thread to finish. I don't see a problem with this approach.

Yes, in some sense a finished thread has to wait for its siblings in the pool: when a thread finishes, it stops and does not help the other threads finish sooner. Better said, the whole job waits for the thread that runs the longest.

This is because each thread has exactly one task. It is better to create many tasks, many more than the number of threads, and put them all in a single queue. Let all worker threads take their tasks from that queue in a loop. Then the difference in finishing time between threads is roughly the time to execute one task, which is small because the tasks are small.
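The single-queue idea can be sketched by hand as follows. This is only an illustration under assumptions: the class name SharedQueueDemo, the helper processLine (again just the line length) and the choice to sum results into an AtomicLong are invented for the example, and every line is treated as one small task.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

class SharedQueueDemo {

    // Hypothetical per-line work; stands in for the real processing.
    static int processLine(String line) {
        return line.length();
    }

    // Workers drain one shared queue, so a fast worker simply takes more
    // lines instead of idling while a slow sibling finishes its batch.
    static long processAll(List<String> lines, int numThreads) {
        Queue<String> queue = new ConcurrentLinkedQueue<>(lines);
        AtomicLong total = new AtomicLong();
        Thread[] workers = new Thread[numThreads];
        for (int i = 0; i < numThreads; i++) {
            workers[i] = new Thread(() -> {
                String line;
                // poll() returns null once the queue is empty.
                while ((line = queue.poll()) != null) {
                    total.addAndGet(processLine(line));
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        return total.get();
    }

    public static void main(String[] args) {
        // prints 10
        System.out.println(processAll(Arrays.asList("a", "bb", "ccc", "dddd"), 3));
    }
}
```

Here join is still used, but each worker only stops when no work is left, so the wait at the end is bounded by roughly one task rather than one whole batch.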

You can start the pool of worker threads yourself, or you can wrap each task in a Runnable and submit them to a standard thread pool; this makes no difference.
