
What is the easiest way to parallelize a task in Java?

Say I have a task like:

for(Object object: objects) {
    Result result = compute(object);
    list.add(result);
}

What is the easiest way to parallelize each compute() (assuming they are already parallelizable)?

I do not need an answer that strictly matches the code above, just a general answer. But if you need more info: my tasks are IO bound, this is for a Spring Web application, and the tasks are going to be executed in an HTTP request.

I would recommend taking a look at ExecutorService.

In particular, something like this:

ExecutorService EXEC = Executors.newCachedThreadPool();
List<Callable<Result>> tasks = new ArrayList<Callable<Result>>();
for (final Object object: objects) {
    Callable<Result> c = new Callable<Result>() {
        @Override
        public Result call() throws Exception {
            return compute(object);
        }
    };
    tasks.add(c);
}
List<Future<Result>> results = EXEC.invokeAll(tasks);

Note that using newCachedThreadPool could be bad if objects is a big list. A cached thread pool could create a thread per task! You may want to use newFixedThreadPool(n) where n is something reasonable (like the number of cores you have, assuming compute() is CPU bound).
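To make the fixed-pool suggestion concrete, here is a minimal sketch. The class name BoundedPoolSketch, the stand-in compute() (which just sleeps to imitate an IO wait), and the pool size of 8 are assumptions for illustration, not part of the answer above. For IO-bound work, as in the question, a pool somewhat larger than the core count may be reasonable; the right number depends on the workload.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BoundedPoolSketch {
    // Hypothetical stand-in for compute(): simulates an IO wait, then returns a value.
    static String compute(int id) throws InterruptedException {
        Thread.sleep(50);
        return "result-" + id;
    }

    // Runs compute() for every id on a pool of at most poolSize threads.
    static List<String> computeAll(List<Integer> ids, int poolSize) throws Exception {
        ExecutorService exec = Executors.newFixedThreadPool(poolSize);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (Integer id : ids) {
                tasks.add(() -> compute(id));
            }
            List<String> results = new ArrayList<>();
            // invokeAll blocks until every task has finished and returns
            // the futures in the same order as the task list.
            for (Future<String> f : exec.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } finally {
            exec.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            ids.add(i);
        }
        System.out.println(computeAll(ids, 8));
    }
}
```

Because the pool is bounded, a long list of objects queues up behind the 8 workers instead of spawning one thread per task.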

Here's full code that actually runs:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExecutorServiceExample {
    private static final Random PRNG = new Random();

    private static class Result {
        private final int wait;
        public Result(int code) {
            this.wait = code;
        }
    }

    public static Result compute(Object obj) throws InterruptedException {
        int wait = PRNG.nextInt(3000);
        Thread.sleep(wait);
        return new Result(wait);
    }

    public static void main(String[] args) throws InterruptedException,
        ExecutionException {
        List<Object> objects = new ArrayList<Object>();
        for (int i = 0; i < 100; i++) {
            objects.add(new Object());
        }

        List<Callable<Result>> tasks = new ArrayList<Callable<Result>>();
        for (final Object object : objects) {
            Callable<Result> c = new Callable<Result>() {
                @Override
                public Result call() throws Exception {
                    return compute(object);
                }
            };
            tasks.add(c);
        }

        ExecutorService exec = Executors.newCachedThreadPool();
        // some other executors you could try to see the different behaviours
        // ExecutorService exec = Executors.newFixedThreadPool(3);
        // ExecutorService exec = Executors.newSingleThreadExecutor();
        try {
            long start = System.currentTimeMillis();
            List<Future<Result>> results = exec.invokeAll(tasks);
            int sum = 0;
            for (Future<Result> fr : results) {
                sum += fr.get().wait;
                System.out.println(String.format("Task waited %d ms",
                    fr.get().wait));
            }
            long elapsed = System.currentTimeMillis() - start;
            System.out.println(String.format("Elapsed time: %d ms", elapsed));
            System.out.println(String.format("... but compute tasks waited for total of %d ms; speed-up of %.2fx", sum, sum / (elapsed * 1d)));
        } finally {
            exec.shutdown();
        }
    }
}

With Java 8 and later you can use a parallelStream on the collection to achieve this:

List<T> objects = ...;

List<Result> result = objects.parallelStream().map(object -> {
            return compute(object);
        }).collect(Collectors.toList());

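Note: the order in which the elements are processed is not deterministic. The result list itself, however, keeps the order of the objects list, because collect(Collectors.toList()) preserves the encounter order of an ordered source such as a List.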

Details on how to set the right number of threads are available in this Stack Overflow question: how-many-threads-are-spawned-in-parallelstream-in-java-8
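As a rough illustration of where those threads come from: parallel streams run on the common ForkJoinPool by default, and its size can be inspected with ForkJoinPool.commonPool().getParallelism() or set at JVM startup with the system property java.util.concurrent.ForkJoinPool.common.parallelism. The sketch below assumes a trivial stand-in compute(); the class name ParallelStreamSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;

class ParallelStreamSketch {
    // Hypothetical stand-in for compute().
    static int compute(int x) {
        return x * x;
    }

    // Maps every element in parallel and collects the results into a list.
    static List<Integer> computeAll(List<Integer> objects) {
        return objects.parallelStream()
                .map(ParallelStreamSketch::compute)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The common pool is shared by all parallel streams in the JVM.
        System.out.println("Common pool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());
        System.out.println(computeAll(Arrays.asList(1, 2, 3, 4, 5)));
    }
}
```

Since the common pool is shared across the whole JVM and sized for CPU-bound work, blocking IO inside a parallel stream can starve other users of the pool; for IO-bound tasks a dedicated ExecutorService may be the safer choice.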

One can simply create a few threads and get the results.

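// MyThread is a hypothetical Thread subclass that stores its result and exposes done()
MyThread t = new MyThread(object);
t.start();

if (t.done()) {
   // get result
   // add result
}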

EDIT: I think other solutions are cooler.
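For completeness, the plain-thread idea can be sketched with nothing more than Thread.start() and Thread.join(). This is an illustration under assumptions, not the answerer's code: PlainThreadsSketch and the stand-in compute() are hypothetical, and it creates one thread per input, which only makes sense for a handful of tasks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PlainThreadsSketch {
    // Hypothetical stand-in for compute().
    static int compute(int x) {
        return x + 1;
    }

    static List<Integer> computeAll(List<Integer> inputs) throws InterruptedException {
        int n = inputs.size();
        int[] results = new int[n]; // each thread writes only its own slot
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            final int index = i;
            Thread t = new Thread(() -> results[index] = compute(inputs.get(index)));
            threads.add(t);
            t.start();
        }
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            // join() waits for the thread and makes its writes visible here.
            threads.get(i).join();
            out.add(results[i]);
        }
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(computeAll(Arrays.asList(1, 2, 3)));
    }
}
```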

Here's something I use in my own projects:

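import java.util.ArrayList;
import java.util.Collection;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelTasks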
{
    private final Collection<Runnable> tasks = new ArrayList<Runnable>();

    public ParallelTasks()
    {
    }

    public void add(final Runnable task)
    {
        tasks.add(task);
    }

    public void go() throws InterruptedException
    {
        final ExecutorService threads = Executors.newFixedThreadPool(Runtime.getRuntime()
                .availableProcessors());
        try
        {
            final CountDownLatch latch = new CountDownLatch(tasks.size());
            for (final Runnable task : tasks)
                threads.execute(new Runnable() {
                    public void run()
                    {
                        try
                        {
                            task.run();
                        }
                        finally
                        {
                            latch.countDown();
                        }
                    }
                });
            latch.await();
        }
        finally
        {
            threads.shutdown();
        }
    }
}

// ...

public static void main(final String[] args) throws Exception
{
    ParallelTasks tasks = new ParallelTasks();
    final Runnable waitOneSecond = new Runnable() {
        public void run()
        {
            try
            {
                Thread.sleep(1000);
            }
            catch (InterruptedException e)
            {
            }
        }
    };
    tasks.add(waitOneSecond);
    tasks.add(waitOneSecond);
    tasks.add(waitOneSecond);
    tasks.add(waitOneSecond);
    final long start = System.currentTimeMillis();
    tasks.go();
    System.err.println(System.currentTimeMillis() - start);
}

Which prints a bit over 2000 on my dual-core box.

You can use the ThreadPoolExecutor. Here is sample code: http://programmingexamples.wikidot.com/threadpoolexecutor (too long to include here)
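Since the linked sample is not reproduced here, the following is a short independent sketch of constructing a ThreadPoolExecutor directly; it is not the code from that page. The pool sizes, the queue capacity, the CallerRunsPolicy choice, and the lengths() example task are all assumptions picked for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ThreadPoolExecutorSketch {
    // Computes the length of each word on an explicitly configured pool.
    static List<Integer> lengths(List<String> words) throws Exception {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2,                                // core threads kept alive
                4,                                // upper bound on threads
                30, TimeUnit.SECONDS,             // idle timeout for extra threads
                new LinkedBlockingQueue<>(100),   // bounded work queue
                // When the queue is full and all threads are busy, run the task
                // on the submitting thread instead of rejecting it.
                new ThreadPoolExecutor.CallerRunsPolicy());
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String word : words) {
                Callable<Integer> task = () -> word.length();
                futures.add(pool.submit(task));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get()); // blocks until that task is done
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(lengths(Arrays.asList("a", "bb", "ccc")));
    }
}
```

The Executors factory methods used in the other answers build a ThreadPoolExecutor with preset arguments; constructing it directly is mainly useful when you want a bounded queue or a specific rejection policy.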

I too was going to mention an executor class. Here is some example code that you would place in the executor class.

    private static ExecutorService threadLauncher = Executors.newFixedThreadPool(4);

    private List<Callable<Object>> callableList = new ArrayList<Callable<Object>>();

    public void addCallable(Callable<Object> callable) {
        this.callableList.add(callable);
    }

    public void clearCallables(){
        this.callableList.clear();
    }

    public void executeThreads(){
        try {
            threadLauncher.invokeAll(this.callableList);
        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

    public Object[] getResult() {

        List<Future<Object>> resultList = null;
        Object[] resultArray = null;
        try {

            resultList = threadLauncher.invokeAll(this.callableList);

            resultArray = new Object[resultList.size()];

            for (int i = 0; i < resultList.size(); i++) {
                resultArray[i] = resultList.get(i).get();
            }

        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

        return resultArray;
    }

Then to use it you would make calls to the executor class to populate and execute it.

executor.addCallable(someCallable); // someCallable: any Callable<Object> implementation; do this once for each task
Object[] results = executor.getResult();

Fork/Join's parallel array is one option.

For a more detailed answer, read Java Concurrency in Practice and use java.util.concurrent.
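As a hedged illustration of the Fork/Join style, here is a sketch that uses the plain RecursiveTask API from java.util.concurrent rather than the parallel array API mentioned above. The class names, the THRESHOLD value, and the stand-in transform() are assumptions. Fork/Join is designed for CPU-bound work that splits recursively, so it may be a poor fit for blocking IO.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class ForkJoinSketch {
    // Hypothetical stand-in for the per-element computation.
    static int transform(int x) {
        return x * 2;
    }

    // Splits the index range in half until it is small, then works sequentially.
    static class ComputeRange extends RecursiveTask<List<Integer>> {
        private static final int THRESHOLD = 4;
        private final List<Integer> inputs;
        private final int from;
        private final int to;

        ComputeRange(List<Integer> inputs, int from, int to) {
            this.inputs = inputs;
            this.from = from;
            this.to = to;
        }

        @Override
        protected List<Integer> compute() {
            if (to - from <= THRESHOLD) {
                List<Integer> out = new ArrayList<>();
                for (int i = from; i < to; i++) {
                    out.add(transform(inputs.get(i)));
                }
                return out;
            }
            int mid = (from + to) >>> 1;
            ComputeRange left = new ComputeRange(inputs, from, mid);
            ComputeRange right = new ComputeRange(inputs, mid, to);
            left.fork();                                  // run the left half asynchronously
            List<Integer> rightResult = right.compute();  // do the right half in this thread
            List<Integer> result = new ArrayList<>(left.join());
            result.addAll(rightResult);                   // left then right keeps input order
            return result;
        }
    }

    static List<Integer> computeAll(List<Integer> inputs) {
        return ForkJoinPool.commonPool().invoke(new ComputeRange(inputs, 0, inputs.size()));
    }

    public static void main(String[] args) {
        System.out.println(computeAll(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)));
    }
}
```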

A neat way is to utilize ExecutorCompletionService.

Say you have the following code (as in your example):

 public static void main(String[] args) {
    List<Character> letters = IntStream.range(65, 91).mapToObj(i -> (char) i).collect(Collectors.toList());
    List<List<Character>> list = new ArrayList<>();

    for (char letter : letters) {
      List<Character> result = computeLettersBefore(letter);
      list.add(result);
    }

    System.out.println(list);
  }

  private static List<Character> computeLettersBefore(char letter) {
    return IntStream.range(65, 1 + letter).mapToObj(i -> (char) i).collect(Collectors.toList());
  }

Now, to execute the tasks in parallel, all you need to do is create an ExecutorCompletionService backed by a thread pool, then submit the tasks and read the results. Since ExecutorCompletionService uses a LinkedBlockingQueue under the hood, each result becomes available for pickup as soon as its task completes (if you run the code you will notice that the results arrive in completion order, not submission order):

public static void main(String[] args) throws InterruptedException, ExecutionException {
    final ExecutorService threadPool = Executors.newFixedThreadPool(3);
    final ExecutorCompletionService<List<Character>> completionService = new ExecutorCompletionService<>(threadPool);

    final List<Character> letters = IntStream.range(65, 91).mapToObj(i -> (char) i).collect(Collectors.toList());
    List<List<Character>> list = new ArrayList<>();

    for (char letter : letters) {
      completionService.submit(() -> computeLettersBefore(letter));
    }

    // NOTE: instead of iterating over letters again, the number of submitted tasks can be used as the loop bound
    for (char letter : letters) {
      final List<Character> result = completionService.take().get();
      list.add(result);
    }

    threadPool.shutdownNow(); // NOTE: for safety place it inside finally block 

    System.out.println(list);
  }

  private static List<Character> computeLettersBefore(char letter) {
    return IntStream.range(65, 1 + letter).mapToObj(i -> (char) i).collect(Collectors.toList());
  }
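The two NOTE comments in the code above (counting submitted tasks and shutting the pool down in a finally block) can be combined as in the sketch below. It is a variation under assumptions, not the answerer's code: CompletionServiceSketch and the stand-in square() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CompletionServiceSketch {
    // Hypothetical stand-in for the per-element computation.
    static int square(int x) {
        return x * x;
    }

    // Returns the results in the order in which the tasks finished.
    static List<Integer> squaresInCompletionOrder(List<Integer> inputs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            ExecutorCompletionService<Integer> completionService =
                    new ExecutorCompletionService<>(pool);
            int submitted = 0;
            for (Integer x : inputs) {
                completionService.submit(() -> square(x));
                submitted++;
            }
            List<Integer> results = new ArrayList<>();
            // Take exactly as many results as tasks were submitted.
            for (int i = 0; i < submitted; i++) {
                results.add(completionService.take().get());
            }
            return results;
        } finally {
            // Runs even if a task throws, so the pool threads do not leak.
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(squaresInCompletionOrder(Arrays.asList(1, 2, 3, 4, 5)));
    }
}
```

If the original input order matters, keep the Future returned by each submit() call and read those in submission order instead of calling take().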

I know it's an old thread, but since RxJava (now at v3) came out, my favorite way to do parallel programming is through its flatMap operator, as in the following few lines (though it is not very intuitive at first sight).

// Assume we're in main thread at the moment
Flowable.create(...) // upstream data provider, main thread
  .map(...) // some transformers?? main thread
  .filter(...) // some filter thread
  .flatMap(data -> Flowable.just(data)
               .subscribeOn(Schedulers.from(...your executorservice for the sub worker.....), true) // true is to delay the error. 
               .doOnNext(this::process)
           , MAX_CONCURRENT) // max number of concurrent workers
  .subscribe();

You can check its javadoc to understand the operators: RxJava 3 - Flowable. A simple example:

Flowable.range(1, 100)
                .map(Object::toString)
                .flatMap (i -> Flowable.just(i)
                        .doOnNext(j -> {
                            System.out.println("Current thread is " + Thread.currentThread().getName());
                            Thread.sleep(100);
                        }).subscribeOn(Schedulers.io()), true, 10)
        
                .subscribe(
                        integer -> log.info("We consumed {}", integer),
                        throwable -> log.error("We met errors", throwable),
                        () -> log.info("The stream completed!!!"));

And for your case:

for(Object object: objects) {
    Result result = compute(object);
    list.add(result);
}

We could try:

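Flowable.fromIterable(objects)
        .flatMap(obj ->
                    // fromCallable defers compute(obj) until subscription, so it runs on the io scheduler;
                    // Flowable.just(compute(obj)) would evaluate it eagerly on the calling thread
                    Flowable.fromCallable(() -> compute(obj)).subscribeOn(Schedulers.io()), true, YOUR_CONCURRENCY_NUMBER)
        .doOnNext(res -> list.add(res))
        .subscribe();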

Bonus points: if you need to add some ordering, for example all odd numbers go to worker1, even numbers to worker2, and so on, RxJava can achieve that easily by combining the groupBy and flatMap operators. I won't go into too much detail about them here. Enjoy playing :)))
