Writing a multithreaded mapping iterator in Java

I've got a general-purpose mapping iterator, something like this:

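import java.util.Iterator;

// Action<F, T> is the caller-supplied mapping function: T process(F f)
class Mapper<F, T> implements Iterator<T> {

  private final Iterator<F> input;
  private final Action<F, T> action;

  public Mapper(Iterator<F> input, Action<F, T> action) {
    this.input = input;
    this.action = action;
  }

  public boolean hasNext() {
    return input.hasNext();
  }

  public T next() {
    // Maps one item at a time, on the caller's thread.
    return action.process(input.next());
  }

  public void remove() {
    // Iterator.remove() has no default implementation before Java 8.
    throw new UnsupportedOperationException();
  }
}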

Now, given that action.process() can be time-consuming, I want to gain performance by using multiple threads to process items from the input in parallel. I want to allocate a pool of N worker threads and allocate items to these threads for processing. This should happen "behind the scenes", so the client code just sees an Iterator. The code should avoid holding the whole input or output sequence in memory.

To add a twist, I want two versions of the solution: one that retains order (the final iterator delivers items in the same order as the input iterator) and one that does not necessarily retain order (each output item is delivered as soon as it is available).

I've sort of got this working, but the code seems convoluted and unreliable, and I'm not confident it follows best practice.

Any suggestions on the simplest and most robust way of implementing this? I'm looking for something that works in JDK 6, and I want to avoid introducing dependencies on external libraries/frameworks if possible.

I'd use a thread pool for the threads and a BlockingQueue to feed results out of the pool.

This seems to work with my simple test cases.

import java.util.*;
import java.util.concurrent.*;

interface Action<F, T> {

    public T process(F f);

}

class Mapper<F, T> implements Iterator<T> {

    protected final Iterator<F> input;
    protected final Action<F, T> action;

    public Mapper(Iterator<F> input, Action<F, T> action) {
        this.input = input;
        this.action = action;
    }

    @Override
    public boolean hasNext() {
        return input.hasNext();
    }

    @Override
    public T next() {
        return action.process(input.next());
    }

    @Override
    public void remove() {
        // Needed on JDK 6/7, where Iterator.remove() has no default.
        throw new UnsupportedOperationException();
    }
}

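class ParallelMapper<F, T> extends Mapper<F, T> {

    // The pool.
    final ExecutorService pool;
    // Finished results, in completion order (not input order).
    final BlockingQueue<T> queue;
    // Cap on items submitted but not yet delivered, so the input is not
    // drained into the executor's internal (unbounded) work queue.
    private final int maxInFlight;
    private int inFlight = 0;
    // The next one to deliver.
    private T next = null;

    public ParallelMapper(Iterator<F> input, Action<F, T> action, int threads, int queueLength) {
        super(input, action);
        // Start my pool.
        pool = Executors.newFixedThreadPool(threads);
        // And the queue (explicit type argument, so this compiles on JDK 6).
        queue = new ArrayBlockingQueue<T>(queueLength);
        maxInFlight = threads + queueLength;
    }

    class Worker implements Runnable {

        final F f;

        public Worker(F f) {
            this.f = f;
        }

        @Override
        public void run() {
            try {
                // Note: a null result or an exception from process() is not
                // handled here and could leave hasNext() waiting forever.
                queue.put(action.process(f));
            } catch (InterruptedException ex) {
                // Pool is being shut down; restore the flag and give up.
                Thread.currentThread().interrupt();
            }
        }

    }

    @Override
    public boolean hasNext() {
        // Loop until we hold a result, or there is no input and nothing in flight.
        while (next == null && (input.hasNext() || inFlight > 0)) {
            // First look in the queue.
            next = queue.poll();
            if (next != null) {
                inFlight--;
            } else if (input.hasNext() && inFlight < maxInFlight) {
                // Room for more work: start a new worker.
                pool.execute(new Worker(input.next()));
                inFlight++;
            } else {
                // Nothing to submit: block (rather than spin) until a result arrives.
                try {
                    next = queue.take();
                    inFlight--;
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting", ex);
                }
            }
        }
        if (next == null && !pool.isShutdown()) {
            // Input exhausted and everything delivered - shut down the pool.
            pool.shutdown();
        }
        return next != null;
    }

    @Override
    public T next() {
        // Make sure a result is ready; hasNext() is safe to call repeatedly.
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T n = next;
        // Delivered that one.
        next = null;
        return n;
    }
}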

public void test() {
    List<Integer> data = Arrays.asList(5, 4, 3, 2, 1, 0);
    System.out.println("Data");
    for (Integer i : data) {
        System.out.println(i);
    }
    Action<Integer, Integer> action = new Action<Integer, Integer>() {

        @Override
        public Integer process(Integer f) {
            try {
                // Wait that many seconds.
                Thread.sleep(1000L * f);
            } catch (InterruptedException ex) {
                // Just give up.
            }
            // Return it unchanged.
            return f;
        }

    };
    System.out.println("Processed");
    for (Integer i : Iterables.in(new Mapper<Integer, Integer>(data.iterator(), action))) {
        System.out.println(i);
    }
    System.out.println("Parallel Processed");
    for (Integer i : Iterables.in(new ParallelMapper<Integer, Integer>(data.iterator(), action, 2, 2))) {
        System.out.println(i);
    }

}

Note: Iterables.in(Iterator<T>) just creates an Iterable<T> that encapsulates the passed Iterator<T>.
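The answer doesn't show `Iterables.in` itself. A minimal sketch of such an adapter, assuming it only needs to support a single for-each pass (the class and method names just mirror the answer's usage; this is not a JDK API), could look like this:

```java
import java.util.Iterator;

// Hypothetical helper mirroring the answer's Iterables.in(Iterator<T>).
final class Iterables {
    private Iterables() {
    }

    // Wraps an Iterator so it can be used in a for-each loop.
    // The returned Iterable is single-use: every call to iterator()
    // hands back the same, possibly already consumed, Iterator.
    static <T> Iterable<T> in(final Iterator<T> iterator) {
        return new Iterable<T>() {
            public Iterator<T> iterator() {
                return iterator;
            }
        };
    }
}
```

Because the wrapped Iterator is shared, looping over the same `Iterables.in(...)` result twice yields nothing the second time.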

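For your in-order version, you could process Pair<Integer, F> items (each input tagged with its sequence number) and use a PriorityQueue ordered by that number for the thread output; since several threads write to it, it would need to be a PriorityBlockingQueue or be guarded by a lock. You could then arrange to pull the results out in order.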
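One way to realise this, sketched under the assumption that inputs are numbered 0, 1, 2, ... with no gaps, is to let workers drop tagged results into any thread-safe BlockingQueue and keep the PriorityQueue on the consumer side only, so it needs no locking. `Seq` and `Reorderer` are made-up names, not library classes:

```java
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical pair type: a result tagged with its input position.
final class Seq<T> {
    final long index;
    final T value;

    Seq(long index, T value) {
        this.index = index;
        this.value = value;
    }
}

// Hypothetical reorder buffer. Workers add Seq results to 'done' in
// completion order; only the consuming thread touches 'pending'.
final class Reorderer<T> {
    private final BlockingQueue<Seq<T>> done;
    private final PriorityQueue<Seq<T>> pending =
            new PriorityQueue<Seq<T>>(11, new Comparator<Seq<T>>() {
                public int compare(Seq<T> a, Seq<T> b) {
                    return a.index < b.index ? -1 : (a.index == b.index ? 0 : 1);
                }
            });
    private long expected = 0;

    Reorderer(BlockingQueue<Seq<T>> done) {
        this.done = done;
    }

    // Blocks until the result for input position 'expected' has arrived,
    // buffering any later results that finish first.
    T takeNext() throws InterruptedException {
        while (pending.isEmpty() || pending.peek().index != expected) {
            pending.add(done.take());
        }
        expected++;
        return pending.poll().value;
    }
}
```

If one position never arrives (for example, its task threw), `takeNext()` blocks forever, so a real version would also need to propagate failures. Memory stays bounded only if the producer limits how many items are in flight.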

I don't think it can work with parallel threads, because hasNext() may return true, but by the time the thread calls next() there may be no more elements (another thread may have taken the last one). It is better to use only next(), returning null when there are no more elements.
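A minimal sketch of that advice, assuming the items themselves are never null (null is reserved as the end marker): the check and the fetch become one synchronized step, so no other thread can slip in between them. `NullTerminatedSource` and `nextOrNull()` are made-up names for illustration:

```java
import java.util.Iterator;

// Hypothetical wrapper: folds hasNext()/next() into one atomic call, so
// several threads can pull from the same Iterator without the
// check-then-act race. Assumes the underlying Iterator never yields null.
final class NullTerminatedSource<T> {
    private final Iterator<T> it;

    NullTerminatedSource(Iterator<T> it) {
        this.it = it;
    }

    // Returns the next item, or null once the sequence is exhausted.
    synchronized T nextOrNull() {
        return it.hasNext() ? it.next() : null;
    }
}
```

Only the fetch is serialized; each thread can process the item it got outside the lock, which is where the parallelism comes from.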

OK, thanks everyone. This is what I've done.

First I wrap my ItemMappingFunction in a Callable:

private static class CallableAction<F extends Item, T extends Item>
        implements Callable<T> {
    private final ItemMappingFunction<F, T> action;
    private final F input;

    public CallableAction(ItemMappingFunction<F, T> action, F input) {
        this.action = action;
        this.input = input;
    }

    // Runs the mapping on a pool thread; narrowing "throws Exception"
    // to XPathException is legal when implementing Callable.call().
    public T call() throws XPathException {
        return action.mapItem(input);
    }
}

I described my problem in terms of the standard Iterator class, but actually I'm using my own SequenceIterator interface, which has a single next() method that returns null at end-of-sequence.

I declare the class in terms of the "ordinary" mapping iterator like this:

public class MultithreadedMapper<F extends Item, T extends Item> extends Mapper<F, T> {

    private ExecutorService service;
    private BlockingQueue<Future<T>> resultQueue = 
        new LinkedBlockingQueue<Future<T>>();

On initialization I create the service and prime the queue:

    public MultithreadedMapper(SequenceIterator base, ItemMappingFunction<F, T> action) throws XPathException {
        super(base, action);

        // One worker thread per available processor.
        int maxThreads = Runtime.getRuntime().availableProcessors();
        maxThreads = maxThreads > 0 ? maxThreads : 1;
        service = Executors.newFixedThreadPool(maxThreads);

        // Prime the queue with up to maxThreads items so every thread has work.
        int n = 0;
        while (n++ < maxThreads) {
            F item = (F) base.next(); // unchecked cast from the raw SequenceIterator
            if (item == null) {
                return; // input is shorter than the pool
            }
            mapOneItem(item);
        }
    }

where mapOneItem is:

    private void mapOneItem(F in) throws XPathException {
        // Futures are queued in submission (= input) order.
        Future<T> future = service.submit(new CallableAction<F, T>(action, in));
        resultQueue.add(future);
    }

When the client asks for the next item, I first submit the next input item to the executor service, then get the next output item, waiting for it to become available if necessary:

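    public T next() throws XPathException {
        // Keep the pipeline full: submit one new input for each output taken.
        F nextIn = (F) base.next();
        if (nextIn != null) {
            mapOneItem(nextIn);
        }
        try {
            // The head of the queue is the oldest input, so output keeps input order.
            Future<T> future = resultQueue.poll();
            if (future == null) {
                // Input exhausted and every result delivered.
                service.shutdown();
                return null;
            } else {
                // Blocks until this particular item has been mapped.
                return future.get();
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag before translating the exception.
            Thread.currentThread().interrupt();
            throw new XPathException(e);
        } catch (ExecutionException e) {
            if (e.getCause() instanceof XPathException) {
                throw (XPathException) e.getCause();
            }
            throw new XPathException(e);
        }
    }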
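For readers without the asker's Item/XPathException types, the same approach (prime a window of Futures, then submit one new input for every result taken) can be sketched with plain JDK 6 classes. `MapFn` and `OrderedParallelIterator` are made-up names, and failures are simply rethrown as unchecked exceptions:

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical mapping-function type (JDK 6 has no java.util.function).
interface MapFn<F, T> {
    T apply(F item) throws Exception;
}

// Sketch: a fixed window of Futures kept in submission order, so
// results come out in input order with at most 'threads' pending.
final class OrderedParallelIterator<F, T> implements Iterator<T> {
    private final Iterator<F> input;
    private final MapFn<F, T> fn;
    private final ExecutorService service;
    // Only the consuming thread touches this queue, so no locking is needed.
    private final Queue<Future<T>> window = new ArrayDeque<Future<T>>();

    OrderedParallelIterator(Iterator<F> input, MapFn<F, T> fn, int threads) {
        this.input = input;
        this.fn = fn;
        this.service = Executors.newFixedThreadPool(threads);
        // Prime the window with one task per thread.
        for (int i = 0; i < threads && input.hasNext(); i++) {
            submit(input.next());
        }
        if (window.isEmpty()) {
            service.shutdown();
        }
    }

    private void submit(final F item) {
        window.add(service.submit(new Callable<T>() {
            public T call() throws Exception {
                return fn.apply(item);
            }
        }));
    }

    public boolean hasNext() {
        return !window.isEmpty();
    }

    public T next() {
        Future<T> head = window.poll();
        if (head == null) {
            throw new NoSuchElementException();
        }
        // Refill first so the pool keeps working while we wait on 'head'.
        if (input.hasNext()) {
            submit(input.next());
        }
        if (window.isEmpty()) {
            service.shutdown(); // already-submitted work still completes
        }
        try {
            return head.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            service.shutdownNow();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            // After a failure the iterator is unusable.
            service.shutdownNow();
            throw new RuntimeException(e.getCause());
        }
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }
}
```

The window size equals the thread count here; a larger window would smooth out uneven task durations at the cost of a few more buffered results.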

In order for action.process to be called in parallel, next() would need to be called in parallel. That's not good practice. Instead you could use an ExecutorCompletionService.

See https://stackoverflow.com/a/1228445/360211

Unfortunately, I believe this only covers one of your two versions: ExecutorCompletionService hands back results in completion order, so it does not preserve input order.
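`ExecutorCompletionService.take()` returns Futures in the order their tasks complete, which fits the unordered variant from the question. A sketch that wraps it as an Iterator and caps the number of submitted-but-unconsumed tasks, so the whole input is never buffered, might look like this (`UnorderedFn` and `UnorderedParallelIterator` are made-up names):

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical mapping-function type (JDK 6 has no java.util.function).
interface UnorderedFn<F, T> {
    T apply(F item) throws Exception;
}

// Sketch: results are handed out in completion order, with at most
// 'maxInFlight' tasks submitted but not yet consumed.
final class UnorderedParallelIterator<F, T> implements Iterator<T> {
    private final Iterator<F> input;
    private final UnorderedFn<F, T> fn;
    private final ExecutorService pool;
    private final CompletionService<T> completed;
    private final int maxInFlight;
    private int inFlight = 0;

    UnorderedParallelIterator(Iterator<F> input, UnorderedFn<F, T> fn,
                              int threads, int maxInFlight) {
        this.input = input;
        this.fn = fn;
        this.pool = Executors.newFixedThreadPool(threads);
        this.completed = new ExecutorCompletionService<T>(pool);
        this.maxInFlight = maxInFlight;
        fill();
    }

    // Top up the pipeline from the input, never exceeding maxInFlight.
    private void fill() {
        while (inFlight < maxInFlight && input.hasNext()) {
            final F item = input.next();
            completed.submit(new Callable<T>() {
                public T call() throws Exception {
                    return fn.apply(item);
                }
            });
            inFlight++;
        }
        if (inFlight == 0) {
            pool.shutdown(); // nothing left to run
        }
    }

    public boolean hasNext() {
        return inFlight > 0;
    }

    public T next() {
        if (inFlight == 0) {
            throw new NoSuchElementException();
        }
        try {
            // take() blocks until whichever task finishes first.
            T result = completed.take().get();
            inFlight--;
            fill();
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            pool.shutdownNow();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            pool.shutdownNow();
            throw new RuntimeException(e.getCause());
        }
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }
}
```

All bookkeeping (`inFlight`, `fill()`) happens on the consuming thread, so only the completion service is shared with the workers.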

I would recommend looking at the JDK executor framework. Create tasks (Runnables) for your actions. Run them in parallel using a thread pool if needed, or in sequence if not. Give the tasks sequence numbers if you need the results in order at the end. But as noted in other answers, the iterator does not work very well for you, since calling next() is generally not done in parallel. So do you even need an iterator, or do you just need the tasks processed?
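If an iterator isn't actually required, `ExecutorService.invokeAll()` is a simple fit: it waits for every task and returns the Futures in the same order as the task list, so no explicit sequence numbers are needed. The trade-off is that all tasks and results are held in memory at once, which the question wanted to avoid. A sketch, with a made-up `squareAll` standing in for the real action:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class BatchMapDemo {
    // Hypothetical helper: squares every input on a pool. invokeAll() waits
    // for all tasks and returns their Futures in the same order as 'tasks',
    // so the output order matches the input order.
    static List<Integer> squareAll(List<Integer> inputs, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<Callable<Integer>>();
            for (final Integer x : inputs) {
                tasks.add(new Callable<Integer>() {
                    public Integer call() {
                        return x * x;
                    }
                });
            }
            List<Integer> out = new ArrayList<Integer>();
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                out.add(f.get()); // already complete; get() just unwraps
            }
            return out;
        } finally {
            pool.shutdown();
        }
    }
}
```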
