
Processing tasks in parallel and sequentially in Java

In my program, the user can trigger different tasks via an interface. These tasks take some time to process, so they are executed by threads. So far I have an executor with a single thread that executes all tasks one after the other. Now I would like to parallelize things a little.

That is, I would like to run tasks in parallel, except when they have the same path; those I want to run sequentially. For example, I have 10 threads in my pool, and when a task comes in, it should be assigned to the worker that is currently processing a task with the same path. If no worker is currently processing a task with the same path, the task should be processed by a currently free worker.

Additional info: A task is any kind of operation executed on a file in the local file system, for example renaming a file. Therefore, each task has the attribute path. I don't want to execute two tasks on the same file at the same time, so tasks with the same path should be performed sequentially.

Here is my sample code, but there is still work to do:

One of my problems is that I need a safe way to check whether a worker is currently running and to get the path of the task it is running. By safe I mean that no simultaneous-access or other threading problems occur.

    public class TasksOrderingExecutor {
    
        public interface Task extends Runnable {
            //Task code here
            String getPath();
        }
    
        private static class Worker implements Runnable {
    
            private final LinkedBlockingQueue<Task> tasks = new LinkedBlockingQueue<>();

            //some variable or mechanic to give the actual path of the running tasks??
    
            private volatile boolean stopped;
    
            void schedule(Task task) {
                tasks.add(task);
            }
    
            void stop() {
                stopped = true;
            }
    
            @Override
            public void run() {
                while (!stopped) {
                    try {
                        Task task = tasks.take();
                        task.run();
                    } catch (InterruptedException ie) {
                        // perhaps, handle somehow
                    }
                }
            }
        }
    
        private final Worker[] workers;
        private final ExecutorService executorService;
    
        /**
         * @param queuesNr nr of concurrent task queues
         */
        public TasksOrderingExecutor(int queuesNr) {
            Preconditions.checkArgument(queuesNr >= 1, "queuesNr >= 1");
            executorService = new ThreadPoolExecutor(queuesNr, queuesNr, 0, TimeUnit.SECONDS, new SynchronousQueue<>());
            workers = new Worker[queuesNr];
            for (int i = 0; i < queuesNr; i++) {
                Worker worker = new Worker();
                executorService.submit(worker);
                workers[i] = worker;
            }
        }
    
        public void submit(Task task) {
            Worker worker = getWorker(task);
            worker.schedule(task);
        }
    
        public void stop() {
            for (Worker w : workers) w.stop();
            executorService.shutdown();
        }
    
        private Worker getWorker(Task task) {
            //check here if a running worker with a specific path exists? If yes return it, else return a free worker. How do I check if a worker is currently running?
            return workers[task.getPath() //HERE I NEED HELP//];
        }
    }

Seems like you have a pair of problems:

  • You want to check the status of tasks submitted to an executor service
  • You want to run tasks in parallel, and possibly prioritize them

Future

For the first problem, capture the Future object returned when you submit a task to an executor service. You can check the Future object for its completion status.

Future< Task > future = myExecutorService.submit( someTask , someTask ) ;  // Two-argument overload: `get` returns the second argument once the task completes.
…
boolean isCancelled = future.isCancelled() ;  // Returns true if this task was cancelled before it completed normally.
boolean isDone = future.isDone();  // Returns true if this task completed.

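The Future is generic, and its type parameter can be your Task class itself. That works if you submit a Callable< Task >, or use the two-argument submit( Runnable , T result ) overload and pass the task as the result; the one-argument submit( Runnable ) returns a Future whose get yields null. Calling Future::get then blocks until the task completes and yields the Task object. You can interrogate that Task object for its contained file path.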

Task task = future.get() ;
String path = task.getPath() ;  // Access field via getter from your `Task` object.
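
The snippets above can be put together into a small runnable sketch. `RenameTask` and `FutureDemo` are hypothetical names used only for illustration; the only assumption about your own code is that a task implements `Runnable` and exposes `getPath()`.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical task type, for illustration only.
final class RenameTask implements Runnable {
    private final String path;
    private volatile boolean ran;

    RenameTask(String path) { this.path = path; }

    String getPath() { return path; }
    boolean hasRun() { return ran; }

    @Override
    public void run() {
        // Real file work would go here.
        ran = true;
    }
}

final class FutureDemo {
    // Submits the task, waits for it to finish, and returns the finished task.
    static RenameTask runAndWait(ExecutorService es, RenameTask task)
            throws InterruptedException, ExecutionException {
        // submit(Runnable, T) returns a Future<T> whose get() yields the given result object.
        Future<RenameTask> future = es.submit(task, task);
        RenameTask finished = future.get();   // Blocks until the task completes.
        if (!future.isDone()) {               // Always done once get() has returned.
            throw new IllegalStateException("future not done after get()");
        }
        return finished;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService es = Executors.newFixedThreadPool(3);
        try {
            RenameTask done = runAndWait(es, new RenameTask("/tmp/a.txt"));
            System.out.println(done.getPath() + " ran: " + done.hasRun());
        } finally {
            es.shutdown();
        }
    }
}
```

Note that a Future only tells you about a task you already submitted; it does not by itself prevent two tasks with the same path from running at the same time.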

Executors

Rather than instantiating new ThreadPoolExecutor, use the Executors utility class to instantiate an executor service on your behalf. Instantiating ThreadPoolExecutor directly is not needed for most common scenarios, as its Javadoc notes.

ExecutorService es = Executors.newFixedThreadPool( 3 ) ;  // Instantiate an executor service backed by a pool of three threads.

For the second problem, use an executor service backed by a thread pool rather than a single thread. The executor service automatically assigns each submitted task to an available thread.

As for grouping or prioritizing, use multiple executor services. You can instantiate as many as you want, provided you do not overload the CPU cores and memory of your deployment machine (think about your maximum simultaneous usage).

ExecutorService esSingleThread = Executors.newSingleThreadExecutor() ;
ExecutorService esMultiThread = Executors.newCachedThreadPool() ;

One executor service might be backed by a single thread to limit the demands on the deployment computer, while others might be backed by a thread pool to get more work done. You can use these multiple executor services as your multiple queues. There is no need to manage queues and workers yourself, as in the code of your Question. Executors were invented to simplify working with multiple threads.
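
One way to apply "multiple executor services as multiple queues" to the same-path requirement is to keep a fixed array of single-thread executors and pick one by hashing the path. This is a sketch under that assumption, not something the answer spells out; `StripedExecutor` and its method names are hypothetical. Tasks for the same path always land on the same single thread, so they run in submission order. Different paths may share a stripe when their hashes collide, which costs some parallelism but not correctness.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: N single-thread executors, selected by path hash.
final class StripedExecutor {
    private final ExecutorService[] stripes;

    StripedExecutor(int stripeCount) {
        if (stripeCount < 1) {
            throw new IllegalArgumentException("stripeCount must be >= 1");
        }
        stripes = new ExecutorService[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = Executors.newSingleThreadExecutor();
        }
    }

    // Tasks with equal paths go to the same single-thread executor.
    Future<?> submit(String path, Runnable task) {
        // floorMod keeps the index non-negative even for negative hash codes.
        int index = Math.floorMod(path.hashCode(), stripes.length);
        return stripes[index].submit(task);
    }

    // Stops accepting tasks and waits for the queued ones to finish.
    void shutdownAndWait(long timeout, TimeUnit unit) throws InterruptedException {
        for (ExecutorService es : stripes) {
            es.shutdown();
        }
        for (ExecutorService es : stripes) {
            es.awaitTermination(timeout, unit);
        }
    }
}
```

Usage would look like `striped.submit(task.getPath(), task)`. The trade-off compared with the "free worker" behaviour asked for in the Question is that a task can wait behind an unrelated path on the same stripe even while other threads are idle.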

Concurrency

You said:

"And I don't want to execute two tasks on the same file at the same time, so such tasks with the same paths should be performed sequentially."

You should have a better way to handle the concurrency conflict than just scheduling tasks on threads.

Java has ways to manage concurrent access to files. Search to learn more, as this has been covered on Stack Overflow already.
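
As one in-process illustration (an assumption about what "managing concurrent access" could look like here, not a specific recommendation from this answer), the sketch below guards each path with its own lock, so that two tasks touching the same file never overlap regardless of which thread runs them. `PathLocks` is a hypothetical name. For coordination between separate processes there is `java.nio.channels.FileLock`, but its documentation states that such locks are held on behalf of the whole JVM and are not suitable for coordinating threads within one JVM.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper: one lock per path, created on first use.
final class PathLocks {
    private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Runs the action while holding the lock that belongs to the given path.
    void withLock(String path, Runnable action) {
        ReentrantLock lock = locks.computeIfAbsent(path, p -> new ReentrantLock());
        lock.lock();
        try {
            action.run();
        } finally {
            lock.unlock();
        }
    }
}
```

This gives mutual exclusion per path but not ordering: a default `ReentrantLock` is not fair, so waiting tasks are not guaranteed to run in submission order. Waiting tasks also occupy pool threads while blocked, and the map in this sketch never removes entries.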


Perhaps I have not fully understood your needs, so do comment if I am off-base.

It seems that you need some sort of "task dispatcher" that either executes or holds tasks depending on some identifier (here, the path of the file the task is applied to).

You could use something like this:

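import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.Executor;
import java.util.concurrent.LinkedBlockingDeque;

public class Dispatcher<I> implements Runnable {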

/**
 * The executor used to execute the submitted task
 */
private final Executor executor;

/**
 * Map of the pending tasks
 */
private final Map<I, Deque<Runnable>> pendingTasksById = new HashMap<>();

/**
 * set containing the id that are currently executed
 */
private final Set<I> runningIds = new HashSet<>();

/**
 * Action to be executed by the dispatcher
 */
private final BlockingDeque<Runnable> actionQueue = new LinkedBlockingDeque<>();

public Dispatcher(Executor executor) {
    this.executor = executor;
}

/**
 * Task in the same group will be executed sequentially (but not necessarily in the same thread)
 * @param id the id of the group the task belong
 * @param task the task to execute
 */
public void submitTask(I id, Runnable task) {
    actionQueue.addLast(() -> {
        if (canBeLaunchedDirectly(id)) {
            executeTask(id, task);
        } else {
            addTaskToPendingTasks(id, task);
            ifPossibleLaunchPendingTaskForId(id);
        }
    });
}


@Override
public void run() {
    while (!Thread.currentThread().isInterrupted()) {
        try {
            actionQueue.takeFirst().run();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag before leaving the loop
            break;
        }
    }
}


private void addTaskToPendingTasks(I id, Runnable task) {
    this.pendingTasksById.computeIfAbsent(id, i -> new LinkedList<>()).add(task);
}


/**
 * @param id an id of a group
 * @return true if a task of the group with the provided id is currently executed
 */
private boolean isRunning(I id) {
    return runningIds.contains(id);
}

/**
 * @param id an id of a group
 * @return an optional containing the first pending task of the group,
 * an empty optional if no such task is available
 */
private Optional<Runnable> getFirstPendingTask(I id) {
    final Deque<Runnable> pendingTasks = pendingTasksById.get(id);
    if (pendingTasks == null) {
        return Optional.empty();
    }
    assert !pendingTasks.isEmpty();
    final Runnable result = pendingTasks.removeFirst();
    if (pendingTasks.isEmpty()) {
        pendingTasksById.remove(id);
    }
    return Optional.of(result);
}

private boolean canBeLaunchedDirectly(I id) {
    return !isRunning(id) && pendingTasksById.get(id) == null;
}

private void executeTask(I id, Runnable task) {
    this.runningIds.add(id);
    executor.execute(() -> {
        try {
            task.run();
        } finally {
            actionQueue.addLast(() -> {
                runningIds.remove(id);
                ifPossibleLaunchPendingTaskForId(id);
            });
        }
    });
}

private void ifPossibleLaunchPendingTaskForId(I id) {
    if (isRunning(id)) {
        return;
    }
    getFirstPendingTask(id).ifPresent(r -> executeTask(id, r));
}

}

To use it, you need to launch it in a separate thread (or you can adapt it for a cleaner solution), like this:

    final Dispatcher<Path> dispatcher = new Dispatcher<>(Executors.newCachedThreadPool());
    new Thread(dispatcher).start();
    dispatcher.submitTask(path, task1);
    dispatcher.submitTask(path, task2);

This is a basic example. You might want to keep a reference to the thread and, better still, wrap all of this in a class.
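
If a dedicated dispatcher thread is not wanted, the same guarantee (tasks in one group run sequentially, though not necessarily on the same thread) can be sketched by chaining each new task onto the previous one for its key. This swaps in a different technique from the Dispatcher above, `CompletableFuture` chaining, and `KeyedSerialExecutor` is a hypothetical name. It assumes the supplied executor runs tasks asynchronously rather than in the calling thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executor;

// Hypothetical helper: serializes tasks per key by chaining futures.
final class KeyedSerialExecutor<K> {
    private final Executor executor;
    // Last scheduled task ("tail" of the chain) for each key with outstanding work.
    private final ConcurrentMap<K, CompletableFuture<Void>> tails = new ConcurrentHashMap<>();

    KeyedSerialExecutor(Executor executor) {
        this.executor = executor;
    }

    // Schedules the task to run after all earlier tasks submitted for the same key.
    CompletableFuture<Void> submit(K key, Runnable task) {
        CompletableFuture<Void> next = tails.compute(key, (k, tail) -> {
            CompletableFuture<Void> previous =
                    (tail == null) ? CompletableFuture.<Void>completedFuture(null) : tail;
            // handle() discards the previous outcome so one failure does not skip later tasks.
            return previous.handle((result, error) -> null).thenRunAsync(task, executor);
        });
        // Drop the map entry if this task is still the tail when it finishes.
        next.whenComplete((result, error) -> tails.remove(key, next));
        return next;
    }
}
```

With this sketch, `serial.submit(path, task1)` followed by `serial.submit(path, task2)` runs `task2` only after `task1` has finished, while tasks for other paths proceed on whatever pool threads are free. The returned future completes exceptionally if the task throws.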

All you need is a hash map of actors, with the file path as the key. Different actors run in parallel, and each individual actor handles its tasks sequentially. Your solution is wrong because the Worker class uses the blocking operation take but is executed in a limited thread pool, which may lead to thread starvation (a kind of deadlock). Actors do not block while waiting for the next message.

import org.df4j.core.dataflow.ClassicActor;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.*;

public class TasksOrderingExecutor {

public static class Task implements Runnable {
    private final String path;
    private final String task;

    public Task(String path, String task) {
        this.path = path;
        this.task = task;
    }

    //Task code here
    String getPath() {
        return path;
    }

    @Override
    public void run() {
        System.out.println(path+"/"+task+" started");
        try {
            Thread.sleep(500);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag instead of swallowing it
        }
        System.out.println(path+"/"+task+" stopped");
    }
}

static class Worker extends ClassicActor<Task> {

    @Override
    protected void runAction(Task task) throws Throwable {
        task.run();
    }
}

private final ExecutorService executorService;

private final Map<String,Worker> workers = new HashMap<String,Worker>(){
    @Override
    public Worker get(Object key) {
        return super.computeIfAbsent((String) key, (k) -> {
            Worker res = new Worker();
            res.setExecutor(executorService);
            res.start();
            return res;
        });
    }
};

/**
 * @param queuesNr nr of concurrent task queues
 */
public TasksOrderingExecutor(int queuesNr) {
    executorService = ForkJoinPool.commonPool();
}

public void submit(Task task) {
    Worker worker = getWorker(task);
    worker.onNext(task);
}

public void stop() throws InterruptedException {
    for (Worker w : workers.values()) {
        w.onComplete();
    }
    executorService.shutdown();
    executorService.awaitTermination(10, TimeUnit.SECONDS);
}

private Worker getWorker(Task task) {
    // One actor per path: the map above creates and starts a worker the first time a path is seen.
    return workers.get(task.getPath());
}

public static void main(String[] args) throws InterruptedException {
    TasksOrderingExecutor orderingExecutor = new TasksOrderingExecutor(20);
    orderingExecutor.submit(new Task("path1", "task1"));
    orderingExecutor.submit(new Task("path1", "task2"));
    orderingExecutor.submit(new Task("path2", "task1"));
    orderingExecutor.submit(new Task("path3", "task1"));
    orderingExecutor.submit(new Task("path2", "task2"));
    orderingExecutor.stop();
}
}

The execution log shows that tasks with the same key are executed sequentially and tasks with different keys are executed in parallel:

path3/task1 started
path2/task1 started
path1/task1 started
path3/task1 stopped
path2/task1 stopped
path1/task1 stopped
path2/task2 started
path1/task2 started
path2/task2 stopped
path1/task2 stopped

I used my own actor library, DF4J, but any other actor library can be used.
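
If pulling in an actor library is not an option, the idea can be approximated with plain `java.util.concurrent`. The following is a hand-rolled sketch, not DF4J's implementation; `SerialActor` and `PathActors` are hypothetical names. Each actor has a mailbox and occupies a pool thread only while it has messages to process, so an idle actor blocks nothing.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical minimal actor: processes its messages one at a time.
final class SerialActor {
    private final Executor executor;
    private final Queue<Runnable> mailbox = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean scheduled = new AtomicBoolean(false);

    SerialActor(Executor executor) {
        this.executor = executor;
    }

    void tell(Runnable message) {
        mailbox.add(message);
        trySchedule();
    }

    private void trySchedule() {
        // At most one drain pass is scheduled at a time; that keeps messages sequential.
        if (!mailbox.isEmpty() && scheduled.compareAndSet(false, true)) {
            executor.execute(this::drain);
        }
    }

    private void drain() {
        try {
            Runnable message;
            while ((message = mailbox.poll()) != null) {
                try {
                    message.run();
                } catch (RuntimeException e) {
                    // A failing message must not stop the actor; real code would log this.
                }
            }
        } finally {
            scheduled.set(false);
            // A message may have arrived after the last poll() but before the flag was cleared.
            trySchedule();
        }
    }
}

// Hypothetical registry: one actor per file path, created on first use.
final class PathActors {
    private final Executor executor;
    private final ConcurrentMap<String, SerialActor> actors = new ConcurrentHashMap<>();

    PathActors(Executor executor) {
        this.executor = executor;
    }

    void submit(String path, Runnable task) {
        actors.computeIfAbsent(path, p -> new SerialActor(executor)).tell(task);
    }
}
```

A drain pass in this sketch processes every queued message before giving the thread back, so one very busy path can hold a pool thread for a long time; a production actor library typically handles fairness and error reporting more carefully.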
