Running time of a job sent to an ExecutorService

Question

Good day,

I am writing a program in which a method is called for each line read from a text file. Since each call is independent of every other line, I can run the calls in parallel. To maximize CPU usage I use an ExecutorService to which I submit each run() call. The text file has 15 million lines, so I need to stagger the submissions to avoid creating too many jobs at once (OutOfMemory exception). I also want to keep track of how long each submitted job has been running, as I have seen that some never finish. The problem is that when I try to use the Future.get method with a timeout, the timeout refers to the time since the job entered the queue of the ExecutorService, not since it started running, if it has started at all. I would like to get the time since it started running, not since it entered the queue.

The code looks like this:

ExecutorService executorService = Executors.newFixedThreadPool(ncpu);
line = reader.readLine();
long start = System.currentTimeMillis();
HashMap<MyFut, String> runs = new HashMap<MyFut, String>();
HashMap<Future, MyFut> tasks = new HashMap<Future, MyFut>();
while ((line = reader.readLine()) != null) {

    String s = line.split("\t")[1];
    final String m = line.split("\t")[0];
    MyFut f = new MyFut(s, m);
    tasks.put(executorService.submit(f), f);

    runs.put(f, line);

    while (tasks.size() > ncpu * 100) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

        Iterator<Future> i = tasks.keySet().iterator();
        while (i.hasNext()) {
            Future task = i.next();
            if (task.isDone()) {
                i.remove();

            } else {
                MyFut fut = tasks.get(task);
                if (fut.elapsed() > 10000) {
                    System.out.println(line);
                    task.cancel(true);
                    i.remove();
                }
            }
        }
    }
}

private static class MyFut implements Runnable {

    private long start;
    String copy;
    String id2;

    public MyFut(String m, String id) {
        super();

        copy = m;
        id2 = id;
    }

    public long elapsed() {
        return System.currentTimeMillis() - start;
    }

    @Override
    public void run() {
        start = System.currentTimeMillis();
        // do something...
    }

}

As you can see, I try to keep track of how many jobs I have submitted, and once a threshold is passed I wait a bit until some have finished. I also check whether any job is taking too long, so that I can cancel it, note which one failed, and continue execution. This is not working as I hoped. Ten seconds of execution for one task is far more than needed (I get 1000 lines done in 70 to 130 s, depending on the machine and the number of CPUs).

What am I doing wrong? Shouldn't the run method of my Runnable class be called only when a thread in the ExecutorService is free and starts working on it? I get a lot of results that take more than 10 seconds. Is there a better way to achieve what I am trying to do?

Thanks.

Answer 1

If you are using Future, I would recommend changing Runnable to Callable and returning the total execution time of the task as the result. Below is sample code:

import java.util.concurrent.Callable;

public class MyFut implements Callable<Long> {

    String copy;
    String id2;

    public MyFut(String m, String id) {
        super();

        copy = m;
        id2 = id;
    }

    @Override
    public Long call() throws Exception {
        long start = System.currentTimeMillis();
        //do something...
        long end = System.currentTimeMillis();
        return (end - start);
    }
}
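How the returned duration might be collected is sketched below. The class name TimedCallableDemo, the TimedJob stand-in for MyFut, the sleep used as placeholder work, and the pool size of 2 are all assumptions made for illustration. Note that the duration only becomes available once the job has finished, so this approach reports run times but does not by itself help to detect jobs that never complete.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TimedCallableDemo {

    // Hypothetical stand-in for MyFut: returns its own run time in milliseconds.
    static class TimedJob implements Callable<Long> {
        private final long workMillis;

        TimedJob(long workMillis) {
            this.workMillis = workMillis;
        }

        @Override
        public Long call() throws InterruptedException {
            long start = System.nanoTime();
            Thread.sleep(workMillis); // placeholder for the real work
            return (System.nanoTime() - start) / 1_000_000;
        }
    }

    // Submits one job per argument and returns the measured run times
    // in submission order. Time spent waiting in the queue is not included.
    public static List<Long> runAll(long... workMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(2); // assumed pool size
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (long w : workMillis) {
                futures.add(pool.submit(new TimedJob(w)));
            }
            List<Long> durations = new ArrayList<>();
            for (Future<Long> f : futures) {
                durations.add(f.get()); // blocks until that job has finished
            }
            return durations;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runAll(50, 20, 10));
    }
}
```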

Answer 2

You are making your work harder than it should be. Java's framework provides everything you want; you only have to use it.

Limiting the number of pending work items works by using a bounded queue, but the ExecutorService returned by Executors.newFixedThreadPool() uses an unbounded queue. The policy of waiting once the bounded queue is full can be implemented via a RejectedExecutionHandler. The entire thing looks like this:

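import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedExecutorExample {
  static class WaitingRejectionHandler implements RejectedExecutionHandler {
    @Override
    public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
      try {
        executor.getQueue().put(r); // block until capacity is available
      } catch (InterruptedException ex) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
        throw new RejectedExecutionException(ex);
      }
    }
  }

  public static void main(String[] args) {
    final int nCPU = Runtime.getRuntime().availableProcessors();
    final int maxPendingJobs = 100;
    ExecutorService executorService = new ThreadPoolExecutor(nCPU, nCPU, 1, TimeUnit.MINUTES,
      new ArrayBlockingQueue<Runnable>(maxPendingJobs), new WaitingRejectionHandler());

    // start flooding the `executorService` with jobs here

    executorService.shutdown(); // let queued jobs finish, then release the worker threads
  }
}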

That's all.

Measuring the elapsed time within a job is quite easy, as it has nothing to do with multi-threading:
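The "flooding" step could look roughly like the following sketch. It repeats the rejection handler under the hypothetical name BlockWhenFull so that it compiles on its own; the class name BoundedSubmitDemo, the runJobs helper, and the 2 ms sleep standing in for real work are assumptions. The submitting thread simply blocks in execute() whenever the queue is full, so no manual bookkeeping of pending jobs is needed.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedSubmitDemo {

    // Same idea as WaitingRejectionHandler, repeated to keep the sketch self-contained.
    static class BlockWhenFull implements RejectedExecutionHandler {
        @Override
        public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
            if (executor.isShutdown()) {
                throw new RejectedExecutionException("executor is shut down");
            }
            try {
                executor.getQueue().put(r); // blocks the submitting thread
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new RejectedExecutionException(ex);
            }
        }
    }

    // Submits jobCount short jobs through a queue of maxPendingJobs slots
    // and returns how many of them ran to completion.
    public static int runJobs(int jobCount, int maxPendingJobs) {
        int nCPU = Runtime.getRuntime().availableProcessors();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(nCPU, nCPU, 1, TimeUnit.MINUTES,
            new ArrayBlockingQueue<Runnable>(maxPendingJobs), new BlockWhenFull());
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < jobCount; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(2); // placeholder for processing one line
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                completed.incrementAndGet();
            });
        }
        pool.shutdown(); // already queued jobs still run
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(runJobs(200, 5) + " jobs completed");
    }
}
```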

long startTime=System.nanoTime();
// do your work here
long elapsedTimeSoFar = System.nanoTime()-startTime;

But maybe you won't need it anymore once you are using the bounded queue.

By the way, the timeout of the Future.get method does not refer to the time since the job entered the queue of the ExecutorService; it counts from the invocation of the get method itself. In other words, it specifies how long the get method is allowed to wait, nothing more.
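If the running time still has to be visible from outside the job, as in the question's monitoring loop, the job needs to distinguish "not started yet" from "running". In the question's MyFut the start field stays 0 until run() begins, so elapsed() on a job that is still queued returns the full epoch time in milliseconds, which is always more than 10000; that would explain why so many jobs appear to exceed 10 seconds. A minimal sketch of one possible fix follows; the names StartAwareJobDemo, StartAwareJob and elapsedMillis are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class StartAwareJobDemo {

    static class StartAwareJob implements Runnable {
        // volatile: written by the worker thread, read by the monitoring thread
        private volatile boolean started;
        private volatile long startNanos;
        private final Runnable work;

        StartAwareJob(Runnable work) {
            this.work = work;
        }

        @Override
        public void run() {
            startNanos = System.nanoTime();
            started = true;
            work.run();
        }

        // Milliseconds since run() began, or 0 while the job is still queued.
        long elapsedMillis() {
            if (!started) {
                return 0;
            }
            return (System.nanoTime() - startNanos) / 1_000_000;
        }
    }

    // Runs one blocking job and one queued job on a single worker thread and
    // returns { elapsed of the running job, elapsed of the queued job }.
    public static long[] demo() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch firstStarted = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        StartAwareJob first = new StartAwareJob(() -> {
            firstStarted.countDown();
            try {
                release.await(); // keeps the only worker thread busy
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        StartAwareJob second = new StartAwareJob(() -> { });
        try {
            pool.submit(first);
            pool.submit(second); // stays in the queue behind the first job
            firstStarted.await();
            Thread.sleep(50);
            return new long[] { first.elapsedMillis(), second.elapsedMillis() };
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] r = demo();
        System.out.println("running: " + r[0] + " ms, queued: " + r[1] + " ms");
    }
}
```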
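This can be checked with a small experiment, sketched here under the assumption of a single worker thread that is kept busy by a first job. The class and method names are made up for the example. The second job never starts, yet get with a 100 ms timeout gives up 100 ms after get was called.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class FutureGetTimeoutDemo {

    // Returns true if get() timed out on a job that was still waiting in the queue.
    public static boolean queuedJobTimesOut() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch release = new CountDownLatch(1);
        try {
            // First job occupies the only worker thread until released.
            pool.submit(() -> {
                release.await();
                return null;
            });
            // Second job is queued and cannot start yet.
            Future<String> queued = pool.submit(() -> "done");
            try {
                queued.get(100, TimeUnit.MILLISECONDS);
                return false; // not expected: the job could not have run
            } catch (TimeoutException e) {
                return true; // the wait was measured from the get() call
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            release.countDown(); // let both jobs finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out while queued: " + queuedJobTimesOut());
    }
}
```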

Answer 1
2 2013-12-05 10:24:14

Answer 2
1 Accepted 2013-12-05 10:57:01
