
ExecutorService slows down, bogs down my PC

I am writing a parser for a website that has many pages (I call them IndexPages). Each IndexPage has a lot of links (about 300 to 400). I use Java's ExecutorService to invoke 12 Callables concurrently for one IndexPage. Each Callable just fires an HTTP request to one link, then does some parsing and database storing. When the first IndexPage is finished, the program moves on to the second IndexPage, and so on until no next IndexPage is found.

[Screenshot: parsing page 1]
At first it seems OK: I can see the threads working and being scheduled well. Parsing and storing each link takes only about 1 to 2 seconds.

[Screenshot: after running for 2 hours]
But as time goes by, I observe that each Callable (parsing/storing) takes longer and longer. In this picture, for example, a Callable sometimes takes 10 or more seconds to finish (the green bar is RUNNING, the purple bar is WAITING). My PC also bogs down and everything becomes sluggish.

This is my main algorithm:

ExecutorService executorService = Executors.newFixedThreadPool(12);    
String indexUrl = // Set initial (1st page) IndexPage
while(true)
{
  String nextPage = // parse next page in the indexUrl

  Set<Callable<Void>> callables = new HashSet<>();
  for(String url : getUrls(indexUrl))
  {
    Callable<Void> callable = new ParserCallable(url , … and some DAOs);
    callables.add(callable);
  } 

  try {
    executorService.invokeAll(callables);
  } catch (InterruptedException e) {
    e.printStackTrace();
  }

  if (nextPage == null) 
    break;

  indexUrl = nextPage;
} // true
executorService.shutdown();

The algorithm is simple and self-explanatory. What could cause this situation? Is there any way to prevent such performance degradation?

[Screenshot: CPU / memory]
The CPU, memory, and heap all show reasonable usage.

[Screenshot: environment]
Environment details, FYI.

==================== updated ====================

I've changed my implementation from ExecutorService to ForkJoinPool:

ForkJoinPool pool=new ForkJoinPool(12);
String indexUrl = // Set initial (1st page) IndexPage
while(true)
{
  Set<Callable<Void>> callables = new HashSet<>();
  for(String url : getUrls(indexUrl))
  {
    Callable<Void> callable = new ParserCallable(url , DAOs...);
    callables.add(callable);
  }
  pool.invokeAll(callables);

  String nextPage = // parse next page in this indexUrl
  if (nextPage == null)
    break;

  indexUrl = nextPage;
} // true

It takes longer than the ExecutorService solution. ExecutorService takes about 2 hours to finish all pages, while ForkJoinPool takes 3 hours, and each Callable still takes longer and longer to complete (from 1 second to 5, 6, or even 10 seconds). I don't mind it taking longer overall; I just want each job to take a constant amount of time rather than a steadily growing one.

I wonder whether creating a lot of (non-thread-safe) GregorianCalendar, Date, and SimpleDateFormat objects in the parser could cause some threading issue. But I don't reuse these objects or pass them between threads, so I still cannot find the reason.

Based on the heap, you have a memory issue. ExecutorService.invokeAll collects the results of all the Callable instances into a List and returns that List when they have all completed. You may want to consider simply calling ExecutorService.submit, since you don't seem to care about the result of each Callable.
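A minimal sketch of that suggestion, assuming each task's result can be discarded. `SubmitSketch`, `processPage`, and `parseAndStore` are hypothetical names standing in for the question's code, and the `CountDownLatch` is just one possible way to wait for a page to finish without holding on to a list of results:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class SubmitSketch {
    // Hypothetical stand-in for the fetch/parse/store work of ParserCallable.
    static void parseAndStore(String url) {
        // real code would fire the HTTP request, parse, and write to the DB
    }

    // Submits one task per URL and blocks until the whole page is done.
    // The Futures returned by submit() are deliberately not retained.
    // Returns the number of tasks that ran to completion.
    static int processPage(ExecutorService pool, List<String> urls) {
        CountDownLatch done = new CountDownLatch(urls.size());
        AtomicInteger completed = new AtomicInteger();
        for (String url : urls) {
            pool.submit(() -> {
                try {
                    parseAndStore(url);
                    completed.incrementAndGet();
                } finally {
                    done.countDown(); // always release the latch, even on failure
                }
            });
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        return completed.get();
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(12);
        System.out.println(processPage(pool, Arrays.asList("a", "b", "c")));
        pool.shutdown();
    }
}
```

The loop in the question could call `processPage` once per IndexPage and keep the same pool for the whole run.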

I can't see why a Callable is needed to parse your index pages, since your calling method does not expect any result from ParserCallable. I can see that you would need a bit of exception handling, but that can still be managed with a Runnable.

When you submit a Callable, the executor wraps it in a FutureTask and hands back a Future, which is never used here.

You should be able to improve the implementation by using a Runnable, which avoids this additional work:

ExecutorService executor = Executors.newFixedThreadPool(12);
for(String url : getUrls(indexUrl))  {
  Runnable worker = new ParserRunnable(url , … and some DAOs);
  executor.execute(worker);
}

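class ParserRunnable implements Runnable {
  // constructor taking the url and the DAOs omitted

  @Override
  public void run() {
    // fetch the url, parse it, store the result; handle exceptions here
  }
}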
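A slightly fuller sketch of the same idea, under stated assumptions: the `RunnableSketch` wrapper, the `failures` queue, and the empty-URL check are all hypothetical and only illustrate how exception handling can live inside `run()` while the caller waits with `shutdown()` and `awaitTermination()`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class RunnableSketch {
    static final class ParserRunnable implements Runnable {
        private final String url;
        private final Queue<String> failures; // thread-safe error log

        ParserRunnable(String url, Queue<String> failures) {
            this.url = url;
            this.failures = failures;
        }

        @Override
        public void run() {
            try {
                // Placeholder validation; real code would fetch, parse, and store.
                if (url.isEmpty()) {
                    throw new IllegalArgumentException("empty URL");
                }
            } catch (RuntimeException e) {
                // A Runnable cannot return or throw checked exceptions,
                // so failures are recorded here instead.
                failures.add(url + ": " + e.getMessage());
            }
        }
    }

    // Runs one ParserRunnable per URL and returns the recorded failures.
    static List<String> runAll(List<String> urls) {
        ExecutorService executor = Executors.newFixedThreadPool(12);
        Queue<String> failures = new ConcurrentLinkedQueue<>();
        for (String url : urls) {
            executor.execute(new ParserRunnable(url, failures));
        }
        executor.shutdown(); // no new tasks; queued ones still run
        try {
            executor.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(failures);
    }
}
```

Because this sketch shuts the executor down after one batch, a multi-page run would either create the pool outside `runAll` or use a different completion signal per page.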

As I understand it, if you have 40 pages, each with ~300 URLs, you will create ~12,000 Callables? While that is probably not too many Callables, it is a lot of HTTP connections and database connections.

I think you should try using one Callable per page. You'll still gain a ton by running the pages in parallel. I don't know what you are using for the HTTP requests, but you might be able to reuse system resources there instead of opening and closing 12,000 of them.

That applies especially to the DB: you'll have just 40 connections. You might even be able to be super efficient by collecting the ~300 records locally, then using a batch update.
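A rough sketch of that layout, assuming the URL lists for the pages are already known (in the question each next page is discovered from the previous one, so this would need adapting). `PerPageSketch`, `PageTask`, `RecordSink`, and `parse` are hypothetical names; with JDBC, `saveBatch` might be implemented with `PreparedStatement.addBatch()` and `executeBatch()`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PerPageSketch {
    // Hypothetical storage abstraction; implementations must be thread-safe.
    interface RecordSink {
        void saveBatch(List<String> records);
    }

    // Placeholder for the HTTP request plus parsing of a single link.
    static String parse(String url) {
        return "record for " + url;
    }

    // One task per page: parse its links sequentially, then store once.
    static final class PageTask implements Callable<Integer> {
        private final List<String> urls;
        private final RecordSink sink;

        PageTask(List<String> urls, RecordSink sink) {
            this.urls = urls;
            this.sink = sink;
        }

        @Override
        public Integer call() {
            List<String> records = new ArrayList<>();
            for (String url : urls) {
                records.add(parse(url));
            }
            sink.saveBatch(records); // a single batch write per page
            return records.size();
        }
    }

    // Runs all pages in parallel and returns the total number of records.
    static int processPages(List<List<String>> pages, RecordSink sink) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (List<String> urls : pages) {
                tasks.add(new PageTask(urls, sink));
            }
            int total = 0;
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Here `invokeAll` holds only one small result per page rather than one per link, so the returned list stays short.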
