
Java-MySQL high-load application crash

I have a problem with my HTML scraper. It is a multithreaded application written in Java using HtmlUnit; by default it runs with 128 threads. In short, it works as follows: it takes a site URL from a big text file, pings the URL and, if the site is accessible, parses it, finds specific HTML blocks, saves all URL and block info (including the HTML code) into the corresponding tables in the database, and goes on to the next site. The database is MySQL 5.1, with 4 InnoDB tables and 4 views. The tables have numeric indexes on the fields used in table joins. I also have a web interface for browsing and searching the parsed data (for searching I use Sphinx with delta indexes), written with CodeIgniter.

Server configuration:

CPU: Type Xeon Quad Core X3440 2.53GHz
RAM: 4 GB
HDD: 1TB SATA
OS: Ubuntu Server 10.04

Some MySQL config:

key_buffer = 256M
max_allowed_packet = 16M
thread_stack = 192K
thread_cache_size = 128
max_connections = 400
table_cache = 64
query_cache_limit = 2M
query_cache_size = 128M

The JVM runs with default parameters except for the following options:

-Xms1024m -Xmx1536m -XX:-UseGCOverheadLimit -XX:NewSize=500m -XX:MaxNewSize=500m -XX:SurvivorRatio=6 -XX:PermSize=128M -XX:MaxPermSize=128m -XX:ErrorFile=/var/log/java/hs_err_pid_%p.log

When the database was empty, the scraper processed 18 URLs per second and was stable enough. But after 2 weeks, when the urls table contained 384929 records (~25% of all processed URLs) and took up 8.2 GB, the Java application began to work very slowly and to crash every 1-2 minutes. I guess the reason is MySQL, which cannot handle the growing load (the parser, which performs 2+4*BLOCK_NUMBER queries for every processed URL; Sphinx, which updates the delta indexes every 10 minutes; I don't count the web interface, because it's used by only one person). Maybe it rebuilds indexes very slowly? But the MySQL and scraper logs (which also contain all uncaught exceptions) are empty. What do you think about it?
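To get a feel for the load described above, the per-URL query count can be turned into a rough queries-per-second figure. This is only a back-of-the-envelope sketch: the class and method names are made up for illustration, and BLOCK_NUMBER = 5 is a hypothetical value, since the question does not say how many blocks a typical site yields.

```java
// Rough estimate of the query rate the scraper sends to MySQL.
final class QueryLoadEstimate {

    // Queries issued per processed URL, as stated in the question: 2 + 4 * BLOCK_NUMBER.
    static int queriesPerUrl(int blockNumber) {
        return 2 + 4 * blockNumber;
    }

    // Total queries per second at a given scraping rate.
    static int queriesPerSecond(int urlsPerSecond, int blockNumber) {
        return urlsPerSecond * queriesPerUrl(blockNumber);
    }

    public static void main(String[] args) {
        int urlsPerSecond = 18; // rate reported while the database was empty
        int blockNumber = 5;    // hypothetical; not given in the question
        System.out.println(queriesPerSecond(urlsPerSecond, blockNumber) + " queries/s");
    }
}
```

Even a modest number of blocks per site puts the rate in the hundreds of queries per second, so a single query that slows down as the table grows can plausibly stall the whole pipeline.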

I'd recommend running the following just to check a few status things; putting the output here would help as well:

  1. dmesg
  2. top (check the resident vs. virtual memory per process)

So the application becomes unresponsive? (That is not the same as a crash at all.) I would check that all your resources are free, e.g. run jstack to check whether any threads are tied up.
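If attaching jstack to the process is inconvenient, a similar picture can be obtained from inside the JVM with the standard java.lang.management API. The sketch below is not part of the scraper; the class name is invented for illustration, and it assumes Java 8 or later (for Map.merge). It counts live threads by state and asks the JVM for deadlocked threads.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// In-process complement to jstack: summarize what the threads are doing.
final class ThreadStateReport {

    // Count live threads per Thread.State (RUNNABLE, BLOCKED, WAITING, ...).
    static Map<Thread.State, Integer> countByState() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        return counts;
    }

    // IDs of threads deadlocked on monitors or ownable synchronizers (empty if none).
    static long[] deadlockedThreadIds() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return ids == null ? new long[0] : ids;
    }

    public static void main(String[] args) {
        System.out.println("Threads by state: " + countByState());
        System.out.println("Deadlocked threads: " + deadlockedThreadIds().length);
    }
}
```

If most of the 128 worker threads turn out to be in the same state at once, a full jstack dump of those threads should show what they are all waiting on, for example a database call.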

Check in MySQL that you have the expected number of connections. If you continuously create connections in Java and don't clean them up, the database will run slower and slower.
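A common way connections pile up is an exception thrown between opening and closing them. The sketch below illustrates the difference with try-with-resources (Java 7+). FakeConnection is a stand-in invented here so the example runs without a database; with real JDBC the same pattern applies to java.sql.Connection, which is also AutoCloseable.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class ConnectionCleanupSketch {

    // Stand-in for java.sql.Connection that tracks how many instances are still open.
    static final class FakeConnection implements AutoCloseable {
        static final AtomicInteger OPEN = new AtomicInteger();

        FakeConnection() {
            OPEN.incrementAndGet();
        }

        void execute(String sql) {
            if (sql.isEmpty()) {
                throw new IllegalArgumentException("empty statement");
            }
        }

        @Override
        public void close() {
            OPEN.decrementAndGet();
        }
    }

    // Leaky: if execute() throws, close() is never reached.
    static void saveLeaky(String sql) {
        FakeConnection c = new FakeConnection();
        c.execute(sql);
        c.close();
    }

    // Safe: try-with-resources closes the connection even when execute() throws.
    static void saveSafely(String sql) {
        try (FakeConnection c = new FakeConnection()) {
            c.execute(sql);
        }
    }

    public static void main(String[] args) {
        try { saveLeaky(""); } catch (IllegalArgumentException e) { /* expected */ }
        System.out.println("open after leaky call: " + FakeConnection.OPEN.get());
        try { saveSafely(""); } catch (IllegalArgumentException e) { /* expected */ }
        System.out.println("open after safe call:  " + FakeConnection.OPEN.get());
    }
}
```

On the MySQL side, SHOW PROCESSLIST lists the connections that are currently open, which can be compared against the number of scraper threads and the max_connections setting.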

Thank you all for your advice; MySQL was actually the cause of the problem. By enabling the slow query log in my.cnf I saw that one of the queries, which executes on every iteration, took 300 s (one field used for searching was not indexed).
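A quick calculation shows why a 300 s query is enough to explain the slowdown. The sketch below assumes, purely for illustration, that every processed URL runs the slow query once, that the query dominates the time spent per URL, and that the 128 threads do not slow each other down further; the class and method names are made up.

```java
// Upper bound on scraper throughput when every URL waits on one slow query.
final class SlowQueryThroughput {

    // With N threads each spending secondsPerUrl on a URL, at most N / secondsPerUrl URLs finish per second.
    static double maxUrlsPerSecond(int threads, double secondsPerUrl) {
        return threads / secondsPerUrl;
    }

    public static void main(String[] args) {
        int threads = 128;            // default thread count from the question
        double slowQuerySeconds = 300; // duration seen in the slow query log
        System.out.printf("at most %.2f URLs/s%n", maxUrlsPerSecond(threads, slowQuerySeconds));
    }
}
```

Under these assumptions the ceiling is roughly 0.43 URLs per second, compared with the 18 URLs per second measured on the empty database, which matches the "very slow" behaviour described in the question. Adding an index on the searched field removes the full scan that makes the query slow.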
