
MySQL/Hibernate - How do I debug a MySQL pooled connection that keeps dropping?

For months, my web application ran smoothly, but for the past week or two, it has kept dropping its connection to the MySQL server. I'm not a DBA and have no idea how to debug this.

Here is what I know:

  1. The connection seems to drop every few hours. Sometimes during the day, but always during the night.
  2. My lab has a MySQL server that hosts databases for multiple applications.
  3. Currently, we have 46 connections to the MySQL server.
  4. To my knowledge, no other application is experiencing this issue.
  5. My application uses the same stack, configuration, and even code for connecting to the DB as another application. That other application supports around 200 users per day and has been running smoothly since 2013.
  6. Both applications use Hibernate ORM; this is the only configuration that I know of:

     <!-- TomcatJDBCConnectionProvider class is common to both applications -->
     <property name="hibernate.connection.provider_class">org.hibernate.connection.TomcatJDBCConnectionProvider</property>
     <property name="hibernate.dialect">org.hibernate.dialect.MySQLDialect</property>
     <property name="hibernate.connection.driver_class">com.mysql.jdbc.Driver</property>
     <property name="hibernate.connection.pool_size">5</property>
     <property name="hibernate.current_session_context_class">thread</property>
     <property name="hibernate.tomcatJdbcPool.validationQuery">SELECT 1</property>
     <property name="hibernate.tomcatJdbcPool.testOnBorrow">true</property>
     <property name="hibernate.enable_lazy_load_no_trans">true</property>

  7. The issue started around the same time that someone tried to use the application's RESTful API to download our data. This user (actually a collaborator) has a small script that iterates over every row in a specific table and requests all the metadata.

  8. The issue also started around the same time that my lab started offering a Coursera Massive Open Online Course. I don't know what the numbers are, but the actual usage on the site must have jumped.

I'm aware that this is a broad question, but I'm really at a loss as to how to go about debugging this. Any suggestions are appreciated.

EDIT:

Digging around the other application's ServletContextListener, I found this bit of code that my contextDestroyed function does not have:

// TODO: Find memory leak that requires server to be restarted after hot deploying several (3?) times.
Set<Thread> threadSet = Thread.getAllStackTraces().keySet();
for (Thread t : threadSet) {
    if (t.getName().contains("Abandoned connection cleanup thread")) {
        synchronized (t) {
            System.out.println("Forcibly stopping thread to avoid memory leak: " + t.getName());
            t.stop(); // don't complain, it works
        }
    }
}

It appears to iterate over all live threads, find the one whose name contains "Abandoned connection cleanup thread", and forcibly stop it. Could this be related to my issue?

EDIT 21/9/2015:

My application went down this weekend. Here is the stack trace from yesterday's error log (when I believe it went down):

20-Sep-2015 14:22:18.160 SEVERE [http-apr-8080-exec-35] org.apache.catalina.core.StandardWrapperValve.invoke Servlet.service() for servlet [edu.mssm.pharm.maayanlab.Harmonizome.api.GeneMetadataApi] in context with path [/Harmonizome] threw exception
 org.hibernate.exception.GenericJDBCException: Could not open connection
    at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:54)
    at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:125)
    at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:110)
    at org.hibernate.engine.jdbc.internal.LogicalConnectionImpl.obtainConnection(LogicalConnectionImpl.java:304)
    at org.hibernate.engine.jdbc.internal.LogicalConnectionImpl.getConnection(LogicalConnectionImpl.java:169)
    at org.hibernate.engine.transaction.internal.jdbc.JdbcTransaction.doBegin(JdbcTransaction.java:67)
    at org.hibernate.engine.transaction.spi.AbstractTransactionImpl.begin(AbstractTransactionImpl.java:160)
    at org.hibernate.internal.SessionImpl.beginTransaction(SessionImpl.java:1395)
    at org.hibernate.collection.internal.AbstractPersistentCollection.withTemporarySessionIfNeeded(AbstractPersistentCollection.java:224)
    at org.hibernate.collection.internal.AbstractPersistentCollection.initialize(AbstractPersistentCollection.java:545)
    at org.hibernate.collection.internal.AbstractPersistentCollection.read(AbstractPersistentCollection.java:124)
    at org.hibernate.collection.internal.PersistentSet.iterator(PersistentSet.java:180)
    at edu.mssm.pharm.maayanlab.Harmonizome.json.serdes.GeneMetadataSerializer.serialize(GeneMetadataSerializer.java:54)
    at edu.mssm.pharm.maayanlab.Harmonizome.json.serdes.GeneMetadataSerializer.serialize(GeneMetadataSerializer.java:23)
    at com.google.gson.TreeTypeAdapter.write(TreeTypeAdapter.java:70)
    at com.google.gson.Gson.toJson(Gson.java:600)
    at com.google.gson.Gson.toJson(Gson.java:579)
    at com.google.gson.Gson.toJson(Gson.java:534)
    at edu.mssm.pharm.maayanlab.Harmonizome.api.GeneMetadataApi.doGet(GeneMetadataApi.java:65)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:622)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:291)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:502)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:142)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
    at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:616)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:518)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1091)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:673)
    at org.apache.tomcat.util.net.AprEndpoint$SocketWithOptionsProcessor.run(AprEndpoint.java:2440)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-apr-8080-exec-35] Timeout: Pool empty. Unable to fetch a connection in 30 seconds, none available[size:5; busy:5; idle:0; lastwait:30000].
    at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:672)
    at org.apache.tomcat.jdbc.pool.ConnectionPool.getConnection(ConnectionPool.java:186)
    at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:127)
    at org.hibernate.connection.TomcatJDBCConnectionProvider.getConnection(TomcatJDBCConnectionProvider.java:208)
    at org.hibernate.internal.AbstractSessionImpl$NonContextualJdbcConnectionAccess.obtainConnection(AbstractSessionImpl.java:292)
    at org.hibernate.engine.jdbc.internal.LogicalConnectionImpl.obtainConnection(LogicalConnectionImpl.java:297)

Here are my connection variables from MySQL:

mysql>  SHOW VARIABLES LIKE '%connect%';
+-----------------------------------------------+-----------------+
| Variable_name                                 | Value           |
+-----------------------------------------------+-----------------+
| character_set_connection                      | utf8            |
| collation_connection                          | utf8_general_ci |
| connect_timeout                               | 5               |
| default_master_connection                     |                 |
| extra_max_connections                         | 1               |
| init_connect                                  |                 |
| max_connect_errors                            | 100             |
| max_connections                               | 100             |
| max_user_connections                          | 0               |
| performance_schema_session_connect_attrs_size | 512             |
+-----------------------------------------------+-----------------+

mysql>  SHOW VARIABLES LIKE '%timeout%';
+-----------------------------+----------+
| Variable_name               | Value    |
+-----------------------------+----------+
| connect_timeout             | 5        |
| deadlock_timeout_long       | 50000000 |
| deadlock_timeout_short      | 10000    |
| delayed_insert_timeout      | 300      |
| innodb_flush_log_at_timeout | 1        |
| innodb_lock_wait_timeout    | 50       |
| innodb_rollback_on_timeout  | OFF      |
| interactive_timeout         | 28800    |
| lock_wait_timeout           | 31536000 |
| net_read_timeout            | 30       |
| net_write_timeout           | 60       |
| slave_net_timeout           | 3600     |
| thread_pool_idle_timeout    | 60       |
| wait_timeout                | 28800    |
+-----------------------------+----------+

EDIT 22/9/2015:

Would a SEVERE Tomcat error cause the issue? I am seeing an error, unrelated to the database, about parsing a date:

22-Sep-2015 10:09:53.481 SEVERE [http-apr-8080-exec-26] org.apache.catalina.core.StandardWrapperValve.invoke Servlet.service() for servlet [edu.mssm.pharm.maayanlab.Harmonizome.page.DatasetPage] in context with path [/Harmonizome] threw exception [javax.servlet.ServletException: javax.servlet.jsp.JspException: In <parseDate>, a parse locale can not be established] with root cause
 javax.servlet.jsp.JspException: In <parseDate>, a parse locale can not be established
    at org.apache.taglibs.standard.tag.common.fmt.ParseDateSupport.doEndTag(ParseDateSupport.java:147)

Attaching JConsole output of heap memory usage:

[image: JConsole heap memory usage]

JConsole output for thread usage; it started around 24-25 and jumped up to 34 once I started using the site. Even after closing the browser window, it remained there:

[image: JConsole thread usage]

EDIT 23/9/2015:

One thing I changed right before the issue began was how I deal with Hibernate transactions. Previously, I had enable_lazy_load_no_trans disabled (which is the default) and was using the "open session in view" pattern. It seemed like people didn't like the open session in view pattern, so I enabled enable_lazy_load_no_trans. Thus, I have code like this:

List<MyObjects> myObjects = null;
try {
    HibernateUtil.beginTransaction();
    myObjects = // fetch my objects from the DB
    HibernateUtil.commitTransaction();
} catch (HibernateException he) {
    HibernateUtil.rollbackTransaction();
} finally {
    HibernateUtil.close();
}

// render myObjects in JSP/JSTL
// this JSP may lazily load related objects

In retrospect, this seems... problematic. I have no idea when Hibernate "lets go" of the objects.

Hibernate errors are a bit abstract, and it can be tricky to find the bug from the stack trace alone. I think this may be a problem in your application: maybe you're not closing Hibernate connections properly in some cases, or your application has a memory leak.

Have you tried monitoring the application with jconsole from the JDK?

You can set these Java arguments in your Tomcat configuration (I'm assuming you're using Tomcat) to enable jconsole:

-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=8086
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false

Then connect to the remote process, for example

localhost:8086 

and watch the threads as you go through the operations that make the application stop.
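If attaching jconsole is inconvenient, roughly the same thread numbers can be read from inside the JVM. The following is only a minimal sketch using the standard java.lang.management API; the class name ThreadSnapshot is made up for illustration and is not part of your application.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: counts the JVM's live threads grouped by state. */
class ThreadSnapshot {

    /** Returns a map of thread state to the number of live threads in that state. */
    static Map<Thread.State, Integer> countByState() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new TreeMap<>();
        // getThreadInfo returns null entries for threads that died in the meantime.
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            if (info != null) {
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        System.out.println("Live threads: " + mx.getThreadCount()
                + ", peak: " + mx.getPeakThreadCount());
        countByState().forEach((state, n) -> System.out.println(state + ": " + n));
    }
}
```

Logging this before and after the operations that hang the site should show whether request threads pile up in WAITING or TIMED_WAITING, which is what you would expect if they are all queued for a connection.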

Edit

If you're not using Tomcat and you're running your application in a Windows environment, you can monitor its threads with a tool such as Process Explorer.

From the stack trace you provided, I can draw a single conclusion: you are simply running out of connections.

This can be caused by long-running transactions, possibly due to slow queries or improper application transaction boundaries.
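To illustrate how long-held connections produce a "Pool empty" timeout, here is a toy model. It is not Tomcat JDBC itself: a Semaphore with a fixed number of permits stands in for the pool, and each worker holds a permit for the length of its "transaction". The class name and all timings are assumptions chosen for the demonstration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of a fixed-size pool: permits play the role of connections. */
class PoolExhaustionDemo {

    /**
     * Starts `workers` tasks that each borrow a permit, hold it for holdMillis,
     * and give it back. Returns how many tasks gave up after maxWaitMillis.
     */
    static int run(int poolSize, int workers, long holdMillis, long maxWaitMillis) {
        Semaphore pool = new Semaphore(poolSize, true);
        AtomicInteger timedOut = new AtomicInteger();
        ExecutorService exec = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            exec.submit(() -> {
                try {
                    if (!pool.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS)) {
                        timedOut.incrementAndGet(); // analogous to "Pool empty"
                        return;
                    }
                    try {
                        Thread.sleep(holdMillis);   // the "transaction"
                    } finally {
                        pool.release();             // always hand the permit back
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        exec.shutdown();
        try {
            exec.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return timedOut.get();
    }

    public static void main(String[] args) {
        System.out.println("short holds, timeouts: " + run(5, 10, 20, 2000));
        System.out.println("long holds, timeouts:  " + run(5, 10, 1000, 100));
    }
}
```

With short holds, the waiting workers get a permit well within the wait limit. Once the hold time exceeds the wait limit, every worker beyond the pool size gives up, which mirrors the size:5; busy:5; idle:0 state in your trace.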

I suggest you start using FlexyPool, which supports Tomcat DBCP, to get a better understanding of both connection and transaction usage. FlexyPool provides many histograms you might be interested in, like connection acquisition time and lease time.

And, just to be on the safe side, check the MySQL driver version too and see if you're running an outdated library.
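One way to check the driver version at runtime is to ask the JDBC DriverManager which drivers are registered. This is a rough sketch using only the standard java.sql API; the class name JdbcDriverVersions is hypothetical, and DriverManager only reports drivers visible to the calling class loader, so inside Tomcat it would have to run within the web application itself.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: lists registered JDBC drivers with their versions. */
class JdbcDriverVersions {

    /** Returns one "className major.minor" entry per registered driver. */
    static List<String> describeDrivers() {
        List<String> out = new ArrayList<>();
        for (Driver d : Collections.list(DriverManager.getDrivers())) {
            out.add(d.getClass().getName() + " "
                    + d.getMajorVersion() + "." + d.getMinorVersion());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> drivers = describeDrivers();
        if (drivers.isEmpty()) {
            System.out.println("No JDBC drivers registered on this classpath");
        }
        drivers.forEach(System.out::println);
    }
}
```

The version in the file name of the MySQL Connector/J jar in your WEB-INF/lib (or Tomcat's lib directory) tells you the same thing without running any code.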

It seems your connection pool cannot return a free connection to Hibernate within the timeout. This happens because your application has very long transactions or transaction deadlocks. You can try the following options to fix the bug.

  1. Change your connection pool size in the following line:

    <property name="hibernate.connection.pool_size">5</property>

Make the pool size about 10 and test. Keep an eye on the number of connections to your database. If it exceeds the MySQL connection limit, raise max_connections on the MySQL server and keep testing.

  2. Use another connection pool. I recommend Apache Commons DBCP2. The Maven dependency for DBCP2 is as follows.

    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-dbcp2</artifactId>
        <version>2.1</version>
    </dependency>

Add DBCP2 to your POM, then configure it for your application.

If that solves the problem, your application only had long transactions. If it merely reduces how often the problem occurs and the problem still happens, your application almost certainly has transaction deadlocks, so you have to identify the possible problems in your code.

There are other alternatives, such as raising the wait timeout. But that is bad for application performance and does nothing for transaction deadlocks. Finally, remember to pay attention to transaction management and database structure in further development for better database performance.
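To find which code path holds connections too long, one option is to record where each connection was borrowed and report the ones that stay open. The sketch below shows that idea with plain Java only; LeaseTracker and its method names are hypothetical, and you would have to call them yourself from wherever your code opens and closes sessions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper: remembers where and when each connection was borrowed. */
class LeaseTracker {

    private static final class Lease {
        final long startNanos = System.nanoTime();
        // Captures the caller's stack at borrow time for later reporting.
        final Throwable borrowedAt = new Throwable("connection borrowed here");
    }

    private final Map<Long, Lease> open = new ConcurrentHashMap<>();
    private final AtomicLong ids = new AtomicLong();

    /** Call right after obtaining a connection; returns a ticket for released(). */
    long borrowed() {
        long id = ids.incrementAndGet();
        open.put(id, new Lease());
        return id;
    }

    /** Call in the finally block that closes the session or connection. */
    void released(long id) {
        open.remove(id);
    }

    /** Borrow-time stack traces of every lease open longer than maxMillis. */
    List<Throwable> overdue(long maxMillis) {
        long now = System.nanoTime();
        List<Throwable> result = new ArrayList<>();
        for (Lease lease : open.values()) {
            if ((now - lease.startNanos) / 1_000_000 > maxMillis) {
                result.add(lease.borrowedAt);
            }
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        LeaseTracker tracker = new LeaseTracker();
        long ok = tracker.borrowed();
        tracker.released(ok);      // well-behaved caller
        tracker.borrowed();        // leaked: never released
        Thread.sleep(50);
        for (Throwable t : tracker.overdue(10)) {
            t.printStackTrace();   // points at the code that took the leaked lease
        }
    }
}
```

Calling overdue() periodically and logging the returned stack traces should point at the request handlers that keep connections busy, which is more informative than raising the pool size or the wait timeout.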
