Thread dump shows RUNNABLE state, but the threads have been hung for a long time

We are facing an unusual problem in our application: in the last month it reached an unrecoverable state, and it only recovered after an application restart.

Background: Our application makes a DB query to fetch some information, and the database is hosted on a separate node.

Problematic case: When we analyzed the thread dump, all the threads were in the RUNNABLE state, fetching data from the database, but the calls had not finished even after 20 minutes.

After the application restart, all threads recovered as expected, and CPU usage was also normal.

Below is the thread dump:

ThreadPool:2:47" prio=3 tid=0x0000000007334000 nid=0x5f runnable [0xfffffd7fe9f54000]
   java.lang.Thread.State: RUNNABLE
        at oracle.jdbc.driver.T2CStatement.t2cParseExecuteDescribe(Native Method)
        at oracle.jdbc.driver.T2CPreparedStatement.executeForDescribe(T2CPreparedStatement.java:518)
        at oracle.jdbc.driver.T2CPreparedStatement.executeForRows(T2CPreparedStatement.java:764)
        at ora

All threads are in the same state.

Questions:

  1. What could be the reason for this state?
  2. How can we recover in this case?

As others have already mentioned, threads executing native methods are always reported as RUNNABLE, because the JVM doesn't know or care what they are doing.

The Oracle drivers on the client side have no socket timeout by default. This means that if you have network issues, the client's low-level socket read may get stuck forever, resulting in a maxed-out connection pool. You could also check the network traffic towards the Oracle server to see whether any data is being transmitted at all.

When using the thin client, you can set oracle.jdbc.ReadTimeout, but I don't know how to do that for the thick (OCI) client you use; I'm not familiar with it.
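For the thin-driver case, a minimal sketch of passing that property could look like the following. The class name, the helper method, and the URL in the comment are made up for illustration; only the oracle.jdbc.ReadTimeout key comes from the answer above, and the value is in milliseconds. It does not cover the OCI driver.

```java
import java.util.Properties;

// Hypothetical helper: builds connection properties for the Oracle *thin* driver.
final class ThinDriverTimeouts {

    // readTimeoutMs is the socket read timeout in milliseconds.
    static Properties connectionProperties(String user, String password, int readTimeoutMs) {
        if (readTimeoutMs <= 0) {
            throw new IllegalArgumentException("readTimeoutMs must be positive");
        }
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        // Without this, a read on a dead network path can block indefinitely.
        props.setProperty("oracle.jdbc.ReadTimeout", String.valueOf(readTimeoutMs));
        return props;
    }

    // Usage (placeholder URL, needs the Oracle thin driver on the classpath):
    //   Connection c = DriverManager.getConnection(
    //       "jdbc:oracle:thin:@//dbhost:1521/SERVICE",
    //       connectionProperties("app_user", "secret", 60_000));
}
```

Pick a value comfortably above your slowest legitimate query, otherwise healthy long-running statements will be cut off as well.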

What to do? Research how to specify a read timeout for the thick ojdbc driver, and watch for exceptions related to connection timeouts, as they will clearly signal network issues. If you can change the source, you can wrap the calls and retry with a new session when you catch timeout-related SQLExceptions.
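The "wrap and retry" idea could be sketched roughly like this. TimeoutRetry, SqlCall, and the choice of which exceptions count as timeout-related are my assumptions, not something the driver prescribes; the exception classes themselves are standard JDK types. The wrapped call should obtain a fresh connection on every attempt, since a connection that timed out is usually not reusable.

```java
import java.net.SocketTimeoutException;
import java.sql.SQLException;
import java.sql.SQLRecoverableException;
import java.sql.SQLTimeoutException;

// Hypothetical retry wrapper for JDBC calls that fail with timeout-like errors.
final class TimeoutRetry {

    @FunctionalInterface
    interface SqlCall<T> {
        T run() throws SQLException;
    }

    // Walks the cause chain looking for exception types that usually indicate
    // a timeout or a broken connection (an assumption, adjust to your driver).
    static boolean looksLikeTimeout(SQLException e) {
        for (Throwable t = e; t != null; t = t.getCause()) {
            if (t instanceof SQLTimeoutException
                    || t instanceof SQLRecoverableException
                    || t instanceof SocketTimeoutException) {
                return true;
            }
        }
        return false;
    }

    // Runs the call up to maxAttempts times; other SQLExceptions are rethrown at once.
    static <T> T withRetry(int maxAttempts, SqlCall<T> call) throws SQLException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.run();
            } catch (SQLException e) {
                if (!looksLikeTimeout(e)) {
                    throw e;
                }
                last = e; // timeout-like failure: try again
            }
        }
        throw last;
    }
}
```

Note that this only helps once the driver actually throws. A thread stuck in a native call with no timeout configured never reaches the catch block, so a timeout has to be in place first.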

To quickly address the issue, terminate the session on the Oracle server manually.

It is also worth checking for session contention: maybe a query is blocking these sessions. If you find one, you'll see which database object causes the problem.
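As a rough sketch of both suggestions, the statements a DBA might run can be kept as strings like this. SessionDiagnostics and killSessionSql are hypothetical names; v$session and ALTER SYSTEM KILL SESSION are standard Oracle, but they need the appropriate privileges and I'm assuming a single-instance (non-RAC) setup.

```java
// Hypothetical holder for the diagnostic SQL; run the statements with a DBA account.
final class SessionDiagnostics {

    // Lists sessions that are currently waiting on another session.
    static final String BLOCKED_SESSIONS_SQL =
            "SELECT sid, serial#, username, blocking_session, event, seconds_in_wait, sql_id "
          + "FROM v$session "
          + "WHERE blocking_session IS NOT NULL";

    // Builds the statement that terminates one session, identified by SID and SERIAL#.
    static String killSessionSql(int sid, int serial) {
        if (sid < 0 || serial < 0) {
            throw new IllegalArgumentException("sid and serial must be non-negative");
        }
        return "ALTER SYSTEM KILL SESSION '" + sid + "," + serial + "' IMMEDIATE";
    }
}
```

The blocking_session column gives the SID of the session holding the lock; looking that session up in v$session gives you the SERIAL# needed for the kill statement.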

It's probably waiting for network data from the database server. Java threads waiting (blocked) on I/O are described by the JVM as being in the state RUNNABLE even though from the program's point of view they're blocked.
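You can see this for yourself with a small, self-contained demo. It has nothing to do with Oracle: it just blocks a plain (platform) thread in ServerSocket.accept() and asks the JVM for its state. The class and method names are made up, and the half-second sleep is simply a crude way to let the thread reach the blocking call.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Demo: a thread blocked in a native I/O call is still reported as RUNNABLE.
final class BlockedIoStateDemo {

    static Thread.State stateWhileBlockedInAccept() throws IOException, InterruptedException {
        try (ServerSocket server = new ServerSocket(0)) { // port 0 = any free port
            Thread worker = new Thread(() -> {
                try {
                    server.accept(); // blocks: nobody ever connects
                } catch (IOException expectedOnClose) {
                    // closing the socket below unblocks accept() with an exception
                }
            }, "blocked-in-accept");
            worker.setDaemon(true);
            worker.start();

            Thread.sleep(500); // give the worker time to enter accept()
            Thread.State observed = worker.getState();

            server.close();    // unblock the worker
            worker.join(2000);
            return observed;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("State while blocked in accept(): " + stateWhileBlockedInAccept());
    }
}
```

On a HotSpot JVM I would expect this to report RUNNABLE even though the thread is doing nothing, which is the same picture as the T2CStatement frames in the dump above.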

Is the whole system or the JVM hanging? If it is configurable, and if possible, reduce the number of threads / parallel connections.

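Note that a thread blocked on I/O does not normally waste CPU cycles while it waits: the operating system parks it until the response from the DB arrives, even though the JVM reports it as RUNNABLE. The CPU would only be kept busy if the native code were spinning in a loop instead of blocking.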

  1. Does your code handle transactions manually? If so, maybe some of the code didn't call commit() after changing data. Or maybe someone ran a data modification query directly through PL/SQL or something similar and didn't commit, and that caused all the read operations to hang.

  2. When you experienced the hang and the DB recovered from it, did you check whether some of the data had been rolled back? I ask because you said "It was recovered post application restart." This happens when the JDBC driver changed data but didn't commit and a timeout occurred: the DB operation is rolled back (this can differ depending on the configuration, though).

Threads in native methods always remain in the RUNNABLE state (unless the native method changes the state itself, but that doesn't count).

The method can be blocked on I/O, waiting for some other event, running a long CPU-intensive task, or stuck in an endless loop. Take your pick.

"How to recover in this case?"

Drop the connection from the Oracle side.

