
Java connection pool: how many max connections in a multithreaded batch?

I have a Java batch job which does a select with a large result set (I process the elements using a Spring callback handler). The callback handler puts a task in a fixed thread pool to process each row. My pool size is fixed at 16 threads. The result set contains about 100k elements. All DB access code goes through a JdbcTemplate or through Hibernate/Spring; no manual connection management is present. I have tried both Atomikos and Commons DBCP as the connection pool.

Now, I would think that 17 max connections in my connection pool would be enough to get this batch to finish: one for the select and 16 for the threads in the thread pool which update some rows. However, that seems to be too naive, as I have to specify a max pool size an order of magnitude larger (I haven't tried to find an exact value). First I tried 50, which worked on my local Windows machine but doesn't seem to be enough in our Unix test environment. There I have to specify 128 to make it work (again, I didn't try any value between 50 and 128; I went straight to 128).

Is this normal? Is there some fundamental mechanism in connection pooling I'm missing? I find this hard to debug, as I don't know how to see what happens to the opened connections. I tried various log4j settings but didn't get any satisfactory result.

Edit, additional info: when the connection pool size is too low, the batch seems to hang. If I do a thread dump (jstack) on the process, I can see all threads are waiting for a new connection. At first I didn't specify the maxWait property on the DBCP connection pool, which causes threads to wait indefinitely for a new connection, and I noticed the batch kept hanging, so no connections were being released. However, that only happened after processing roughly 70k rows, which argues against my initial hunch of a connection leak.
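For context, the two DBCP settings discussed here (the pool cap and maxWait) would be configured roughly like this on DBCP 1.x's BasicDataSource. This is a sketch only; the driver, URL, and credentials are placeholders, not values from the question:

```java
import org.apache.commons.dbcp.BasicDataSource;

// Sketch of the DBCP 1.x settings discussed above.
// Driver class, URL, and credentials below are placeholders.
public class PoolConfig {
    public static BasicDataSource createPool() {
        BasicDataSource ds = new BasicDataSource();
        ds.setDriverClassName("oracle.jdbc.OracleDriver");     // placeholder
        ds.setUrl("jdbc:oracle:thin:@//dbhost:1521/SERVICE");  // placeholder
        ds.setUsername("batch");                               // placeholder
        ds.setPassword("secret");                              // placeholder
        ds.setMaxActive(17);    // the naive sizing: 1 select + 16 workers
        ds.setMaxWait(30_000);  // fail after 30s instead of hanging forever
        return ds;
    }
}
```

With maxWait set, an exhausted pool throws an exception after the timeout instead of blocking forever, which makes the "all threads waiting for a connection" state visible immediately.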

Edit 2: I forgot to mention that I already rewrote the update part of my tasks. I queue my updates in a ConcurrentLinkedQueue and drain it every 1000 elements, so I actually only do about 100 batched updates.
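The queue-and-drain scheme can be sketched as follows. This is a minimal, self-contained illustration: the actual DB write is replaced by a flush counter, where the real batch would issue a single batched update per drain:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class BatchedUpdates {
    private static final int FLUSH_THRESHOLD = 1000;
    private final Queue<String> pending = new ConcurrentLinkedQueue<>();
    private final AtomicInteger flushCount = new AtomicInteger();

    // Called per row; drains the queue once enough updates accumulate.
    public void queueUpdate(String row) {
        pending.add(row);
        if (pending.size() >= FLUSH_THRESHOLD) {
            flush();
        }
    }

    // In the real batch this would issue one batched DB update
    // (e.g. JdbcTemplate.batchUpdate) instead of counting flushes.
    public synchronized void flush() {
        List<String> batch = new ArrayList<>();
        String row;
        while ((row = pending.poll()) != null) {
            batch.add(row);
        }
        if (!batch.isEmpty()) {
            flushCount.incrementAndGet();
        }
    }

    public int getFlushCount() { return flushCount.get(); }

    public static void main(String[] args) {
        BatchedUpdates b = new BatchedUpdates();
        for (int i = 0; i < 100_000; i++) {
            b.queueUpdate("row-" + i);
        }
        b.flush(); // drain any remainder
        System.out.println("flushes: " + b.getFlushCount());
    }
}
```

Driven single-threaded over 100k rows this performs exactly 100 flushes, matching the "about 100 updates" figure above.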

Edit 3: I'm using Oracle, and I am using the java.util.concurrent utilities. I have an executor configured with a fixed pool size of 16 and submit my tasks to this executor. I don't use connections manually in my tasks; I use JdbcTemplate, which is thread-safe and obtains its connections from the connection pool. I suppose Spring/DBCP handles the connection/thread issue.
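The setup described in this edit looks roughly like the sketch below. The per-row DB work is simulated with a counter so the example is self-contained; the comment marks where the JdbcTemplate call would sit:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class RowProcessor {
    public static void main(String[] args) throws InterruptedException {
        // Fixed pool of 16 workers, as in the batch described above.
        ExecutorService executor = Executors.newFixedThreadPool(16);
        AtomicInteger processed = new AtomicInteger();

        // The Spring callback handler would submit one task per result-set
        // row here; each task would call jdbcTemplate.update(...), which
        // borrows a pooled connection and returns it when the call finishes.
        for (int row = 0; row < 1000; row++) {
            executor.submit(() -> {
                processed.incrementAndGet(); // stand-in for the DB update
            });
        }

        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("processed " + processed.get() + " rows");
    }
}
```

Note that with this task-per-row shape, every task borrows and returns a connection, which is exactly where pool churn comes from.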

If you are on Linux, you can try MySQL Administrator to monitor your connection status graphically, provided you are using MySQL.

Irrespective of that, even 100 connections is not uncommon for large enterprise applications handling a few thousand requests per minute.

But if the request volume is low, or each request doesn't need its own transaction, then I would recommend you tune the work done inside the threads.

That is, how are you distributing the 100k elements over the 16 threads? If you acquire a connection every time you read a row from the shared location (or buffer), it is bound to take time.

See whether this helps.

  1. getConnection
  2. for each element, until the buffer is empty:
  3. process it
  4. if you need to update:
  5. open a transaction
  6. update
  7. commit/rollback the transaction
  8. go to step 2
  9. release the connection

You can synchronize the buffer by using the java.util.concurrent collections.

Don't use one Runnable/Callable per element; that will degrade performance. Also, how are you creating threads? Use Executors to run your Runnables/Callables. And remember that DB connections are NOT meant to be shared across threads, so use one connection in one thread at a time.

For example, create an Executor and submit 16 runnables, each having its own connection.
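Putting the numbered steps and the executor advice together gives a worker per thread, each owning one connection for its whole lifetime. In this sketch the JDBC calls are replaced by comments so it runs standalone; the shared buffer is a ConcurrentLinkedQueue, as suggested above:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class Worker implements Runnable {
    private final Queue<Integer> buffer;
    private final AtomicInteger updated;

    Worker(Queue<Integer> buffer, AtomicInteger updated) {
        this.buffer = buffer;
        this.updated = updated;
    }

    @Override
    public void run() {
        // Step 1: acquire ONE connection for this worker's lifetime,
        // e.g. Connection con = dataSource.getConnection();
        Integer row;
        while ((row = buffer.poll()) != null) { // steps 2-8: loop until empty
            // step 3: process the row
            // steps 4-7: if an update is needed, open a transaction on the
            // same connection, update, then commit/rollback
            updated.incrementAndGet(); // stand-in for the DB work
        }
        // Step 9: release the connection, e.g. con.close();
    }

    public static void main(String[] args) throws InterruptedException {
        Queue<Integer> buffer = new ConcurrentLinkedQueue<>();
        for (int i = 0; i < 10_000; i++) buffer.add(i);

        AtomicInteger updated = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(16);
        for (int t = 0; t < 16; t++) {
            pool.submit(new Worker(buffer, updated));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println(updated.get() + " rows handled");
    }
}
```

With 16 long-lived workers instead of one task per row, at most 16 connections are ever borrowed at once, and the pool sees 16 checkouts total rather than 100k.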

I switched to c3p0 instead of DBCP. In c3p0 you can specify a number of helper threads. I noticed that if I set that number as high as the number of threads I'm using, the number of connections stays really low (using c3p0's handy JMX bean to inspect the active connections). Also, I have several dependencies, each with its own entity manager. Apparently a new connection is needed per entity manager, and I have about 4 entity managers per thread, which would explain the high connection count. I think my tasks are all so short-lived that DBCP couldn't keep up with closing/releasing connections; since c3p0 works more asynchronously and lets you specify the number of helper threads, it is able to release my connections in time.
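For reference, the c3p0 knobs mentioned here are set on ComboPooledDataSource roughly as below. This is a sketch with placeholder URL, credentials, and pool size, not the actual values used in the batch:

```java
import com.mchange.v2.c3p0.ComboPooledDataSource;

// Sketch of the c3p0 configuration discussed above.
// URL, credentials, and maxPoolSize are placeholders.
public class C3p0Config {
    public static ComboPooledDataSource createPool() {
        ComboPooledDataSource ds = new ComboPooledDataSource();
        ds.setJdbcUrl("jdbc:oracle:thin:@//dbhost:1521/SERVICE"); // placeholder
        ds.setUser("batch");       // placeholder
        ds.setPassword("secret");  // placeholder
        ds.setMaxPoolSize(64);     // placeholder
        // One helper thread per worker thread, so the asynchronous
        // close/release work keeps up with short-lived tasks.
        ds.setNumHelperThreads(16);
        return ds;
    }
}
```

c3p0 performs slow JDBC operations (such as connection close) on its helper threads, which is why raising numHelperThreads to match the worker count helps connections get returned promptly.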

Edit: but the batch keeps hanging when deployed to the test environment. All threads block when releasing the connection; the lock is on the pool. Just the same as with DBCP :(

Edit: all my problems disappeared when I switched to BoneCP, and I got a huge performance increase as a bonus too.

