Is it worth parallelizing queries with JDBC and MySQL?

One JDBC SELECT statement takes 5 seconds to complete, so running 5 statements takes 25 seconds.

Now I am trying to do the job in parallel. The database is MySQL with InnoDB. I start 5 threads and give each thread its own database connection, but it still takes 25 seconds for all of them to complete. Why?

Note that I give Java enough heap and have 8 cores, but only one hard disk (maybe having only one hard disk is the bottleneck here?).

Is this the expected behaviour for MySQL out of the box? Here is example code:

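import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// `pool` is a connection pool (a javax.sql.DataSource) defined elsewhere.
public void doWork(int n) {
    // Bind the id range as parameters instead of concatenating it into the SQL string.
    String sql = "select id from big_table where id between ? and ?";
    try (Connection conn = pool.getConnection();
         PreparedStatement stmt = conn.prepareStatement(sql)) {
        stmt.setLong(1, n * 1000000L);
        stmt.setLong(2, n * 1000000L + 1000000L);
        try (ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
                long itemId = rs.getLong("id"); // read (and discard) each id
            }
        }
    } catch (SQLException e) {
        throw new RuntimeException(e);
    }
}

public void doWorkBatch() {
    for (int i = 1; i <= 5; i++)   // 5 queries, one after another
        doWork(i);
}

public void doWorkParallel() throws InterruptedException {
    Thread[] threads = new Thread[5];
    for (int i = 0; i < threads.length; i++) {
        final int n = i + 1;       // a lambda may only capture effectively final variables
        threads[i] = new Thread(() -> doWork(n));
        threads[i].start();
    }
    for (Thread t : threads)       // wait for all 5 queries to finish
        t.join();
}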

(I don't remember where, but I read that a standard MySQL installation can easily handle 1000 connections in parallel.)

It depends on where the bottleneck in your system is. If each query spends a few seconds establishing the connection to the database and only a fraction of that time actually running, you would see a good improvement. However, if the time is spent inside MySQL running the query itself, you would not see much of a difference.
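A rough way to see this is an idealized model (an assumption for illustration, not a measurement of MySQL): suppose each of the k queries spends c seconds on work that different threads can overlap (connection setup, network waits) and s seconds on a resource they all share, such as a single disk. Then, approximately:

```latex
T_{\text{serial}} = k\,(c + s)
\qquad
T_{\text{parallel}} \approx c + k\,s
```

With the question's numbers, k = 5 and nearly all of each query's 5 seconds is s (c close to 0), so both come out at about 25 s. Only when c dominates does parallelism pay off: for a hypothetical split of c = 4 s and s = 1 s, the serial run still takes 25 s but the parallel run takes roughly 4 + 5 = 9 s.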

Rather than trying concurrent execution, the first thing I'd do is optimize the query: add indexes to your tables, and so forth.

Concurrent execution may be faster. You should also consider batch execution.
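Note that JDBC batch execution (`addBatch`/`executeBatch`) applies to INSERT, UPDATE and DELETE statements rather than to SELECTs like the ones in the question; for reads, the closest equivalent is fetching a larger range in one query. A minimal sketch of batching writes inside a single transaction, assuming a hypothetical table `processed(id)` and a hypothetical helper class `BatchInsert`:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchInsert {

    // Inserts all ids using one JDBC batch inside one transaction.
    // The table `processed(id)` is hypothetical.
    public static int[] insertIds(Connection conn, List<Long> ids) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // group all inserts into one transaction
        try (PreparedStatement stmt =
                 conn.prepareStatement("insert into processed (id) values (?)")) {
            for (long id : ids) {
                stmt.setLong(1, id);
                stmt.addBatch();                // queue the insert on the client side
            }
            int[] counts = stmt.executeBatch(); // send the queued inserts
            conn.commit();
            return counts;
        } catch (SQLException e) {
            conn.rollback(); // undo a partially applied batch
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit); // leave the connection as we found it
        }
    }
}
```

With MySQL Connector/J, batched inserts are only rewritten into multi-row statements when the connection property `rewriteBatchedStatements=true` is set; otherwise the driver still sends them one by one.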

Concurrent execution will help if there is any room for parallelization. In your case, there seems to be no room for parallelization, because you have a very simple query which performs a sequential read of a huge amount of data, so your bottleneck is probably the disk transfer and then the data transfer from the server to the client.
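To make the single-disk argument concrete, here is a small simulation in plain Java with no database involved. Each "query" is modelled as holding a disk for a fixed time, and a `Semaphore` with one permit stands in for the single hard disk. The class name `SharedDiskDemo` and the timings are illustrative assumptions, not measurements of MySQL.

```java
import java.util.concurrent.Semaphore;

public class SharedDiskDemo {

    // Starts `tasks` threads that each need `holdMillis` of exclusive access
    // to one of `permits` simulated disks; returns the wall-clock time in ms.
    public static long run(int tasks, long holdMillis, int permits) throws InterruptedException {
        Semaphore disks = new Semaphore(permits);
        Thread[] threads = new Thread[tasks];
        long start = System.nanoTime();
        for (int i = 0; i < tasks; i++) {
            threads[i] = new Thread(() -> {
                try {
                    disks.acquire();              // wait for a free disk
                    try {
                        Thread.sleep(holdMillis); // simulated sequential read
                    } finally {
                        disks.release();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        // Five 200 ms "queries": all sharing one disk vs. each with its own disk.
        System.out.println("one disk:   " + run(5, 200, 1) + " ms");
        System.out.println("five disks: " + run(5, 200, 5) + " ms");
    }
}
```

Both runs start five threads, but with one permit the reads are serialized, so the first run should take roughly five times as long as the second, which is the behaviour the question describes.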

When we say that RDBMS servers can handle thousands of requests per second, we are usually talking about the kind of requests typical of web applications, where each SQL query may be slightly more complicated than yours but results in much smaller disk reads (so the data is likely to be found in a cache) and much smaller data transfers (data that fits within a web page).

Looking at your problem, multi-threading can definitely improve your performance: I once converted a 4-5 hour batch job into a 7-10 minute job by doing exactly what you are trying to do. However, you need to consider the following things beforehand while designing it:

1) Think about inter-task dependencies, i.e. dependencies between tasks executed on different threads.

2) Using a connection pool is a good choice, since creating database connections in Java is slow.

3) Each thread needs its own JDBC connection. Connections can't be shared between threads because each connection is also a transaction.

4) Cut tasks into several work units where each unit does one job.

5) Particularly in your case, with MySQL, the storage engine you use also affects performance. InnoDB uses row-level locking, so it can handle much higher concurrent traffic. The usual alternative, MyISAM, does not support row-level locking and uses table locking instead. This matters when another thread wants to update the same row (or, with MyISAM, any row in the same table) before the first thread commits.

6) Another way to improve the performance of a Java database application is to run queries with setAutoCommit(false). By default, a new JDBC connection has auto-commit mode on, which means every individual SQL statement is executed in its own transaction. Without auto-commit you can group SQL statements into a logical transaction, which is then either committed or rolled back by calling commit() or rollback().
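Points 1, 3 and 4 can be sketched with a fixed-size `ExecutorService`: the id range is cut into independent chunks (no chunk depends on another), and each chunk is one task. `RangeJobs` and `RangeWorker` are hypothetical names. In a real job, each `process` call would borrow its own connection from the pool and run the range query; the usage example below just counts ids so that it runs without a database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class RangeJobs {

    // One independent unit of work: process the ids in [from, to).
    // A real implementation would get its own connection from the pool here.
    public interface RangeWorker {
        long process(long from, long to) throws Exception;
    }

    // Splits [start, end) into chunks of `chunkSize`, runs them on a fixed pool
    // of `threads` worker threads, and sums the per-chunk results.
    public static long runInParallel(long start, long end, long chunkSize,
                                     int threads, RangeWorker worker) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (long from = start; from < end; from += chunkSize) {
                long lo = from;
                long hi = Math.min(from + chunkSize, end);
                futures.add(pool.submit(() -> worker.process(lo, hi)));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // a failed chunk surfaces here as ExecutionException
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

For example, `RangeJobs.runInParallel(0, 1000, 100, 4, (lo, hi) -> hi - lo)` returns 1000. Using a fixed pool also caps the number of simultaneous database connections, so `threads` should not exceed the size of the connection pool.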

You can also check out Spring Batch, which is designed for batch processing.

Hope this helps.
