
Elegant/efficient way of reading millions of records from a MySQL database in Java

I have a MySQL database with ~8,000,000 records. Since I need to process them all, I use a BlockingQueue: the Producer reads from the database and puts 1000 records at a time into the queue, and the Consumer is the processor that takes records from the queue.

I am writing this in Java, but I'm stuck figuring out how I can (in a clean, elegant way) read from my database and 'suspend' reading once the BlockingQueue is full. At that point control is handed to the Consumer until there are free spots available again in the BlockingQueue. From then on the Producer should continue reading records from the database.

Is it clean/elegant/efficient to keep my database connection open in order to read continuously? Or should I, once control shifts from Producer to Consumer, close the connection, store the id of the last record read, and later reopen the connection and resume reading from that id? The latter does not seem good to me, since my database will have to open and close connections a lot! However, the former is not so elegant in my opinion either.

With persistent connections:

  • You cannot build transaction processing effectively
  • You cannot have separate user sessions on the same connection
  • The applications are not scalable.
  • Over time you may need to extend the application, which will require management/tracking of persistent connections
  • If a script, for whatever reason, fails to release a lock on a table, any following scripts will block indefinitely and the database server has to be restarted.
  • When using transactions, an open transaction block carries over to the next script (using the same connection) if script execution ends before the transaction block completes.

Persistent connections do not give you anything that you cannot do with non-persistent connections.
Then why use them at all?

The only possible reason is performance: use them when the overhead of creating a link to your MySQL server is high. This depends on many factors, such as:

  • Database type
  • Whether the MySQL server is on the same machine and, if not, how far away it is (it might be outside your local network/domain)
  • How heavily the machine on which MySQL runs is loaded by other processes

You can always replace persistent connections with non-persistent connections. It might change the performance of the script, but not its behavior!
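As a rough illustration of the non-persistent option raised in the question (remember the last id read, then resume from it with a fresh connection), here is a minimal sketch. The table `records`, its `id` column, and the names `KeysetReader`, `BatchFetcher`, `readAll` and `jdbcFetcher` are all hypothetical, and the sketch assumes ids are positive and increasing. The paging loop is kept separate from JDBC so it can be tried with an in-memory stand-in.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.sql.DataSource;

public class KeysetReader {
    /** Fetches up to `limit` ids greater than `afterId`, in ascending order. */
    public interface BatchFetcher {
        List<Long> fetch(long afterId, int limit) throws Exception;
    }

    /** Fetches batch after batch, resuming after the last id seen, until a batch is empty. */
    public static long readAll(BatchFetcher fetcher, int batchSize, Consumer<Long> sink)
            throws Exception {
        long lastId = 0; // assumption: all ids are positive
        long count = 0;
        while (true) {
            List<Long> batch = fetcher.fetch(lastId, batchSize);
            if (batch.isEmpty()) {
                return count;
            }
            for (Long id : batch) {
                sink.accept(id);
                count++;
            }
            lastId = batch.get(batch.size() - 1); // resume point for the next batch
        }
    }

    /** JDBC fetcher that takes a connection per batch; table and column names are hypothetical. */
    public static BatchFetcher jdbcFetcher(DataSource ds) {
        return (afterId, limit) -> {
            String sql = "SELECT id FROM records WHERE id > ? ORDER BY id LIMIT ?";
            try (Connection c = ds.getConnection();
                 PreparedStatement ps = c.prepareStatement(sql)) {
                ps.setLong(1, afterId);
                ps.setInt(2, limit);
                try (ResultSet rs = ps.executeQuery()) {
                    List<Long> ids = new ArrayList<>();
                    while (rs.next()) {
                        ids.add(rs.getLong(1));
                    }
                    return ids;
                }
            }
        };
    }

    public static void main(String[] args) throws Exception {
        // In-memory stand-in for a table holding ids 1..25, used instead of a real database.
        BatchFetcher fake = (afterId, limit) -> {
            List<Long> out = new ArrayList<>();
            for (long id = afterId + 1; id <= 25 && out.size() < limit; id++) {
                out.add(id);
            }
            return out;
        };
        long n = readAll(fake, 10, id -> { });
        System.out.println("read " + n + " ids");
    }
}
```

If `ds` is backed by a connection pool, `getConnection()` and `close()` borrow and return a pooled connection rather than opening a new link each time, which may remove most of the open/close cost the question worries about.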

Commercial RDBMSs might be licensed by the number of concurrently open connections, and here persistent connections can work against you.

If you are using a bounded BlockingQueue by passing a capacity value in the constructor, then once the queue is full the producer will block when it calls put() until the consumer removes an item by calling take().
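A minimal sketch of that blocking behavior, using plain integers in place of database records. The class name `BoundedQueueDemo`, the `run` helper, and the `END` sentinel (a "poison pill" that tells the consumer to stop) are made up for this example and are not part of the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedQueueDemo {
    // Sentinel telling the consumer that no more records will arrive.
    private static final Integer END = Integer.MIN_VALUE;

    /** Pushes ids 1..total through a queue of the given capacity; returns what the consumer saw. */
    public static List<Integer> run(int total, int capacity) throws InterruptedException {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        List<Integer> processed = new ArrayList<>();

        Thread producer = new Thread(() -> {
            try {
                for (int id = 1; id <= total; id++) {
                    queue.put(id); // blocks while the queue is full
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    Integer id = queue.take(); // blocks while the queue is empty
                    if (id.equals(END)) {
                        break;
                    }
                    processed.add(id); // stand-in for the real processing step
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join(); // after join, reading `processed` from this thread is safe
        return processed;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> processed = run(25, 10);
        System.out.println("processed " + processed.size() + " records");
    }
}
```

The producer needs no explicit "suspend" logic: put() parks the thread until a slot frees up, so the queue capacity alone bounds how far the reader can run ahead of the processor.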

It would help to know more about when and how the program is going to execute in order to decide how to deal with database connections. Some easy choices are: have the producer and each consumer get an individual connection, have a connection pool for all consumers while the producer holds a connection of its own, or have all producers and consumers use a connection pool.

You can minimize the number of connections by using something such as Spring to manage your connection pool and transactions; however, that would only be necessary in some execution scenarios.

