Elegant/efficient way of reading millions of records from a MySQL database in Java
I have a MySQL database with ~8,000,000 records. Since I need to process them all, I use a BlockingQueue: the Producer reads from the database and puts 1000 records in the queue. The Consumer is the processor that takes records from the queue.
I am writing this in Java, but I'm stuck figuring out how I can (in a clean, elegant way) read from my database and 'suspend' reading once the BlockingQueue is full. After this, control is handed to the Consumer until there are free spots available again in the BlockingQueue. From there on, the Producer should continue reading records from the database.
Is it clean/elegant/efficient to keep my database connection open in order for it to read continuously? Or should I, once control shifts from Producer to Consumer, close the connection, store the id of the last record read, and later reopen the connection and start reading from that id? The latter does not seem good to me, since my database connection will have to open and close a lot! However, the former is not so elegant in my opinion either.
With persistent connections:
Persistent connections do not give you anything that you cannot also do with non-persistent connections.
Then why use them at all?
The only possible reason is performance: use them when the overhead of creating a link to your MySQL server is high. This depends on many factors.
One can always replace persistent connections with non-persistent connections. It might change the performance of the script, but not its behavior!
Commercial RDBMSs might be licensed by the number of concurrently open connections, and here persistent connections can work against you.
If you use a bounded BlockingQueue by passing a capacity value to the constructor, the producer will block when it calls put() on a full queue until the consumer removes an item by calling take().
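The blocking behavior described above can be sketched roughly as follows. This is only a minimal illustration, not the asker's actual code: the class name BoundedQueueDemo, the run method, and the END_OF_DATA sentinel are hypothetical, and the loop over numeric ids stands in for iterating over a real database result set. A capacity of 1000 mirrors the figure in the question.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical demo class; the names here are illustrative assumptions.
final class BoundedQueueDemo {
    // Sentinel ("poison pill") telling the consumer that no more records will arrive.
    // Real record ids in this sketch are always >= 0, so -1 cannot collide with them.
    private static final long END_OF_DATA = -1L;

    // Pushes totalRecords fake record ids through a queue of the given capacity
    // and returns how many records the consumer processed.
    static int run(int totalRecords, int capacity) throws InterruptedException {
        BlockingQueue<Long> queue = new ArrayBlockingQueue<>(capacity);
        int[] processed = new int[1];

        Thread producer = new Thread(() -> {
            try {
                // Stand-in for reading rows from the database (assumption).
                for (long id = 0; id < totalRecords; id++) {
                    queue.put(id); // blocks while the queue is full
                }
                queue.put(END_OF_DATA);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    long id = queue.take(); // blocks while the queue is empty
                    if (id == END_OF_DATA) {
                        break;
                    }
                    processed[0]++; // stand-in for the real per-record processing
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join(); // join makes the consumer's writes visible to this thread
        return processed[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Processed: " + run(10_000, 1000));
    }
}
```

With this shape the producer never has to 'suspend' explicitly: put() simply does not return until the consumer has freed a slot, so the producer can keep one connection open and continue reading where it left off.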
It would help to know more about when and how the program is going to run in order to decide how to deal with database connections. Some easy choices are: give the producer and each consumer an individual connection; give the consumers a shared connection pool while the producer holds a single connection; or have all producers and consumers share one connection pool.
You can help minimize the number of connections by using something such as Spring to manage your connection pool and transactions; however, that would only be necessary in some execution scenarios.