
Elegant/efficient way of reading millions of records in a MySQL database, Java

I have a MySQL database with ~8,000,000 records. Since I need to process them all, I use a BlockingQueue: a Producer reads from the database and puts 1000 records in the queue, and a Consumer (the processor) takes records from the queue.

I am writing this in Java, but I'm stuck on how to (in a clean, elegant way) read from my database and 'suspend' reading once the BlockingQueue is full. At that point control is handed to the Consumer until there are free spots in the BlockingQueue again, and from then on the Producer should continue reading records from the database.

Is it clean/elegant/efficient to keep my database connection open so that it can read continuously? Or should I, once control shifts from Producer to Consumer, close the connection, store the id of the last record read, and later reopen the connection and continue reading from that id? The latter doesn't seem good to me, since I would have to open and close the connection a lot! However, the former doesn't seem so elegant to me either.

With persistent connections:

  • You cannot build transaction processing effectively.
  • User sessions on the same connection are impossible.
  • The applications are not scalable.
  • Over time you may need to extend the application, and that will require managing and tracking the persistent connections.
  • If a script, for whatever reason, fails to release its lock on a table, any following scripts will block indefinitely and the database server has to be restarted.
  • With transactions, an open transaction block carries over to the next script (using the same connection) if script execution ends before the transaction block completes.

Persistent connections do not give you anything that you cannot do with non-persistent connections.
So why use them at all?

The only possible reason is performance: use them when the overhead of creating a connection to your MySQL server is high. That depends on many factors, such as:

  • Database type
  • Whether the MySQL server is on the same machine and, if not, how far away it is (it might be outside your local network/domain)
  • How heavily the machine hosting MySQL is loaded by other processes

You can always replace persistent connections with non-persistent connections. It might change the performance of the script, but not its behavior!

Commercial RDBMSs might be licensed by the number of concurrently open connections, and here persistent connections can work against you.

If you are using a bounded BlockingQueue (by passing a capacity value to the constructor), the producer will block when it calls put() on a full queue until the consumer removes an item by calling take().
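To make the blocking behaviour concrete, here is a minimal sketch. It assumes a few things that are not in the question: plain integers stand in for database records, a simple counting loop stands in for iterating over a ResultSet, and the class name BoundedQueueDemo and the POISON_PILL end marker are made up for illustration. The capacity of 1000 mirrors the batch size mentioned in the question.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

public class BoundedQueueDemo {
    // Hypothetical end-of-stream marker; any value that cannot be a real record id works.
    private static final Integer POISON_PILL = -1;

    /** Pushes `total` fake record ids through a queue of the given capacity
     *  and returns how many records the consumer processed. */
    public static long run(int total, int capacity) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        AtomicLong processed = new AtomicLong();

        Thread producer = new Thread(() -> {
            try {
                // In the real program this loop would iterate over the query result.
                for (int id = 1; id <= total; id++) {
                    queue.put(id); // blocks while the queue is full
                }
                queue.put(POISON_PILL); // tell the consumer there is nothing more to read
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    Integer id = queue.take(); // blocks while the queue is empty
                    if (id.equals(POISON_PILL)) {
                        break;
                    }
                    processed.incrementAndGet(); // stand-in for the real processing
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println(run(10_000, 1_000)); // prints 10000
    }
}
```

The producer needs no explicit "suspend" logic: put() simply does not return until there is room, so the queue capacity itself throttles how fast records are pulled from the source.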

It would help to know more about when and how the program is going to run in order to decide how to deal with database connections. Some easy choices are: have the producer and every consumer get an individual connection; have a connection pool for all consumers while the producer holds a connection of its own; or have all producers and consumers use a connection pool.

You can help minimize the number of connections by using something such as Spring to manage your connection pool and transactions; however, that would only be necessary in some execution scenarios.


 