
Does MySQL (5.6) always cache the entire result set for a query in memory/on disk?

I need to dump a few very large MySQL tables to CSV (they are hosted on RDS, so SELECT INTO OUTFILE is not available). These tables are far larger than the available memory on their server.

If I execute SELECT * FROM a_big_table using a Python framework and grab the records with fetchmany() or fetchone(), will MySQL 5.6 try to read the entire table into memory first (which I expect would result in caching to disk), or is it smarter than that?

EDIT: To clarify, I mean: will the whole result set be stored in the MySQL cache (not in Python!)?

2nd EDIT: Changed the typo "sorted" to "stored" in the first edit. The comments are still useful regarding that case, though!

The amount of memory used on the server is defined by the buffer pool size configuration setting. There is hardly any need to worry about what happens on the server side. Your fetching application will probably be the bottleneck and will write the dump more slowly than MySQL can produce output; the server simply keeps the buffer filled while you fetch. From the server's standpoint, fetching one large result set is more efficient and less resource-demanding than issuing multiple smaller range queries.

Typically, in app-level database calls, the entire result set is not returned; rather, a cursor into the result set is returned. It is then up to the app language (e.g. Python) to iterate over that result set and retrieve the records.

The documentation for MySQL's Python connector confirms this:

By default, MySQL Connector/Python does not buffer or prefetch results. This means that after a query is executed, your program is responsible for fetching the data (emphasis mine). This avoids excessive memory use when queries return large result sets. If you know that the result set is small enough to handle all at once, you can fetch the results immediately by setting buffered to True. It is also possible to set this per cursor (see Section 10.2.6, "MySQLConnection.cursor() Method").

Results generated by queries normally are not read until the client program fetches them. To automatically consume and discard result sets, set the consume_results option to True. The result is that all results are read, which for large result sets can be slow. (In this case, it might be preferable to close and reopen the connection.)

So, your strategy of using a SELECT * query and then writing to file one record (or one group of records) at a time should work from a memory-requirements point of view. Your Python code should only need enough memory to hold the current record(s) you are trying to write to file.
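A minimal sketch of that approach, assuming MySQL Connector/Python (the connection parameters and table name in the usage comment are hypothetical placeholders):

```python
import csv

def dump_cursor_to_csv(cursor, path, batch_size=1000):
    """Stream rows from an already-executed cursor to a CSV file.

    Only one batch of rows is held in Python memory at a time; with an
    unbuffered cursor (the Connector/Python default), rows are pulled
    from the server as they are fetched rather than all at once.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        # cursor.description is populated after execute(); the first
        # element of each entry is the column name.
        writer.writerow(col[0] for col in cursor.description)
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:
                break
            writer.writerows(rows)

# Usage sketch (hypothetical connection details):
# import mysql.connector
# conn = mysql.connector.connect(host="my-rds-host", user="u",
#                                password="p", database="mydb")
# cur = conn.cursor()  # unbuffered by default: rows stream on fetch
# cur.execute("SELECT * FROM a_big_table")
# dump_cursor_to_csv(cur, "a_big_table.csv")
# cur.close()
# conn.close()
```

Keeping the cursor unbuffered (i.e. not passing buffered=True) is what prevents the connector from pulling the entire result set into Python memory before the first fetchmany() call.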
