
Does MySQL (5.6) always cache the entire result set for a query in memory/on disk?

I need to dump a few very large MySQL tables to csv (hosted on RDS so no SELECT INTO OUTFILE). These tables are far larger than the available memory on their server.

If I execute a SELECT * FROM a_big_table using a python framework with fetchmany() or fetchone() to grab the records, will MySQL 5.6 try to read the entire table into memory first (which I expect will result in caching to disk), or is it smarter than that?

EDIT: To clarify, I mean will the whole result set be stored in MySQL cache (not Python!).

2nd EDIT: Changed typo “sorted” to “stored” in first edit. Comments still useful regarding this case tho!

The amount of memory used on the server is defined by the buffer pool size configuration setting (innodb_buffer_pool_size), so there's hardly any need to worry about what happens on the server side. Your fetching application will most likely be the bottleneck, writing the dump more slowly than MySQL can produce it; the server just keeps the buffer filled while you fetch. From the server's standpoint, fetching one large result set is more efficient and less resource-demanding than making multiple smaller range queries...
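As a quick illustration of that point (connection details here are hypothetical), the setting can be inspected from the client; it is a fixed cap on the InnoDB page cache, not something that grows with the size of a result set:

    import mysql.connector

    # Hypothetical connection details; adjust for your RDS instance.
    cnx = mysql.connector.connect(host="my-instance.rds.amazonaws.com",
                                  user="user", password="secret", database="mydb")
    cur = cnx.cursor()

    # innodb_buffer_pool_size caps the memory InnoDB uses to cache data pages;
    # per the answer above, a large SELECT is streamed to the client as it is
    # fetched rather than being materialized in server memory.
    cur.execute("SHOW VARIABLES LIKE 'innodb_buffer_pool_size'")
    print(cur.fetchone())   # e.g. ('innodb_buffer_pool_size', '134217728')

    cur.close()
    cnx.close()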

Typically, app-level database calls do not return the entire result set, but rather a cursor into the result set. It is then up to the application language (e.g. Python) to iterate that cursor and retrieve the records.

The documentation for MySQL's Python connector confirms this:

By default, MySQL Connector/Python does not buffer or prefetch results. This means that after a query is executed, your program is responsible for fetching the data (emphasis mine). This avoids excessive memory use when queries return large result sets. If you know that the result set is small enough to handle all at once, you can fetch the results immediately by setting buffered to True. It is also possible to set this per cursor (see Section 10.2.6, “MySQLConnection.cursor() Method”).

Results generated by queries normally are not read until the client program fetches them. To automatically consume and discard result sets, set the consume_results option to True. The result is that all results are read, which for large result sets can be slow. (In this case, it might be preferable to close and reopen the connection.)
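For illustration, here is a minimal sketch of the two cursor modes the documentation describes, assuming MySQL Connector/Python and hypothetical connection details:

    import mysql.connector

    # Hypothetical connection details.
    cnx = mysql.connector.connect(host="my-instance.rds.amazonaws.com",
                                  user="user", password="secret", database="mydb")

    # Default cursor: unbuffered, rows stay on the server until you fetch them.
    cur_stream = cnx.cursor()

    # Buffered cursor: the whole result set is fetched into client memory at once,
    # only appropriate for result sets you know are small.
    cur_all = cnx.cursor(buffered=True)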

So, your strategy of using a SELECT * query, then writing to file one record at a time, or groups of records at a time, should work from a memory requirements point of view. Your Python code should only need as much memory to hold the current record(s) you are trying to write to file.
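A minimal sketch of that strategy, assuming MySQL Connector/Python with a default (unbuffered) cursor; the connection details, table name, and chunk size are hypothetical:

    import csv
    import mysql.connector

    # Hypothetical connection details; adjust for your RDS instance.
    cnx = mysql.connector.connect(host="my-instance.rds.amazonaws.com",
                                  user="user", password="secret", database="mydb")

    # Default cursor is unbuffered: rows are fetched from the server on demand,
    # so the client only ever holds the current chunk in memory.
    cur = cnx.cursor()
    cur.execute("SELECT * FROM a_big_table")

    with open("a_big_table.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(cur.column_names)       # header row
        while True:
            rows = cur.fetchmany(size=10000)    # grab a modest chunk at a time
            if not rows:
                break
            writer.writerows(rows)

    cur.close()
    cnx.close()

With fetchmany(), client-side memory is bounded by the chunk size; the table itself is never materialized in Python.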
