
python-mysqldb: How to efficiently get millions/billions of records from a database?

  • I have a table from which I have to fetch around 7 million records, and this will grow to billions of records over time (since data is added every day)
  • I am using mysql-python (MySQLdb) to connect to a remote MySQL database

  • I query it like the following:

cursor = conn.cursor()
cursor.execute(query)
return cursor

and try to print the rows like this:

sql = 'select * from reading;'  # the reading table has 7 million records
cursor.execute(sql)
for row in cursor:
    print row
  • It is taking forever to print the rows

On the server, I can see the mysqld process running:

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3769 mysql    20   0 1120m 276m 5856 S  125  1.7  2218:09 mysqld

Question: What is an efficient way to query a table with millions/billions of records using Python?

Thank you

I would suggest two options:

  1. Dump the required data into a file with SELECT ... INTO OUTFILE, or even from the mysql console client, and work with the file (see the first sketch after this list).

  2. You should understand that, by default, MySQL sends the whole result set to the client, and the client then mimics reading the data row by row (even though the whole result is already in client memory, or the query fails if there is not enough memory). Alternatively, the result set can be kept on the server side. For that, pass the cursorclass=MySQLdb.cursors.SSCursor parameter to MySQLdb.connect (see http://mysql-python.sourceforge.net/MySQLdb.html for details, and the second sketch below).
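
For option 1, a minimal sketch in Python, assuming a table named reading and hypothetical connection parameters (remote-host, user, password, mydb); note that INTO OUTFILE writes the file on the database server's filesystem and requires the FILE privilege:

import MySQLdb

conn = MySQLdb.connect(host='remote-host', user='user',
                       passwd='password', db='mydb')
cursor = conn.cursor()
# INTO OUTFILE writes on the *server's* filesystem; the MySQL user needs
# the FILE privilege, and the target file must not already exist.
cursor.execute("SELECT * FROM reading "
               "INTO OUTFILE '/tmp/reading.csv' "
               "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'")
cursor.close()
conn.close()

If you cannot reach the server's filesystem, the mysql console client achieves the same effect on the client machine by running the SELECT and redirecting its output to a local file.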

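For option 2, a minimal sketch of the server-side (unbuffered) cursor, again with hypothetical connection parameters; note that with SSCursor you must consume (or close) the whole result set before issuing another query on the same connection:

import MySQLdb
import MySQLdb.cursors

conn = MySQLdb.connect(host='remote-host', user='user',
                       passwd='password', db='mydb',
                       cursorclass=MySQLdb.cursors.SSCursor)
cursor = conn.cursor()
cursor.execute('select * from reading')

# Rows stream from the server as you iterate, so the client never holds
# the whole multi-million-row result set in memory at once.
for row in cursor:
    print row  # Python 2 syntax, as in the question

cursor.close()
conn.close()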