
Best way to insert ~20 million rows using Python/MySQL

I need to store a defaultdict object containing ~20M entries in a database. The dictionary maps strings to strings, so the table has two columns and no primary key (one is constructed later).

Things I've tried:

  • executemany, passing in the set of keys and values in the dictionary. Works well when the number of values is below ~1M.
  • Executing single statements. Works, but slow.
  • Using transactions:

con = sqlutils.getconnection()
cur = con.cursor()
print len(self.table)
t = time.time()  # start timer
# disable integrity checks and autocommit while bulk loading
cur.execute("SET FOREIGN_KEY_CHECKS = 0;")
cur.execute("SET UNIQUE_CHECKS = 0;")
cur.execute("SET AUTOCOMMIT = 0;")
i = 0
for k in self.table:
    cur.execute("INSERT INTO " + sqlutils.gettablename(self.sequence) +
                " (key, matches) values (%s, %s);", (k, str(self.hashtable[k])))
    i += 1
    if i % 10000 == 0:
        print i  # progress indicator
# cur.executemany("INSERT INTO " + sqlutils.gettablename(self.sequence) +
#                 " (key, matches) values (%s, %s)",
#                 [(k, str(self.table[k])) for k in self.table])
# re-enable checks and commit
cur.execute("SET UNIQUE_CHECKS = 1;")
cur.execute("SET FOREIGN_KEY_CHECKS = 1;")
cur.execute("COMMIT")
con.commit()
cur.close()
con.close()
print "Finished", self.sequence, "in %.3f sec" % (time.time() - t)
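A chunked variant of the executemany approach would be another option (just a sketch, assuming a DB-API connection from sqlutils.getconnection(); the function name and batch size are arbitrary choices of mine):

# Sketch: feed executemany in fixed-size batches instead of all rows at once.
def insert_in_batches(table_name, mapping, batch_size=10000):
    con = sqlutils.getconnection()
    cur = con.cursor()
    sql = "INSERT INTO " + table_name + " (`key`, matches) VALUES (%s, %s)"
    batch = []
    for k, v in mapping.items():
        batch.append((k, str(v)))
        if len(batch) >= batch_size:
            cur.executemany(sql, batch)
            batch = []
    if batch:
        cur.executemany(sql, batch)  # flush the final partial batch
    con.commit()
    cur.close()
    con.close()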

This is a recent conversion from SQLite to MySQL. Oddly enough, I'm getting much better performance when I use SQLite (30s to insert 3M rows in SQLite, 480s in MySQL). Unfortunately, MySQL is a necessity because the project will be scaled up in the future.


Edit

Using LOAD DATA INFILE works like a charm. Thanks to all who helped! Inserting 3.2M rows takes me ~25s.
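For reference, the approach looks roughly like this (a sketch only: the file path is a placeholder, the escaping is naive, and it assumes LOCAL INFILE is enabled on both the client connection and the server):

# Sketch: dump the dict to a tab-separated file, then bulk-load it with LOAD DATA.
# Assumes LOCAL INFILE is enabled (e.g. local_infile=1 on the client connection).
def load_with_infile(table_name, mapping, path="/tmp/bulk_data.tsv"):
    with open(path, "w") as f:
        for k, v in mapping.items():
            # naive formatting; real data may need escaping of tabs/newlines
            f.write("%s\t%s\n" % (k, v))
    con = sqlutils.getconnection()
    cur = con.cursor()
    cur.execute(
        "LOAD DATA LOCAL INFILE '" + path + "' INTO TABLE " + table_name +
        " FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n' (`key`, matches)")
    con.commit()
    cur.close()
    con.close()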

MySQL can insert multiple values with one query: INSERT INTO table (key1, key2) VALUES ("value_key1", "value_key2"), ("another_value_key1", "another_value_key2"), ("and_again", "and_again...");
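A rough Python sketch of building such a multi-row statement in batches (the table/column names and batch size here are just placeholders):

# Sketch: one INSERT ... VALUES (...), (...), ... statement per batch of rows.
def multi_value_insert(cur, table_name, rows, batch_size=1000):
    sql = "INSERT INTO " + table_name + " (`key`, matches) VALUES "
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        placeholders = ", ".join(["(%s, %s)"] * len(batch))
        # flatten [(k1, v1), (k2, v2), ...] into [k1, v1, k2, v2, ...]
        params = [x for pair in batch for x in pair]
        cur.execute(sql + placeholders, params)

# e.g. multi_value_insert(cur, "my_table", list(my_dict.items())), then con.commit()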

Also, you could try writing your data to a file and using MySQL's LOAD DATA INFILE, which is designed for inserting at "very high speed" (according to MySQL's documentation).

I don't know whether "file writing" + "MySQL LOAD DATA" will be faster than inserting multiple values in one query (or several queries, if MySQL has a limit on statement size).

It depends on your hardware (writing a file is fast on an SSD), your file system configuration, your MySQL configuration, etc. So you'll have to test in your production environment to see which solution is fastest for you.

Instead of inserting directly, generate a SQL file (using extended inserts, etc.) and then feed it to MySQL; this will save you quite a lot of overhead.
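A sketch of that idea, writing extended INSERT statements to a .sql file that you then pipe to the mysql client (the table name, path, and escaping are simplified placeholders):

# Sketch: write extended INSERTs to a file, then run e.g.:
#   mysql -u user -p mydatabase < bulk_inserts.sql
def write_sql_file(table_name, mapping, path="bulk_inserts.sql", rows_per_stmt=1000):
    def quote(s):
        # minimal escaping for illustration; real data needs proper escaping
        return "'" + str(s).replace("\\", "\\\\").replace("'", "\\'") + "'"
    items = list(mapping.items())
    with open(path, "w") as f:
        for start in range(0, len(items), rows_per_stmt):
            chunk = items[start:start + rows_per_stmt]
            values = ", ".join("(%s, %s)" % (quote(k), quote(v)) for k, v in chunk)
            f.write("INSERT INTO " + table_name + " (`key`, matches) VALUES " + values + ";\n")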

NB: you'll still save some execution time if you avoid recomputing constant values inside your loop, e.g.:

for k in self.table:
    xxx = sqlutils.gettablename(self.sequence)
    do_something_with(xxx, k)

=>

xxx = sqlutils.gettablename(self.sequence)
for k in self.table:
    do_something_with(xxx, k)
