I need to store a defaultdict containing ~20M entries in a database. The dictionary maps strings to strings, so the table has two columns; there is no primary key, since that is constructed later.
Things I've tried:
Using transactions
con = sqlutils.getconnection()
cur = con.cursor()
print len(self.table)
cur.execute("SET FOREIGN_KEY_CHECKS = 0;")
cur.execute("SET UNIQUE_CHECKS = 0;")
cur.execute("SET AUTOCOMMIT = 0;")
i = 0
for k in self.table:
    cur.execute("INSERT INTO " + sqlutils.gettablename(self.sequence) + " (key, matches) VALUES (%s, %s);", (k, str(self.table[k])))
    i += 1
    if i % 10000 == 0:
        print i
#cur.executemany("INSERT INTO " + sqlutils.gettablename(self.sequence) + " (key, matches) VALUES (%s, %s)", [(k, str(self.table[k])) for k in self.table])
cur.execute("SET UNIQUE_CHECKS = 1;")
cur.execute("SET FOREIGN_KEY_CHECKS = 1;")
cur.execute("COMMIT")
con.commit()
cur.close()
con.close()
print "Finished", self.sequence, "in %.3f sec" % (time.time() - t)
This is a recent conversion from SQLite to MySQL. Oddly enough, I'm getting much better performance when I use SQLite (30s to insert 3M rows in SQLite, 480s in MySQL). Unfortunately, MySQL is a necessity because the project will be scaled up in the future.
-
Using LOAD DATA INFILE works like a charm. Thanks to all who helped! Inserting 3.2M rows takes me ~25s.
MySQL can insert multiple rows with a single query:

INSERT INTO table (key1, key2)
VALUES ("value_key1", "value_key2"),
       ("another_value_key1", "another_value_key2"),
       ("and_again", "and_again...");
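A minimal sketch of batching the inserts this way from Python, assuming a DB-API style cursor; `chunked`, `build_multirow_insert`, and the table/column names are illustrative, not part of the original code:

```python
# Sketch: batch rows and build multi-row INSERT statements.
# Assumes a DB-API cursor `cur` (e.g. from MySQLdb); helpers are illustrative.

def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def build_multirow_insert(table, columns, n_rows):
    """Return an INSERT with `n_rows` placeholder groups, e.g.
    INSERT INTO t (a, b) VALUES (%s, %s), (%s, %s), ..."""
    group = "(" + ", ".join(["%s"] * len(columns)) + ")"
    return "INSERT INTO {0} ({1}) VALUES {2}".format(
        table, ", ".join(columns), ", ".join([group] * n_rows))

# Usage sketch (table name and batch size are illustrative):
# for batch in chunked(list(self.table.items()), 10000):
#     sql = build_multirow_insert(tablename, ["`key`", "matches"], len(batch))
#     params = [v for row in batch for v in row]  # flatten the pairs
#     cur.execute(sql, params)
```

Note that MySQLdb's `executemany` already rewrites simple `INSERT ... VALUES` statements into this multi-row form internally, so re-enabling the commented-out `executemany` call from the question, fed in batches, achieves much the same effect.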
Also, you could write your data to a file and use MySQL's LOAD DATA statement, which the MySQL documentation describes as being designed for very high-speed bulk loading.
I don't know whether "file writing" + "MySQL LOAD DATA" will be faster than inserting multiple values in one query (or several queries, since MySQL limits statement size via max_allowed_packet, so very large inserts must be split).
It depends on your hardware (writing a file is fast on an SSD), on your file system configuration, on your MySQL configuration, etc. So you have to test in your production environment to see which solution is fastest for you.
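A sketch of the file-based approach, assuming LOAD DATA's default tab/newline terminators; the file path, table name, and column names are placeholders:

```python
# Sketch: dump {key: value} pairs to a tab-separated file suitable for
# LOAD DATA with default settings. Names and paths are illustrative.
import codecs

def write_load_file(path, table):
    """Write each pair as one tab-separated line, escaping the characters
    that are special under LOAD DATA's defaults (backslash, tab, newline)."""
    def esc(s):
        return (s.replace("\\", "\\\\")
                 .replace("\t", "\\t")
                 .replace("\n", "\\n"))
    with codecs.open(path, "w", "utf-8") as f:
        for key, value in table.items():
            f.write(esc(key) + "\t" + esc(value) + "\n")

# Then, on a connection opened with local_infile enabled (illustrative):
# cur.execute("LOAD DATA LOCAL INFILE '/tmp/dump.tsv' "
#             "INTO TABLE mytable (`key`, matches)")
```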
Instead of directly inserting, generate a SQL file (using extended inserts, etc.) and then feed it to MySQL; this will save you quite a lot of overhead.
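A sketch of generating such a dump file, which could then be fed to the mysql client (e.g. `mysql mydb < dump.sql`); the table name, column names, and batch size are illustrative, and `sql_quote` handles only backslashes and quotes, which is enough for plain string data:

```python
# Sketch: write an extended-insert SQL dump file in batches.
# Table/column names are placeholders for the real schema.

def sql_quote(s):
    """Minimal string-literal escaping for the generated SQL."""
    return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'"

def write_dump(path, table_name, rows, batch_size=1000):
    """Write `rows` (an iterable of (key, value) pairs) as extended
    INSERT statements of at most `batch_size` rows each."""
    with open(path, "w") as f:
        batch = []
        for key, value in rows:
            batch.append("(%s, %s)" % (sql_quote(key), sql_quote(value)))
            if len(batch) == batch_size:
                f.write("INSERT INTO %s (`key`, matches) VALUES %s;\n"
                        % (table_name, ", ".join(batch)))
                batch = []
        if batch:  # flush the final partial batch
            f.write("INSERT INTO %s (`key`, matches) VALUES %s;\n"
                    % (table_name, ", ".join(batch)))
```

The backticks around `key` are deliberate: KEY is a reserved word in MySQL, so it needs quoting when used as a column name.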
NB: you'll still save some execution time if you avoid recomputing constant values inside your loop, i.e.:
for k in self.table:
    xxx = sqlutils.gettablename(self.sequence)
    do_something_with(xxx, k)

=>

xxx = sqlutils.gettablename(self.sequence)
for k in self.table:
    do_something_with(xxx, k)