
InnoDB Table Bulk Insert

I have a MySQL table with about half a billion rows in it. We need to read this data and run calculations on it, and the calculated data (a standardized form of the original data) needs to be written into another InnoDB table.

Our current setup is a virtual cloud containing both the application machine and the database, so the machine-to-DB connection is very fast.

Reading the data and running the calculations on it is very fast; the bottleneck of the entire process is inserting the standardized data into the InnoDB table (the standardized table has a few indexes, though not long ones, which slows down insertion).

Unfortunately, we cannot modify certain system variables such as innodb_log_file_size (we are using Amazon AWS), which would otherwise help increase insert performance.

What would be our best bet for pushing all this data into MySQL? Since the calculation process is straightforward, I can easily write a Python script that takes the standardized data and outputs it in any format. Inserting the data on the fly as the calculations occur is painfully slow, and it gets slower over time.

I guess the question then is: what is the best process (in terms of input format and the actual import) for inserting bulk data into InnoDB tables?

In this case, since you are not doing anything on the base table and most likely only update the data in the secondary InnoDB table on a scheduled interval, I would prefer the steps below (a sketch follows the list):

  1. Take a mysqldump with a --where option (e.g. --where "id > 91919" or --where "update_time > now() - interval 1 hour"). If possible, avoid locking the table as well.
  2. Restore the data to a temporary DB/table.
  3. Do your calculations on the temporary table and update the secondary table.
  4. Drop the temporary DB/table you created.
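
A minimal SQL sketch of that flow, assuming a hypothetical source table mydb.raw_data, a secondary table mydb.standardized_data, and a scratch schema scratch_db (all names and the calculation itself are placeholders, not from the original question):

    -- Step 1 is run from the shell, e.g.:
    --   mysqldump --single-transaction --where="update_time > now() - interval 1 hour" mydb raw_data > chunk.sql
    -- Step 2: restore the dump into the scratch schema:
    --   mysql scratch_db < chunk.sql

    -- Step 3: run the calculation against the temp copy and write into the secondary table.
    INSERT INTO mydb.standardized_data (id, metric_std)
    SELECT id, (metric - 100) / 10    -- placeholder for your real standardization
    FROM scratch_db.raw_data;

    -- Step 4: throw the temp copy away.
    DROP TABLE scratch_db.raw_data;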

My first instinct was to ask you to tune your buffer variables, but since you say you can't change much of the server configuration, here is another option.

Do the calculation and dump the output into a CSV file; you can use the SELECT ... INTO OUTFILE statement for this. Then connect to the target InnoDB database, execute SET autocommit=0, and load the CSV back into the target table with LOAD DATA LOCAL INFILE. Finally, commit and turn autocommit back to 1 (a sketch of this flow follows).
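
A sketch of that flow, with the file path, column list, and table names assumed purely for illustration:

    -- On the source side: dump the calculated rows to CSV.
    SELECT id, metric_std
    FROM calc_results
    INTO OUTFILE '/tmp/standardized.csv'
    FIELDS TERMINATED BY ',' ENCLOSED BY '"'
    LINES TERMINATED BY '\n';

    -- On the target InnoDB connection: load the CSV inside one transaction.
    SET autocommit = 0;
    LOAD DATA LOCAL INFILE '/tmp/standardized.csv'
    INTO TABLE standardized_data
    FIELDS TERMINATED BY ',' ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    (id, metric_std);
    COMMIT;
    SET autocommit = 1;

Loading inside a single transaction avoids a log flush per row; note that LOCAL requires local_infile to be enabled on both the client and the server.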

There are many other options I could suggest (such as the right partitioning scheme, primary-key-order inserts, etc.), but I'd need to know the structure of your DB, the incoming dataset, and the indexes for that. One of these is sketched below.
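
As one example of the primary-key-order insert idea: if the CSV is written already sorted by the target table's primary key (column and table names assumed here), InnoDB can append to the clustered index instead of splitting pages all over the tree:

    -- Export sorted by the target table's primary key so the load appends in key order.
    SELECT id, metric_std
    FROM calc_results
    ORDER BY id
    INTO OUTFILE '/tmp/standardized_sorted.csv'
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n';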

Is your data time-series data? I had a similar issue last week. I loaded into partitions and it became faster. I also optimized my settings based on http://www.ajaydivakaran.com/2013/03/12/mysql-innodb-when-inserts-start-slowing-down/ But if you can't optimize the settings, then use partitioning for faster inserts (see the sketch below).
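
A hedged sketch of what partitioned, time-series-friendly storage could look like (table name, column names, and date ranges are assumptions): RANGE partitioning on the time column keeps each insert touching only the small, current partition and its indexes.

    -- Hypothetical target table partitioned by month on the timestamp column.
    CREATE TABLE standardized_data (
      id BIGINT NOT NULL,
      ts DATETIME NOT NULL,
      metric_std DOUBLE,
      PRIMARY KEY (id, ts)          -- the partitioning column must be part of every unique key
    ) ENGINE=InnoDB
    PARTITION BY RANGE (TO_DAYS(ts)) (
      PARTITION p2013_02 VALUES LESS THAN (TO_DAYS('2013-03-01')),
      PARTITION p2013_03 VALUES LESS THAN (TO_DAYS('2013-04-01')),
      PARTITION pmax     VALUES LESS THAN MAXVALUE
    );

Old partitions can later be removed with ALTER TABLE ... DROP PARTITION, which is much cheaper than a bulk DELETE.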
