
Migrating data from a ~90 million record MySQL table to another database

In the past week I've been trying to migrate a database containing approximately 90 million rows from MySQL to a newly created Couchbase instance. I searched the web for possible solutions and found some tools, which ultimately failed due to low memory availability. I also read about partitioning, but I'm no expert in MySQL administration, so that seemed beyond my abilities at the moment. Eventually I decided to implement my own dedicated script, which selects a certain number of records from the existing MySQL table, serializes them for Couchbase's newly created bucket and inserts them there. The tool works great for the first 5 million records, but then the MySQL instance takes far too long to retrieve further records.

It is worth mentioning that the MySQL table I'm working with is only being used by me, so no changes are made to it during the migration process.

The script I built leverages the LIMIT clause with an offset, as described in the SELECT syntax documentation, and looks like this:

SELECT * FROM data LIMIT ?,?

Where ?,? are generated by increasing the starting point of the selection by a certain number of records. For example, the following are possible queries issued by a single migration run:

SELECT * FROM data LIMIT 0,100000
SELECT * FROM data LIMIT 100000,200000
SELECT * FROM data LIMIT 200000,300000
...

The migration process will stop when no records are retrieved. As I previously stated, the queries that select records starting from a position of about 5 million take too long and make the migration process infeasible. I'm no database expert; I only created the new MySQL database and tables via MySQL Workbench 6.3 CE, and no optimizations have been made on my data. The table I'm trying to migrate contains one column which acts as a key: it is non-null and holds a unique value. All other columns have no options enabled on them.

I would like to know if there is any other way for me to select the data sequentially so it could be inserted without duplicates or corruption. Any help on this matter is greatly appreciated!

You are doing the pagination incorrectly. See Using MySQL LIMIT to Constrain The Number of Rows Returned By SELECT Statement.

The following illustrates the LIMIT clause syntax with two arguments:

SELECT 
    column1,column2,...
FROM
    table
LIMIT offset , count;
  • The offset specifies the offset of the first row to return. The offset of the first row is 0, not 1.
  • The count specifies the maximum number of rows to return.

So you should have a fixed page size (count) and a variable offset with no overlapping.

SELECT * FROM data LIMIT 0,100000
SELECT * FROM data LIMIT 100000,100000
SELECT * FROM data LIMIT 200000,100000
....
SELECT * FROM data LIMIT 89900000,100000
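
To make that concrete, here is a minimal sketch of the loop in Python. It assumes the pymysql driver, placeholder connection settings, and a hypothetical migrate_to_couchbase() function standing in for whatever serialization/insert step your script already performs:

import pymysql

PAGE_SIZE = 100000  # fixed count; only the offset advances

conn = pymysql.connect(host="localhost", user="user", password="secret", database="mydb")
try:
    offset = 0
    while True:
        with conn.cursor() as cur:
            cur.execute("SELECT * FROM data LIMIT %s, %s", (offset, PAGE_SIZE))
            rows = cur.fetchall()
        if not rows:
            break  # no records retrieved: migration is finished
        migrate_to_couchbase(rows)  # hypothetical stand-in for the serialize/insert step
        offset += PAGE_SIZE
finally:
    conn.close()

Because the count never changes and the offset advances by exactly one page, no two queries overlap and no row is fetched twice.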

I guess MySQL starts taking an unusably long time to satisfy your LIMIT clauses as the offsets get larger. LIMIT works that way: to satisfy LIMIT 5000000,100000 the server still has to read and discard the first five million rows, so each successive query gets slower.

You'll have much better luck using an indexed column to select each segment of your table to export. There's no harm done if some segments contain fewer rows than others.

For example, you could do

SELECT * FROM data WHERE datestamp >= '2017-01-01' AND datestamp < '2017-02-01';
SELECT * FROM data WHERE datestamp >= '2017-02-01' AND datestamp < '2017-03-01';
SELECT * FROM data WHERE datestamp >= '2017-03-01' AND datestamp < '2017-04-01';
SELECT * FROM data WHERE datestamp >= '2017-04-01' AND datestamp < '2017-05-01';
SELECT * FROM data WHERE datestamp >= '2017-05-01' AND datestamp < '2017-06-01';
SELECT * FROM data WHERE datestamp >= '2017-06-01' AND datestamp < '2017-07-01';
 ...

to break out your records by calendar month (assuming you have a datestamp column).

Or, if you have an autoincrementing primary key id column, try this

SELECT * FROM data WHERE                 id < 100000;
SELECT * FROM data WHERE id>= 100000 AND id < 200000;
SELECT * FROM data WHERE id>= 200000 AND id < 300000;
SELECT * FROM data WHERE id>= 300000 AND id < 400000;
SELECT * FROM data WHERE id>= 400000 AND id < 500000;
SELECT * FROM data WHERE id>= 500000 AND id < 600000;
 ...
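
If you want to wire the id-range idea into a script, here is a rough sketch along the same lines, again assuming pymysql, an integer autoincrement column literally named id, and a hypothetical migrate_to_couchbase() step:

import pymysql

STEP = 100000  # rows covered by each segment

conn = pymysql.connect(host="localhost", user="user", password="secret", database="mydb")
try:
    with conn.cursor() as cur:
        cur.execute("SELECT MIN(id), MAX(id) FROM data")
        min_id, max_id = cur.fetchone()

    lower = min_id
    while lower is not None and lower <= max_id:
        with conn.cursor() as cur:
            # half-open range [lower, lower + STEP): each row lands in exactly one segment
            cur.execute("SELECT * FROM data WHERE id >= %s AND id < %s", (lower, lower + STEP))
            rows = cur.fetchall()
        if rows:
            migrate_to_couchbase(rows)  # hypothetical stand-in for the serialize/insert step
        lower += STEP
finally:
    conn.close()

Each query is a simple range scan on the indexed key, so it stays fast no matter how deep into the table you are, and the half-open ranges guarantee no duplicates and no gaps.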

An entirely different approach will also work. In your dumping program, do

 SELECT * FROM data;

then have the program switch to another output file every n records. For example, in pseudocode:

 rowcount = 100000
 rownum = 0
 rowsleft = rowcount
 open file 'out' + 000000;
 while next input record available {
     read record
     write record
     rownum = rownum + 1
     rowsleft = rowsleft - 1
     if rowsleft <= 0 {
        close file
        open file 'out' + rownum
        rowsleft = rowcount
     }
  }
  close file

This will use a single MySQL query, so you won't have to worry about segments. It should be quite fast.
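
One caveat: most client libraries buffer the whole result set by default, which will not fit in memory for ~90 million rows. Here is a sketch of the same idea in Python, assuming pymysql, whose SSCursor streams rows from the server one at a time; the CSV output and the file-naming scheme are just placeholders for whatever format your loader expects:

import csv
import pymysql
import pymysql.cursors

ROWS_PER_FILE = 100000

conn = pymysql.connect(host="localhost", user="user", password="secret", database="mydb")
try:
    with conn.cursor(pymysql.cursors.SSCursor) as cur:
        cur.execute("SELECT * FROM data")  # single query; SSCursor streams instead of buffering
        out, writer = None, None
        for rownum, row in enumerate(cur):
            if rownum % ROWS_PER_FILE == 0:  # rotate the output file every ROWS_PER_FILE records
                if out:
                    out.close()
                out = open("out%09d.csv" % rownum, "w", newline="")
                writer = csv.writer(out)
            writer.writerow(row)
        if out:
            out.close()
finally:
    conn.close()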
