
The fastest way to purge/copy Azure Storage Table data to SQL Azure?

I have worker roles that aggregate incoming data and store the totals in Azure Storage Tables. I need this data to be purged/copied (at a specified interval) to SQL Server for reporting. I am talking about thousands of rows to be purged in a batch. A simple loop with select/insert/update will take ages.

Any ideas how to do this most effectively? Thanks!

Is all the data in well-defined partitions? For instance, 1,000 entities under partition key "A" and 1,000 under partition key "B". If so, you should be able to select all the records from a particular partition. Depending on the number of records, you may have to deal with continuation tokens (you can only get back a maximum of 1,000 entities per request and must use the continuation token to fetch the remaining records).
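
For illustration, a minimal sketch of reading a whole partition while following continuation tokens, assuming the Microsoft.WindowsAzure.Storage .NET SDK and a hypothetical table named "Totals":

```csharp
using System.Collections.Generic;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

static class PartitionReader
{
    // Reads every entity in one partition, following continuation tokens.
    public static List<DynamicTableEntity> ReadPartition(string connectionString, string partitionKey)
    {
        var account = CloudStorageAccount.Parse(connectionString);
        var table = account.CreateCloudTableClient().GetTableReference("Totals"); // hypothetical table name

        var query = new TableQuery<DynamicTableEntity>().Where(
            TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partitionKey));

        var entities = new List<DynamicTableEntity>();
        TableContinuationToken token = null;
        do
        {
            // Each request returns at most 1,000 entities; keep going until the token is null.
            var segment = table.ExecuteQuerySegmented(query, token);
            entities.AddRange(segment.Results);
            token = segment.ContinuationToken;
        } while (token != null);

        return entities;
    }
}
```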

Using the partition key could also be a good way to update a batch (in a transaction).

You could try downloading all the data locally and then inserting it into SQL.

How can I back up my Windows Azure table storage?

I was looking for a similar solution over a month ago and found that the fastest way was to use my own code - reading from Table Storage in batches and inserting into SQL. One thing that was helpful was to temporarily record PartitionKey + RowKey in a SQL import table, so that if my import failed I could safely restart it from the last successful position.
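
As an illustration of that restart trick (not the poster's actual code), here is a sketch that reads the last imported RowKey from a hypothetical ImportProgress table in SQL and builds a Table Storage query that resumes after it; it assumes entities within a partition are imported in RowKey order, which is the order Table Storage returns them in:

```csharp
using System;
using System.Data.SqlClient;
using Microsoft.WindowsAzure.Storage.Table;

static class ImportCheckpoint
{
    // Last successfully imported RowKey for a partition, from a hypothetical ImportProgress table.
    public static string GetLastImportedRowKey(string sqlConnectionString, string partitionKey)
    {
        using (var conn = new SqlConnection(sqlConnectionString))
        using (var cmd = new SqlCommand(
            "SELECT MAX(RowKey) FROM ImportProgress WHERE PartitionKey = @pk", conn))
        {
            cmd.Parameters.AddWithValue("@pk", partitionKey);
            conn.Open();
            var result = cmd.ExecuteScalar();
            return result == null || result == DBNull.Value ? null : (string)result;
        }
    }

    // Table Storage returns entities ordered by PartitionKey then RowKey,
    // so resuming "after the last RowKey" picks up where the failed import stopped.
    public static TableQuery<DynamicTableEntity> BuildResumeQuery(string partitionKey, string lastRowKey)
    {
        var filter = TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partitionKey);
        if (lastRowKey != null)
        {
            filter = TableQuery.CombineFilters(
                filter,
                TableOperators.And,
                TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.GreaterThan, lastRowKey));
        }
        return new TableQuery<DynamicTableEntity>().Where(filter);
    }
}
```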

RedGate and others have tools that allow you to retrieve all the data from Table Storage, but as far as I know they dump it to files - not to SQL.

To cover the easiest part first: once you have the data from ATS in memory, you can use SqlBulkCopy to insert a lot of rows into SQL Server very quickly (it works like BCP, but from .NET).
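
A minimal sketch of that SqlBulkCopy step, assuming the entities are already in memory as DynamicTableEntity objects; the target table dbo.Totals and its "Total" column are assumptions for illustration:

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using Microsoft.WindowsAzure.Storage.Table;

static class SqlImporter
{
    // Flattens the in-memory entities into a DataTable and bulk-inserts them in one go.
    public static void BulkInsert(string sqlConnectionString, IEnumerable<DynamicTableEntity> entities)
    {
        var data = new DataTable();
        data.Columns.Add("PartitionKey", typeof(string));
        data.Columns.Add("RowKey", typeof(string));
        data.Columns.Add("Total", typeof(long)); // assumed aggregate column

        foreach (var e in entities)
        {
            data.Rows.Add(e.PartitionKey, e.RowKey, e.Properties["Total"].Int64Value);
        }

        using (var conn = new SqlConnection(sqlConnectionString))
        {
            conn.Open();
            using (var bulk = new SqlBulkCopy(conn))
            {
                bulk.DestinationTableName = "dbo.Totals"; // hypothetical reporting table
                bulk.BatchSize = 5000;
                bulk.WriteToServer(data); // streams rows like BCP, far faster than row-by-row INSERTs
            }
        }
    }
}
```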

Now, the hardest part is getting the data out of ATS quickly. I know nothing about your PartitionKey/RowKey schema. However, a few things to think about:

1) Executing queries against ATS with a single PartitionKey and a range of RowKeys is quickest. If your queries do not contain a condition on RowKey, you may be hit with continuation tokens even when you have fewer than 1,000 rows and a PartitionKey specified.

2) If you need to grab a TON of data from ATS and can split the work into a bunch of individual, well-performing queries, consider distributing those queries as queue messages and then having multiple processors handle each query individually in parallel.

3) Purging might be interesting. You can purge 100 entities at a time using batch transactions, so if your individual queries allow it, after you process the data into SQL Server you can use the same in-memory entities and purge them 100 at a time per partition key (see the sketch after this list; this will be moderately fast). Alternatively, if you can, split your table into multiple tables partitioned by date or some other key and delete data by dropping a table at a time. For example, if you have a large Orders table that you need to move to SQL, instead of a single Orders table, create monthly Orders tables: Orders201301..thru..Orders2xxx_12... and once you import a month of orders, simply kill that table with one command (which is very quick).
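
A sketch of the purge step from point 3, assuming the Microsoft.WindowsAzure.Storage SDK; an entity group transaction is limited to 100 operations and one PartitionKey, hence the grouping:

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.WindowsAzure.Storage.Table;

static class TablePurger
{
    // Deletes the already-imported entities in batches of 100 per partition key.
    public static void Purge(CloudTable table, IEnumerable<DynamicTableEntity> entities)
    {
        foreach (var partition in entities.GroupBy(e => e.PartitionKey))
        {
            var pending = partition.ToList();
            for (int i = 0; i < pending.Count; i += 100)
            {
                var batch = new TableBatchOperation();
                foreach (var entity in pending.Skip(i).Take(100))
                {
                    batch.Delete(entity); // entities read from the table carry the ETag the delete needs
                }
                table.ExecuteBatch(batch);
            }
        }
    }
}
```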
