
The fastest way to purge/copy Azure Storage Table data to SQL Azure?

I have worker roles that aggregate incoming data and store the totals in Azure Storage Tables. I need this data to be purged/copied (on a specified interval) to SQL Server for reporting. I'm talking about thousands of rows to be purged in a batch. A simple loop with select/insert/update would take ages.

Any ideas how to do this most effectively? Thanks!

Is all the data in well-defined partitions? For instance, 1000 entities under partition key "A" and 1000 under partition key "B". If so, you should be able to select all the records from a particular partition. Depending on the number of records, you may have to deal with continuation tokens (you can only get back a maximum number of rows per request, and you use the continuation token to fetch the remaining records).
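The continuation-token loop described above can be sketched like this. This is a minimal, service-free sketch: `query_page` stands in for the real table-storage call (which would return a token alongside each page), and the fake service below just simulates a 3-entities-per-request limit.

```python
def fetch_partition(query_page, partition_key):
    """Collect every entity in one partition, following continuation tokens.

    query_page(partition_key, token) is assumed to return (entities, next_token);
    next_token is None once the last page has been served.
    """
    entities, token = [], None
    while True:
        page, token = query_page(partition_key, token)
        entities.extend(page)
        if token is None:
            return entities

# Simulated table service that serves at most 3 entities per request.
DATA = {"A": [f"A-{i}" for i in range(7)]}

def fake_query(pk, token):
    start = token or 0
    page = DATA[pk][start:start + 3]
    next_token = start + 3 if start + 3 < len(DATA[pk]) else None
    return page, next_token

all_a = fetch_partition(fake_query, "A")  # three requests, seven entities
```

The real SDK works the same way: keep issuing the query with the last token you received until the service stops handing one back.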

Using the partition key could also be a good way to update a batch (in a transaction).

You could try downloading all the data locally and then inserting it into SQL.

How can I back up my Windows Azure table storage?

I was looking for a similar solution over a month ago and found that the fastest way was to use my own code - reading from table storage in batches and inserting into SQL. One thing that helped was temporarily recording PartitionKey + RowKey in the SQL import table, so that when my import failed I could safely restart it from the last successful position.
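The restart trick above can be sketched with an import log keyed on (PartitionKey, RowKey). This is an illustrative sketch using SQLite as a stand-in for the SQL import table; the table and column names are made up for the example. Because the key is unique, re-running the import after a failure silently skips rows that already made it in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE import_log (pk TEXT, rk TEXT, PRIMARY KEY (pk, rk))"
)

def import_entity(entity):
    # INSERT OR IGNORE makes re-processing an already-imported row a no-op,
    # so restarting the whole batch from the top is safe.
    cur = conn.execute(
        "INSERT OR IGNORE INTO import_log VALUES (?, ?)",
        (entity["pk"], entity["rk"]),
    )
    return cur.rowcount == 1  # True only the first time this row is seen

entities = [{"pk": "A", "rk": str(i)} for i in range(5)]
first_pass = [import_entity(e) for e in entities]
second_pass = [import_entity(e) for e in entities]  # simulated restart
```

On the first pass every row is new; on the simulated restart every insert is ignored, which is exactly the "resume from the last successful position" behavior.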

RedGate and others have tools that let you retrieve all the data from table storage, but as far as I know they dump it to files - not SQL.

To cover the easiest part first: once you have the data from ATS in memory, you can use SqlBulkCopy to insert a lot of rows into SQL Server very quickly (it works like BCP, but from .NET).
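SqlBulkCopy itself is .NET-only, but the shape of the idea - one set-based batch insert instead of a row-at-a-time loop - can be sketched in Python using `executemany` against SQLite as a stand-in for the bulk-load path:

```python
import sqlite3

# 1000 aggregated rows, as they might look after reading them out of ATS.
rows = [("A", str(i), i * 10) for i in range(1000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE totals (pk TEXT, rk TEXT, total INTEGER)")

# One batched call instead of 1000 individual INSERT round trips.
conn.executemany("INSERT INTO totals VALUES (?, ?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM totals").fetchone()[0]
```

Against a real SQL Server the equivalent is `SqlBulkCopy.WriteToServer` fed with a DataTable or IDataReader; the win in both cases comes from sending rows in bulk rather than one statement per row.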

Now, the hardest part is getting the data out of ATS quickly. I know nothing about your PKey/RKey schema. However, a few things to think about:

1) Executing queries against ATS with a single PartitionKey and a range of RowKeys is quickest. If your queries do not contain a condition on RowKey, you may be hit with continuation tokens even when you have fewer than 1000 rows and a PartitionKey specified.
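For reference, the fast query shape described in 1) is an exact PartitionKey plus a RowKey range, which in the table service's OData filter syntax looks like this (the helper name and key values are illustrative):

```python
def range_filter(pk, rk_from, rk_to):
    """Build an OData filter: exact PartitionKey, half-open RowKey range."""
    return (
        f"PartitionKey eq '{pk}' "
        f"and RowKey ge '{rk_from}' and RowKey lt '{rk_to}'"
    )

f = range_filter("A", "0000", "1000")
```

Because PartitionKey and RowKey together form the clustered index of a table, a filter of this shape lets the service do a single range scan inside one partition instead of a cross-partition scan.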

2) If you need to grab a TON of data from ATS and can split the work into a bunch of individual, well-performing queries, consider distributing the queries as queue messages and having multiple processors process each query individually, in parallel.
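The fan-out in 2) can be sketched with an in-process queue and a couple of workers. In a real deployment the queue would be an Azure storage queue and the workers separate role instances; here `fetch` is a stand-in for the per-partition ATS query:

```python
from queue import Queue, Empty
from threading import Thread

def fetch(pk):
    # Stand-in for the real per-partition table-storage query.
    return [f"{pk}-{i}" for i in range(3)]

work = Queue()
results = Queue()
for pk in ["A", "B", "C", "D"]:
    work.put(pk)  # one "queue message" per partition query

def worker():
    while True:
        try:
            pk = work.get_nowait()
        except Empty:
            return  # no work left
        for entity in fetch(pk):
            results.put(entity)

threads = [Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

collected = []
while not results.empty():
    collected.append(results.get())
```

Each message describes one self-contained query, so adding processors scales the read throughput without the workers coordinating with each other.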

3) Purging might be interesting. You can purge 100 entities at a time using batch transactions, so if your individual queries allow it, after you process the data into SQL Server you can take the same in-memory entities and purge them 100 at a time per partition key (this will be moderately fast). Alternatively, if you can, split your data into multiple tables partitioned by date or some other key, and delete data by dropping one table at a time. For example, if you have a large Orders table that you need to move to SQL, instead of having a single Orders table, create monthly Orders tables (Orders201301 through Orders2xxx_12), and once you have imported a month of orders, simply kill that table with one command (which works really quickly).
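The batching constraint in 3) - at most 100 entities per batch transaction, and all of them under one partition key - can be sketched as a grouping step over the in-memory entities you just imported:

```python
from itertools import groupby

def delete_batches(entities, batch_size=100):
    """Yield (partition_key, chunk) pairs, each chunk a valid batch:
    at most batch_size entities, all sharing one PartitionKey."""
    entities = sorted(entities, key=lambda e: e["pk"])
    for pk, group in groupby(entities, key=lambda e: e["pk"]):
        group = list(group)
        for i in range(0, len(group), batch_size):
            yield pk, group[i:i + batch_size]

entities = (
    [{"pk": "A", "rk": str(i)} for i in range(250)]
    + [{"pk": "B", "rk": str(i)} for i in range(50)]
)
batches = list(delete_batches(entities))  # A: 100+100+50, B: 50 -> 4 batches
```

Each yielded chunk would then become one batch transaction of delete operations; the drop-a-monthly-table alternative avoids this loop entirely, which is why it is so much faster.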
