
Retrieve 1+ million records from Azure Table Storage

My table storage has approximately 1-2 million records, and I have a daily job that needs to retrieve all the records that do not have a property A and do some further processing.

It is expected that there are about 1-1.5 million records without property A. I understand there are two approaches:

  1. Query all records then filter results after
  2. Do a table scan

Currently it uses the first approach: query all records, then filter in C#. However, the task runs in an Azure Function App, and the query to retrieve all the results sometimes takes over 10 minutes, which is the execution time limit for Azure Functions.
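For reference, the retrieval is essentially a full table scan with a client-side filter. A simplified sketch of it in TypeScript with the @azure/data-tables package (the real job is C#; the table name and `PropertyA` are placeholders for the actual names):

    import { TableClient } from "@azure/data-tables";

    const client = TableClient.fromConnectionString(
      process.env.STORAGE_CONNECTION_STRING!,
      "MyTable"
    );

    async function entitiesWithoutPropertyA() {
      const matches: Record<string, unknown>[] = [];
      // Full table scan: the service returns at most 1,000 entities per
      // page, and pages are fetched sequentially via continuation tokens.
      for await (const entity of client.listEntities()) {
        // Table Storage is schemaless and cannot filter server-side on a
        // property being absent, so the check has to happen on the client.
        if (entity.PropertyA == null) {
          matches.push(entity);
        }
      }
      return matches;
    }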

I'm trying to understand why retrieving 1 million records is taking so long and how to optimise the query. The existing design of the table is that the partition key and row key are identical and are a GUID - this leads me to believe that there is one entity per partition.

Looking at the Microsoft docs, here are some key Table Storage limits (https://docs.microsoft.com/en-us/azure/storage/common/storage-scalability-targets#azure-table-storage-scale-targets):

  • Maximum request rate per storage account: 20,000 transactions per second, assuming a 1-KiB entity size.
  • Target throughput for a single table partition (1-KiB entities): up to 2,000 entities per second.

My initial guess is that I should use a different partition key, grouping 2,000 entities per partition, to achieve the target throughput of 2,000 per second per partition. Would this mean that 2,000,000 records could, in theory, be returned in 1 second?
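Working through the numbers: that would only hold if the partitions were queried in parallel, since the target is per partition. A single query returns at most 1,000 entities per response and has to follow continuation tokens serially, so 2,000,000 records means roughly 2,000 sequential round trips; at, say, 50 ms per round trip (an assumed latency, not a measured one), that is already ~100 seconds before any processing.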

Any thoughts or advice appreciated.

I found this question after blogging on this very topic. I have a project where I am using the Azure Functions Consumption plan and have a big Azure Storage table (3.5 million records).

Here's my blog post: https://www.joelverhagen.com/blog/2020/12/distributed-scan-of-azure-tables

I have mentioned a couple of options in this blog post, but I think the fastest is distributing the "table scan" work into smaller work items that can easily be completed within the 10-minute limit. I have an implementation linked in the blog post if you want to try it out. It will likely take some adapting to your Azure Function, but most of the clever part (finding the partition key ranges) is implemented and tested.
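Roughly, the idea is to split the key space into bounded ranges up front and hand each range to a separate invocation. A sketch of that decomposition in TypeScript (my illustration here, not the implementation from the post, which discovers the partition key ranges dynamically) - since your keys are GUIDs, splitting on the first hex character gives 16 evenly loaded work items:

    // One work item per leading hex digit of the GUID partition keys
    // (assumes lowercase GUID strings). Each item could be enqueued and
    // handled by a separate function invocation.
    const HEX = "0123456789abcdef".split("");

    interface RangeWorkItem {
      fromKey: string;  // inclusive lower bound on PartitionKey
      toKey?: string;   // exclusive upper bound; undefined = open-ended
    }

    function buildWorkItems(): RangeWorkItem[] {
      return HEX.map((ch, i) => ({ fromKey: ch, toKey: HEX[i + 1] }));
    }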

This looks to be essentially what user3603467 is suggesting in his answer.

I see two approaches to retrieving 1+ million records in a batch process where the result must be saved to a single medium, like a file.

First) You identify/select the primary IDs/keys of all related data. Then you spawn parallel jobs, each given a chunk of these primary IDs/keys, which read the actual data and process it. Each job then reports its result to the single medium (see the sketch after this answer).

Second) You identify/select (for update) the top n records of related data, and mark them with a state of being processed. Use concurrency locking here; that should prevent others from picking the same data up if this is done in parallel.

I would go for the first solution if possible, since it is the simplest and cleanest. The second solution is best if you can use "select for update"; I don't know whether that is supported on Azure Table Storage.
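A rough sketch of the first solution in TypeScript with the @azure/data-tables package (the table name and `PropertyA` are placeholders taken from the question): one pass projects just the keys plus the flag property, then fixed-size chunks of keys are processed in parallel.

    import { TableClient } from "@azure/data-tables";

    const client = TableClient.fromConnectionString(
      process.env.STORAGE_CONNECTION_STRING!,
      "MyTable"
    );

    // Pass 1: select only the keys (plus PropertyA, to filter client-side),
    // keeping the identifying scan as cheap as possible.
    async function collectKeys(): Promise<{ pk: string; rk: string }[]> {
      const keys: { pk: string; rk: string }[] = [];
      for await (const e of client.listEntities({
        queryOptions: { select: ["PartitionKey", "RowKey", "PropertyA"] },
      })) {
        if (e.PropertyA == null) {
          keys.push({ pk: e.partitionKey!, rk: e.rowKey! });
        }
      }
      return keys;
    }

    // Pass 2: fan the keys out in chunks. Here the chunks run in-process;
    // in a Function App each chunk could instead become a queue message
    // that triggers its own invocation.
    async function processAll(chunkSize = 1000): Promise<void> {
      const keys = await collectKeys();
      for (let i = 0; i < keys.length; i += chunkSize) {
        await Promise.all(
          keys.slice(i, i + chunkSize).map(({ pk, rk }) => processOne(pk, rk))
        );
      }
    }

    async function processOne(pk: string, rk: string): Promise<void> {
      const entity = await client.getEntity(pk, rk); // read the actual data
      // ... do the further processing and report the result ...
    }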

You'll need to parallelise the task. As you don't know the partition keys, run a separate query for each letter of the alphabet: one where PK >= 'A' && PK < 'B', another where PK >= 'B' && PK < 'C', and so on. Then join the results in memory. Super easy to do in a single function. In JS just use Promise.all([]).
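For example, in TypeScript with @azure/data-tables (one adaptation: the question says the partition keys are GUIDs, so the ranges below split on hex digits rather than letters, and assume lowercase keys; `PropertyA` is again a placeholder):

    import { TableClient, odata } from "@azure/data-tables";

    const client = TableClient.fromConnectionString(
      process.env.STORAGE_CONNECTION_STRING!,
      "MyTable"
    );

    const HEX = "0123456789abcdef".split("");

    // Scan one PartitionKey range, still filtering for the missing
    // property on the client (Table Storage can't filter on absence).
    async function scanRange(from: string, to?: string) {
      const filter = to
        ? odata`PartitionKey ge ${from} and PartitionKey lt ${to}`
        : odata`PartitionKey ge ${from}`;
      const results: Record<string, unknown>[] = [];
      for await (const e of client.listEntities({ queryOptions: { filter } })) {
        if (e.PropertyA == null) results.push(e);
      }
      return results;
    }

    // Run all 16 range scans concurrently and join the results in memory.
    async function scanAllInParallel() {
      const chunks = await Promise.all(
        HEX.map((ch, i) => scanRange(ch, HEX[i + 1]))
      );
      return chunks.flat();
    }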
