
Improving Azure Table Storage Upserts

I have a job that processes about 80K items and has to insert/update them in Azure Table Storage.

I am not getting anywhere near Table Storage's documented limits of 20K operations/second per storage account and 2K/second per table.

The fastest I can get this to process is about 350 items/second. This is true of very small sets as well as much bigger ones (194K items).

I am using:

.NET 6
Azure Functions v4
Azure.Data.Tables NuGet package (v12)
v1 storage account
Each item has a unique partition key
ServicePointManager.UseNagleAlgorithm = false;
ServicePointManager.Expect100Continue = false;
ServicePointManager.DefaultConnectionLimit = 200; (I've adjusted this up and down with only minor differences)

I have found that, running locally in release mode, the fastest code is:

    // Fan out the upserts; note Parallel.ForEachAsync caps concurrency
    // at Environment.ProcessorCount unless ParallelOptions says otherwise.
    await Parallel.ForEachAsync(array, async (item, ct) =>
    {
        await storageTable.UpsertEntityAsync(item, TableUpdateMode.Replace, ct);
    });

I have tried the following:

non-async versions of everything
a for loop with an await on each call
a for loop that adds each task to a task list, then awaits the list
a foreach with an await
a foreach that adds each task to a task list
Parallel.ForEach
a partitioned Parallel.ForEach:

    var partition = Partitioner.Create(0, list.Count, 50);
    Parallel.ForEach(partition, options, item => {});

upserts vs. inserts (the same speed)

I don't get any real benefit from collecting the tasks in a list and awaiting them, because the library awaits internally (rather than returning a hot task). Running it as in my example yields times similar to adding the tasks to a list and awaiting the list.
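For reference, the task-list variant I compared against looks roughly like this (a minimal sketch; the SemaphoreSlim cap of 200 is illustrative, chosen to match the DefaultConnectionLimit above):

    // requires System.Linq, System.Threading, System.Threading.Tasks, Azure.Data.Tables
    // Task-list variant: start the upserts, cap how many are in flight,
    // then await them all together.
    using var throttle = new SemaphoreSlim(200);
    var tasks = array.Select(async item =>
    {
        await throttle.WaitAsync();
        try
        {
            await storageTable.UpsertEntityAsync(item, TableUpdateMode.Replace);
        }
        finally
        {
            throttle.Release();
        }
    }).ToList();
    await Task.WhenAll(tasks);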

Am I missing something that could give better insert performance? Would writing direct HTTP calls (skipping the library) give significantly better results?

Edit - added the Partitioner-based attempt

One thing that enables greater throughput per process is batch transactions - but those aren't applicable in your case, as you have unique partition keys.
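For anyone who does share partition keys, a batch looks roughly like this (a minimal sketch; table and samePartitionItems are placeholder names, every entity in one transaction must share a PartitionKey, and a transaction holds at most 100 operations):

    // requires System.Linq and Azure.Data.Tables
    var actions = samePartitionItems
        .Select(e => new TableTransactionAction(TableTransactionActionType.UpsertReplace, e))
        .ToList();

    // Split into chunks of 100, the per-transaction maximum.
    for (int i = 0; i < actions.Count; i += 100)
    {
        await table.SubmitTransactionAsync(actions.Skip(i).Take(100));
    }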

Which means it comes down to parallelisation. It's certainly possible to get much higher - you mentioned 2K/sec per table, but the throughput limit is actually 2K/sec per partition.
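Within one process, the main knob is the degree of parallelism. A minimal sketch (64 is purely illustrative; tune it for your workload):

    // Parallel.ForEachAsync defaults MaxDegreeOfParallelism to
    // Environment.ProcessorCount, which is low for I/O-bound upserts.
    var options = new ParallelOptions { MaxDegreeOfParallelism = 64 };
    await Parallel.ForEachAsync(array, options, async (item, ct) =>
    {
        await storageTable.UpsertEntityAsync(item, TableUpdateMode.Replace, ct);
    });

Beyond one process, you scale out across instances.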

I did a pretty comprehensive blog post on this exact topic not long ago - using the Azure Functions consumption plan to scale out and perform inserts in parallel (unique partition keys). I managed to hit a peak throughput of around 17K upserts/sec. There's a full code sample, stats, notes on monitoring, and some gotchas in there:

https://www.adathedev.co.uk/2022/02/bulk-load-azure-table-storage-functions.html

During that research I looked at the UseNagleAlgorithm tweaks etc. like you have, but in the end didn't change any of them. The thing that made the big difference was the overall approach I ended up taking to bulk load in parallel.
