
Improving performance of this Azure table row count query

I have the following query on an Azure table:

var count = table.ExecuteQuery(new TableQuery<MessageEntity>()).Count();

This is the MessageEntity:

public class MessageEntity : TableEntity
{
    public MessageEntity() { }

    public string Message { get; set; }
}

This query is really slow. It takes about 15 seconds to count a grand total of 85,000 records on my development machine. I am using the Azure Storage Emulator with UseDevelopmentStorage=true as my connection string. What can I do to make it more performant? All I am trying to do is count the number of records. Surely there must be a faster way?

As you may already know, Azure Tables has limited LINQ support, and Count is currently not supported (see the list of supported LINQ operators).

var count = table.ExecuteQuery(new TableQuery<MessageEntity>()).Count();

The code above fetches every entity from the table, up to 1,000 entities per request. Because you have 85,000 entities in your table, it makes at least 85 requests to Azure Tables. Each one is an HTTP request to the Azure Tables REST API, which is why it takes so long. It will get even worse when you count the entities in a table in a real storage account, where each request also has to cross the network.
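To make those round trips visible, here is a minimal sketch that pages through the table by hand and counts the requests. It assumes the Microsoft.WindowsAzure.Storage client library from the question; the table name "messages" is a placeholder, not something taken from the question.

```csharp
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

public static class PagedCount
{
    public static void Main()
    {
        // Same emulator connection string as in the question.
        var account = CloudStorageAccount.Parse("UseDevelopmentStorage=true");
        // "messages" is a hypothetical table name; use your own.
        var table = account.CreateCloudTableClient().GetTableReference("messages");

        var query = new TableQuery<DynamicTableEntity>();
        TableContinuationToken token = null;
        long count = 0;
        int requests = 0;

        do
        {
            // Each call is one HTTP request and returns one page of results.
            TableQuerySegment<DynamicTableEntity> segment =
                table.ExecuteQuerySegmented(query, token);
            count += segment.Results.Count;
            requests++;
            // A null token means the service has no more pages.
            token = segment.ContinuationToken;
        } while (token != null);

        Console.WriteLine("{0} entities in {1} requests", count, requests);
    }
}
```

ExecuteQuery does the same paging behind the scenes, so the number of requests printed here is roughly what the one-line Count() version pays for as well.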

One way to reduce the time is to reduce the amount of data transferred over the network (especially the response data). Currently every property of each entity is returned in the response. Since you're only interested in the total count, you can use query projection and ask for only one property (say PartitionKey or RowKey) back. The response will then be a lot smaller than it is now, which should reduce the time somewhat.
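A projected count might look like the sketch below. It assumes the same client library as the question and uses DynamicTableEntity so that no Message property has to be deserialized; the class and method names are illustrative only.

```csharp
using System.Linq;
using Microsoft.WindowsAzure.Storage.Table;

public static class ProjectedCount
{
    // Counts entities while asking the service to return only PartitionKey.
    public static int Count(CloudTable table)
    {
        var query = new TableQuery<DynamicTableEntity>()
            .Select(new[] { "PartitionKey" });

        // Still one request per page of results, but each page is much smaller.
        return table.ExecuteQuery(query).Count();
    }
}
```

This does not reduce the number of requests, only the size of each response, so expect an improvement rather than a fix.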

If knowing the count is really important to you, another thing you could do is calculate the count through some background process and store it in a separate table.

Here is how you can get the count of entities for a single partition in your Azure table without retrieving the entities. For each partition, create one additional entity, let's call it the row count entity, with the same partition key but a constant row key (e.g. "rowcountRK"). The row count entity has a single property of type long (e.g. "rowCount") that holds the number of rows in that partition.

Every time you insert an entity into that partition, you also increment the rowCount property of that partition's row count entity, and you do both in a single batch operation. An Azure table batch operation is atomic within a partition, so the data and the counter can never get out of step. Likewise, every time you delete an entity from a partition, you decrement the rowCount property of its row count entity, again sending both operations in one batch to Azure table storage for consistency and atomicity.
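The insert path could be sketched as follows. This assumes the Microsoft.WindowsAzure.Storage client library; RowCountEntity and CountedInsert are hypothetical names introduced for illustration, and the counter property is named RowCount here to follow C# conventions. The counter is read first and then replaced inside the batch, so a concurrent writer that changed it in between causes the batch to fail on the ETag check; retrying on that failure is left out of the sketch.

```csharp
using Microsoft.WindowsAzure.Storage.Table;

// Hypothetical entity holding the number of rows in one partition.
public class RowCountEntity : TableEntity
{
    public const string CounterRowKey = "rowcountRK";

    public RowCountEntity() { }

    public RowCountEntity(string partitionKey)
        : base(partitionKey, CounterRowKey) { }

    public long RowCount { get; set; }
}

public static class CountedInsert
{
    // Inserts an entity and bumps its partition's counter in one atomic batch.
    public static void Insert(CloudTable table, ITableEntity entity)
    {
        var retrieve = TableOperation.Retrieve<RowCountEntity>(
            entity.PartitionKey, RowCountEntity.CounterRowKey);
        var counter = table.Execute(retrieve).Result as RowCountEntity;

        var batch = new TableBatchOperation();
        batch.Insert(entity);

        if (counter == null)
        {
            // First row in this partition: create the counter alongside it.
            batch.Insert(new RowCountEntity(entity.PartitionKey) { RowCount = 1 });
        }
        else
        {
            counter.RowCount++;
            // Replace sends the ETag from the retrieve, so a concurrent update
            // makes the whole batch fail with a StorageException instead of
            // silently losing an increment.
            batch.Replace(counter);
        }

        // Both operations share a partition key, so they commit or fail together.
        table.ExecuteBatch(batch);
    }
}
```

A delete would mirror this, with batch.Delete on the entity and a decrement on the counter.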

Now, if you want the number of rows in a single partition, all you need to do is query the row count entity for that partition; there is no need to retrieve or scan anything else. If you want the total number of rows in your entire table, and the table has more than one partition, you need to query all the row count entities in the table and sum their row count values on the client side. This still causes a table scan, but the payload is much smaller, so it will probably be faster than retrieving every entity. Alternatively, if you know your partition keys, you can issue concurrent point queries for the row count entity of each partition and sum the results, which would very likely be more efficient than a whole table scan.
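Summing the counters for the whole table could be sketched like this. It assumes the counters were written as a typed entity with a long property named RowCount under the row key "rowcountRK"; both names are assumptions and must match whatever was actually stored.

```csharp
using System.Linq;
using Microsoft.WindowsAzure.Storage.Table;

// Hypothetical counter entity; must match the shape used when writing.
public class RowCountEntity : TableEntity
{
    public long RowCount { get; set; }
}

public static class TotalRowCount
{
    // Adds up the per-partition counters instead of enumerating every row.
    public static long Sum(CloudTable table)
    {
        // Only the counter entities carry this row key.
        string filter = TableQuery.GenerateFilterCondition(
            "RowKey", QueryComparisons.Equal, "rowcountRK");

        var query = new TableQuery<RowCountEntity>()
            .Where(filter)
            .Select(new[] { "RowCount" });

        // The service still scans the table, but returns one small entity
        // per partition rather than every row.
        return table.ExecuteQuery(query).Sum(counter => counter.RowCount);
    }
}
```

When the partition keys are known up front, replacing the filtered query with one TableOperation.Retrieve per partition avoids the scan entirely.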

There is no Count functionality in Azure Table Storage. What's actually happening in your query is you're pulling back all of the records and enumerating over them one by one.

There is an option available to you, but nothing I can suggest here will leverage built-in functionality. You could maintain a counter yourself, but you'd have to ensure that every insert and delete writes atomically to both your main table and your counter. You can see that this can very easily go awry.

Table Storage is a key-value store with a clustered key composed of the Partition Key and Row Key. That's pretty much it. If you need aggregation capabilities, I'd suggest looking at DocumentDB (although it only has some aggregation functions implemented) or SQL Azure.


 