Delete All Azure Table Records

I have an Azure Storage Table and it has 3k+ records.

What is the most efficient way to delete all the rows in the table?

For 3,000 records, the easiest way is to delete the table. Note, however, that a deleted table is not removed immediately: it is marked for deletion and actually removed some time later. How long that takes depends on the load on the system and the number of entities in the table. During this time you can neither use the table nor recreate it.
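Code that deletes a table and recreates it right away therefore has to wait for the deletion to finish. Below is a minimal, SDK-agnostic sketch of a fixed-delay retry; `RetryAsync` is a hypothetical helper, not part of any Azure SDK. With the Azure.Data.Tables package you might pass `() => tableClient.CreateIfNotExistsAsync()` as the action and, as an assumption to check against the Table service error documentation, treat a `RequestFailedException` whose `Status` is 409 (Conflict) as transient.

```csharp
using System;
using System.Threading.Tasks;

public static class RetryHelpers
{
    // Hypothetical helper: runs `action`; if it throws and `isTransient` says the
    // failure is worth retrying (e.g. the table is still being deleted), waits
    // `delay` and tries again, up to `maxAttempts` attempts in total.
    // Returns the number of attempts made; the last failure is rethrown.
    public static async Task<int> RetryAsync(
        Func<Task> action,
        Func<Exception, bool> isTransient,
        int maxAttempts,
        TimeSpan delay)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                await action().ConfigureAwait(false);
                return attempt;
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Transient failure with attempts left: wait, then loop and retry
                await Task.Delay(delay).ConfigureAwait(false);
            }
        }
    }
}
```

A fixed delay keeps the sketch short; exponential backoff with a cap on the total wait would be a natural refinement.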

If you need to keep using the table, the only other option is to delete the entities. For faster deletes, look at deleting entities with Entity Batch Transactions. To delete entities, though, you first have to fetch them. You can speed up fetching by retrieving only the PartitionKey and RowKey properties instead of all properties, because these two are all that is required to delete an entity.
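Entity batch transactions come with two rules: every operation in a batch must target the same PartitionKey, and a batch may contain at most 100 operations. The sketch below plans delete batches that respect both rules once the keys have been fetched. `EntityKey` and `PlanDeleteBatches` are hypothetical names; each resulting batch would then be submitted as one transaction (for example a `TableBatchOperation` in the older SDK or `SubmitTransactionAsync` in Azure.Data.Tables).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal key type; only these two properties are needed for a delete.
public record EntityKey(string PartitionKey, string RowKey);

public static class BatchPlanner
{
    // An entity group transaction may contain at most 100 operations.
    public const int MaxBatchSize = 100;

    // Splits keys into batches that are each valid for a single transaction:
    // one partition per batch, and no more than MaxBatchSize keys per batch.
    public static List<List<EntityKey>> PlanDeleteBatches(IEnumerable<EntityKey> keys)
    {
        var batches = new List<List<EntityKey>>();
        foreach (var partition in keys.GroupBy(k => k.PartitionKey))
        {
            var current = new List<EntityKey>(MaxBatchSize);
            foreach (var key in partition)
            {
                current.Add(key);
                if (current.Count == MaxBatchSize)
                {
                    // Batch is full: close it and start a new one for the same partition
                    batches.Add(current);
                    current = new List<EntityKey>(MaxBatchSize);
                }
            }
            if (current.Count > 0) batches.Add(current);
        }
        return batches;
    }
}
```

Grouping first and chunking second means a partition with 250 rows becomes batches of 100, 100 and 50, while rows from different partitions never share a batch.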

I use something like the following. We use the date as the partition key; your case may be different:

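// LINQPad "C# Program" script: Main() and .Dump() are LINQPad conventions.
// CloudStorageAccount/CloudTable come from Microsoft.Azure.Cosmos.Table (or the older WindowsAzure.Storage package).
async Task Main()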
{
    var startDate = new DateTime(2011, 1, 1);
    var endDate = new DateTime(2012, 1, 1);

    var account = CloudStorageAccount.Parse("connString");
    var client = account.CreateCloudTableClient();
    var table = client.GetTableReference("TableName");

    var dates = Enumerable.Range(0, Math.Abs((startDate.Month - endDate.Month) + 12 * (startDate.Year - endDate.Year)))
        .Select(offset => startDate.AddMonths(offset))
        .ToList();

    foreach (var date in dates)
    {
        var key = $"{date.ToShortDateString()}";

        var query = $"(PartitionKey eq '{key}')";
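        // Only PartitionKey and RowKey are needed to delete an entity
        var rangeQuery = new TableQuery<TableEntity>().Where(query)
            .Select(new[] { "PartitionKey", "RowKey" });

        // Materialize once; otherwise Count() and the Select() below would each re-run the query
        var result = table.ExecuteQuery<TableEntity>(rangeQuery).ToList();
        $"Deleting data from {date.ToShortDateString()}, key {key}, has {result.Count} records.".Dump();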

        var allTasks = result.Select(async r =>
        {
            try
            {
                await table.ExecuteAsync(TableOperation.Delete(r));
            }
            catch (Exception e) { $"{r.RowKey} - {e.ToString()}".Dump(); }
        });
        await Task.WhenAll(allTasks);
    }
}
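The script above starts one delete request per entity all at once, which is manageable for a few thousand rows but can mean a very large number of simultaneous requests on bigger tables. A minimal sketch of capping the number of in-flight requests with `SemaphoreSlim`; `ForEachThrottledAsync` and the limit you pass are hypothetical choices, not something the original answer uses.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttle
{
    // Runs `action` for every item, with at most `maxConcurrency` running at the same time.
    public static async Task ForEachThrottledAsync<T>(
        IEnumerable<T> items, int maxConcurrency, Func<T, Task> action)
    {
        if (maxConcurrency < 1) throw new ArgumentOutOfRangeException(nameof(maxConcurrency));

        using var gate = new SemaphoreSlim(maxConcurrency);
        var tasks = items.Select(async item =>
        {
            // Wait for a free slot before starting the operation
            await gate.WaitAsync().ConfigureAwait(false);
            try
            {
                await action(item).ConfigureAwait(false);
            }
            finally
            {
                // Free the slot even if the operation failed
                gate.Release();
            }
        }).ToList();

        await Task.WhenAll(tasks).ConfigureAwait(false);
    }
}
```

Inside the loop above, the delete step could become `await Throttle.ForEachThrottledAsync(result, 16, r => table.ExecuteAsync(TableOperation.Delete(r)));` (the per-entity error logging is omitted there). On .NET 6 and later, `Parallel.ForEachAsync` with `MaxDegreeOfParallelism` is a built-in alternative.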

For anyone finding this later: the problem with the accepted answer ("just delete the table") is that while it works well in the storage emulator, it fails intermittently in production. If your app or service needs to recreate tables regularly, you'll see failures caused either by conflicts or by the deletion still being in progress.

Instead, I found the fastest, most error-proof, EF-friendly approach to be deleting all rows within a segmented query. Below is a simple drop-in example that I'm using. Pass in your client, the table name, and a type that implements ITableEntity.

private async Task DeleteAllRows<T>(string table, CloudTableClient client) where T : ITableEntity, new()
{
    // Query all rows, one segment (up to 1,000 entities) at a time
    CloudTable tableref = client.GetTableReference(table);
    var query = new TableQuery<T>();
    TableContinuationToken token = null;

    do
    {
        var result = await tableref.ExecuteQuerySegmentedAsync(query, token);
        foreach (var row in result)
        {
            // Await each delete so failures surface and the method only returns once all rows are gone
            var op = TableOperation.Delete(row);
            await tableref.ExecuteAsync(op);
        }
        token = result.ContinuationToken;
    } while (token != null);
}

Example usage:

var table = client.GetTableReference("TodayPerformanceSnapshot");
bool created = await table.CreateIfNotExistsAsync();

if (!created)
{
    // Not created: the table already existed, so delete all of its content
    await DeleteAllRows<TodayPerformanceContainer>("TodayPerformanceSnapshot", client);
    log.Information("Azure Table:{Table} Purged", table.Name);
}
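The DeleteAllRows method above follows the segmented-query pattern: fetch a segment, process it, and repeat with the continuation token until it comes back null. A minimal sketch of that loop in isolation, with the data source hidden behind a delegate; `Segment<T>` and `ProcessAllSegmentsAsync` are hypothetical names standing in for `TableQuerySegment<T>` and `ExecuteQuerySegmentedAsync`. Processing each segment before fetching the next keeps only one segment in memory at a time.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical page shape: one segment of results plus the token for the next one (null when done).
public record Segment<T>(IReadOnlyList<T> Results, string ContinuationToken);

public static class Paging
{
    // Drains a segmented query: fetch a segment, hand it to `processPage`, follow the token.
    // Returns the total number of items processed.
    public static async Task<int> ProcessAllSegmentsAsync<T>(
        Func<string, Task<Segment<T>>> fetchSegment,
        Func<IReadOnlyList<T>, Task> processPage)
    {
        string token = null;
        int total = 0;
        do
        {
            var segment = await fetchSegment(token).ConfigureAwait(false);
            await processPage(segment.Results).ConfigureAwait(false);
            total += segment.Results.Count;
            token = segment.ContinuationToken;
        } while (token != null);
        return total;
    }
}
```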

A batched approach takes significantly more effort, because you have to handle both restrictions of batch transactions: all operations in a batch must have the same partition key, and a batch can contain at most 100 operations. The following version of DeleteAllRows does this.

private async Task DeleteAllRows<T>(string table, CloudTableClient client) where T: ITableEntity, new()
    {
        // query all rows
        CloudTable tableref = client.GetTableReference(table);           
        var query = new TableQuery<T>();
        TableContinuationToken token = null;            
        TableBatchOperation batchops = new TableBatchOperation();
        Dictionary<string, Stack<TableOperation>> pendingOperations = new Dictionary<string, Stack<TableOperation>>();
        
        do
        {
            var result = await tableref.ExecuteQuerySegmentedAsync(query, token);
            foreach (var row in result)
            {
                var op = TableOperation.Delete(row);
                // Group the delete operations by partition key
                if (!pendingOperations.TryGetValue(row.PartitionKey, out var stack))
                {
                    stack = new Stack<TableOperation>();
                    pendingOperations.Add(row.PartitionKey, stack);
                }
                stack.Push(op);
            }
            token = result.ContinuationToken;
        } while (token != null);

        // Submit the operations partition by partition, since a batch may not mix partition keys
        foreach (var key in pendingOperations.Keys)
        {                
            log.Information($"Deleting:{key}");                
            var rowStack = pendingOperations[key];
            int max = 100;
            int current = 0;

            while (rowStack.Count != 0)
            {
                // Pop up to 100 operations into the current batch
                while (current < max && rowStack.Count > 0)
                {
                    var op = rowStack.Pop();
                    batchops.Add(op);
                    current++;
                }

                //execute and reset
                _ = await tableref.ExecuteBatchAsync(batchops);
                log.Information($"Deleted batch of size:{batchops.Count}");
                current = 0;
                batchops.Clear();
            }
        }                       
    }

This depends on the structure of your data, but if you can compose a query for all the records, you can add each of them to a TableBatchOperation and execute them together, provided every operation in a batch shares the same partition key and a batch holds no more than 100 operations.

Here's an example that assumes all the rows are in the same partition, adapted from How to get started with Azure Table storage and Visual Studio connected services:

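// Query all rows (this example assumes they all share one PartitionKey)
CloudTable remindersTable = tableClient.GetTableReference("myTableName");
var query = new TableQuery<MyTableEntity>();
TableContinuationToken token = null;

do
{
    var result = await remindersTable.ExecuteQuerySegmentedAsync(query, token);
    token = result.ContinuationToken;

    // A batch can hold at most 100 operations (Enumerable.Chunk requires .NET 6+)
    foreach (var chunk in result.Results.Chunk(100))
    {
        // Create the batch operation.
        TableBatchOperation batchDeleteOperation = new TableBatchOperation();
        foreach (var row in chunk)
        {
            batchDeleteOperation.Delete(row);
        }

        // Execute the batch operation.
        await remindersTable.ExecuteBatchAsync(batchDeleteOperation);
    }
} while (token != null);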

I use the following code to first put all the partition keys in a queue and then loop through the keys, deleting all rows in batches of 100.

var queue = new Queue<string>();
queue.Enqueue("PartitionKeyTodelete1");
queue.Enqueue("PartitionKeyTodelete2");
queue.Enqueue("PartitionKeyTodelete3");

while (queue.Count > 0)
{
    string partitionToDelete = queue.Dequeue();

    // Fetch only the keys of the rows in this partition
    TableQuery<TableEntity> deleteQuery = new TableQuery<TableEntity>()
      .Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partitionToDelete))
      .Select(new string[] { "PartitionKey", "RowKey" });

    TableContinuationToken continuationToken = null;

    do
    {
        var tableQueryResult = await myTable.ExecuteQuerySegmentedAsync(deleteQuery, continuationToken);

        continuationToken = tableQueryResult.ContinuationToken;

        // Split into chunks of 100 for batching
        List<List<TableEntity>> rowsChunked = tableQueryResult.Select((x, index) => new { Index = index, Value = x })
            .Where(x => x.Value != null)
            .GroupBy(x => x.Index / 100)
            .Select(x => x.Select(v => v.Value).ToList())
            .ToList();

        // Delete each chunk of 100 in a batch
        foreach (List<TableEntity> rows in rowsChunked)
        {
            TableBatchOperation tableBatchOperation = new TableBatchOperation();
            rows.ForEach(x => tableBatchOperation.Add(TableOperation.Delete(x)));

            await myTable.ExecuteBatchAsync(tableBatchOperation);
        }
    }
    while (continuationToken != null);
}

I recently wrote a library that can do exactly that.

Source/docs: https://github.com/pflajszer/AzureTablesLifecycleManager

For your use case, the code would look something like this:

// inject ITableManager in the constructor:

private readonly ITableManager _api;

public MyClass(ITableManager api)
{
    _api = api;
}
/// <summary>
/// Delete all data from a single table
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="tableName"></param>
/// <returns></returns>
public Task<DataTransferResponse<T>> DeleteTableDataAsync<T>(string tableName) where T : class, ITableEntity, new()
{
    // this query will return a single table with a given name:
    Expression<Func<TableItem, bool>> tableQuery = x => x.Name == tableName;

    // this query will return all the data from the table:
    Expression<Func<T, bool>> dataQuery = x => true;
             
    // ... but you can use LINQ to filter results too, like:
    // Expression<Func<T, bool>> anotherExampleOfdataQuery = x => x.Timestamp < DateTime.Now.AddYears(-1);

    return _api.DeleteDataFromTablesAsync<T>(tableQuery, dataQuery);
}

... or, as Gaurav Mantri suggested, you can just delete the table itself:

/// <summary>
/// Delete a single table
/// </summary>
/// <param name="tableName"></param>
/// <returns></returns>
public Task<DataTransferResponse<TableItem>> DeleteTableAsync(string tableName)
{
    // this query will return a single table with a given name:
    Expression<Func<TableItem, bool>> tableQuery = x => x.Name == tableName;

    return _api.DeleteTablesAsync(tableQuery);
}

Here's my solution using the new(er) Azure.Data.Tables SDK, with the following enhancements:

  • Getting 1000 rows per page
  • Getting only PartitionKey & RowKey for each row
  • Grouping the rows to delete by PartitionKey into batches of at most 100
  • Written as extension methods on TableClient, so it's easily reusable

Note: I'm using the System.Linq.Async NuGet package to make the code a bit more readable.

/// <summary>
/// Deletes all rows from the table
/// </summary>
/// <param name="tableClient">The authenticated TableClient</param>
/// <returns></returns>
public static async Task DeleteAllEntitiesAsync(this TableClient tableClient)
{
    // Only the PartitionKey & RowKey fields are required for deletion
    AsyncPageable<TableEntity> entities = tableClient
        .QueryAsync<TableEntity>(select: new List<string>() { "PartitionKey", "RowKey" }, maxPerPage: 1000);

    await entities.AsPages().ForEachAwaitAsync(async page => {
        // Since we don't know how many rows the table has and the results are ordered by PartitionKey+RowKey
        // we'll delete each page immediately and not cache the whole table in memory
        await BatchManipulateEntities(tableClient, page.Values, TableTransactionActionType.Delete).ConfigureAwait(false);
    });
}

/// <summary>
/// Groups entities by PartitionKey into batches of max 100 for valid transactions
/// </summary>
/// <returns>List of Azure Responses for Transactions</returns>
public static async Task<List<Response<IReadOnlyList<Response>>>> BatchManipulateEntities<T>(TableClient tableClient, IEnumerable<T> entities, TableTransactionActionType tableTransactionActionType) where T : class, ITableEntity, new()
{
    var groups = entities.GroupBy(x => x.PartitionKey);
    var responses = new List<Response<IReadOnlyList<Response>>>();
    foreach (var group in groups)
    {
        List<TableTransactionAction> actions;
        var items = group.AsEnumerable();
        while (items.Any())
        {
            var batch = items.Take(100);
            items = items.Skip(100);

            actions = new List<TableTransactionAction>();
            actions.AddRange(batch.Select(e => new TableTransactionAction(tableTransactionActionType, e)));
            var response = await tableClient.SubmitTransactionAsync(actions).ConfigureAwait(false);
            responses.Add(response);
        }
    }
    return responses;
}
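The `ForEachAwaitAsync` call above comes from System.Linq.Async. If you would rather avoid that dependency, C# 8's `await foreach` can consume the `IAsyncEnumerable<Page<TableEntity>>` returned by `AsPages()` directly. The sketch below shows the pattern with a hypothetical in-memory page source (`FakePages`) standing in for the SDK; in real code each page would be a `Page<TableEntity>` and the handler would call `BatchManipulateEntities(tableClient, page.Values, TableTransactionActionType.Delete)`.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PageConsumer
{
    // Hypothetical stand-in for tableClient.QueryAsync<T>(...).AsPages():
    // yields the items in pages of at most `pageSize`.
    public static async IAsyncEnumerable<IReadOnlyList<T>> FakePages<T>(IReadOnlyList<T> items, int pageSize)
    {
        for (int i = 0; i < items.Count; i += pageSize)
        {
            await Task.Yield(); // simulate an asynchronous round trip
            var page = new List<T>();
            for (int j = i; j < Math.Min(i + pageSize, items.Count); j++) page.Add(items[j]);
            yield return page;
        }
    }

    // Same effect as pages.ForEachAwaitAsync(async page => ...), using only the language feature.
    // Returns the number of pages handled.
    public static async Task<int> ProcessPagesAsync<T>(
        IAsyncEnumerable<IReadOnlyList<T>> pages, Func<IReadOnlyList<T>, Task> handlePage)
    {
        int pageCount = 0;
        await foreach (var page in pages)
        {
            // Handle each page before requesting the next, so only one page is in memory
            await handlePage(page);
            pageCount++;
        }
        return pageCount;
    }
}
```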

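Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site or link to the original. For any questions, contact yoyou2525@163.com.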

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM