Delete All Azure Table Records
I have an Azure Storage table with 3k+ records.
What is the most efficient way to delete all the rows in the table?
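For 3,000 records, the easiest way is to delete the table. Note, however, that the table is not deleted at the moment you issue the delete: it is placed in some kind of queue and is actually removed some time later. How long that takes depends on the load on the system and the number of entities in the table. During that time you cannot recreate the table or use it.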
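If it is important for you to keep using the table, the only other option is to delete the entities. For faster deletes, you can look at deleting entities using Entity Batch Transactions. To delete entities, though, you first need to fetch them. You can speed up the fetch by retrieving only the PartitionKey and RowKey attributes of each entity instead of all attributes, since those two are all that is required to delete an entity.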
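A batch transaction can only contain entities from a single partition, and at most 100 operations, so the fetched keys have to be grouped before they are submitted. The following is a minimal sketch of that grouping step only, not a definitive implementation. It assumes .NET 6 or later (for Enumerable.Chunk); DeleteBatchPlanner and Plan are hypothetical names, not part of any Azure SDK, and the code makes no storage calls.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of any Azure SDK): turns a flat list of
// (PartitionKey, RowKey) pairs into groups that satisfy the two batch rules.
public static class DeleteBatchPlanner
{
    // A batch transaction accepts at most 100 operations.
    public const int MaxOperationsPerBatch = 100;

    public static List<List<(string PartitionKey, string RowKey)>> Plan(
        IEnumerable<(string PartitionKey, string RowKey)> keys)
    {
        return keys
            // Rule 1: every entity in a batch shares one PartitionKey.
            .GroupBy(k => k.PartitionKey)
            // Rule 2: split each partition into chunks of at most 100.
            .SelectMany(g => g.Chunk(MaxOperationsPerBatch))
            .Select(chunk => chunk.ToList())
            .ToList();
    }
}
```

For example, 250 keys in one partition plus 3 keys in another produce four batches, of sizes 100, 100, 50 and 3. Each inner list would then be turned into one batch delete against the table.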
I use something like this. We use the date as the partition key; your case may be different:
// Written for LINQPad: Dump() is a LINQPad extension; use Console.WriteLine elsewhere.
async Task Main()
{
    var startDate = new DateTime(2011, 1, 1);
    var endDate = new DateTime(2012, 1, 1);
    var account = CloudStorageAccount.Parse("connString");
    var client = account.CreateCloudTableClient();
    var table = client.GetTableReference("TableName");

    // One date per month, from startDate up to (not including) endDate
    var dates = Enumerable.Range(0, Math.Abs((startDate.Month - endDate.Month) + 12 * (startDate.Year - endDate.Year)))
        .Select(offset => startDate.AddMonths(offset))
        .ToList();

    foreach (var date in dates)
    {
        // ToShortDateString() is culture-dependent; it must match the format used when the rows were written
        var key = date.ToShortDateString();
        var query = $"(PartitionKey eq '{key}')";
        var rangeQuery = new TableQuery<TableEntity>().Where(query);
        // Materialize once so that counting and deleting do not each enumerate the lazy query
        var result = table.ExecuteQuery<TableEntity>(rangeQuery).ToList();
        $"Deleting data from {key}, has {result.Count} records.".Dump();

        // Delete the rows of this partition concurrently; log failures and carry on
        var allTasks = result.Select(async r =>
        {
            try
            {
                await table.ExecuteAsync(TableOperation.Delete(r));
            }
            catch (Exception e) { $"{r.RowKey} - {e}".Dump(); }
        });
        await Task.WhenAll(allTasks);
    }
}
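For those finding this later: the problem with the accepted answer of "just delete the table" is that, while it works great in the storage emulator, it fails randomly in production. If your app or service needs to regenerate tables regularly, you will see failures caused by conflicts or by a deletion that is still in progress.
Instead, I found that the fastest and most error-proof EF-friendly method is to delete all rows within a segmented query. Below is a simple drop-in example that I am using. Pass in your client, the table name, and a type that implements ITableEntity.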
private async Task DeleteAllRows<T>(string table, CloudTableClient client) where T : ITableEntity, new()
{
    // query all rows
    CloudTable tableref = client.GetTableReference(table);
    var query = new TableQuery<T>();
    TableContinuationToken token = null;
    do
    {
        // fetch one segment of results at a time
        var result = await tableref.ExecuteQuerySegmentedAsync(query, token);
        foreach (var row in result)
        {
            var op = TableOperation.Delete(row);
            // await each delete so failures surface and the method does not return early
            await tableref.ExecuteAsync(op);
        }
        token = result.ContinuationToken;
    } while (token != null);
}
Example usage:
var table = client.GetTableReference("TodayPerformanceSnapshot");
var created = await table.CreateIfNotExistsAsync();
if (!created)
{
    // not created, table already existed, delete all content
    await DeleteAllRows<TodayPerformanceContainer>("TodayPerformanceSnapshot", client);
    log.Information("Azure Table:{Table} Purged", table.Name);
}
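A batched approach takes significantly more effort, since you have to handle both the "only the same partition key within a batch" and the "at most 100 rows per batch" limitations. The following version of DeleteAllRows does this.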
private async Task DeleteAllRows<T>(string table, CloudTableClient client) where T : ITableEntity, new()
{
    // query all rows
    CloudTable tableref = client.GetTableReference(table);
    var query = new TableQuery<T>();
    TableContinuationToken token = null;
    TableBatchOperation batchops = new TableBatchOperation();
    // pending delete operations, keyed by partition key
    // (note: this holds an operation for every row of the table in memory)
    var pendingOperations = new Dictionary<string, Stack<TableOperation>>();
    do
    {
        var result = await tableref.ExecuteQuerySegmentedAsync(query, token);
        foreach (var row in result)
        {
            var op = TableOperation.Delete(row);
            if (!pendingOperations.TryGetValue(row.PartitionKey, out var stack))
            {
                stack = new Stack<TableOperation>();
                pendingOperations.Add(row.PartitionKey, stack);
            }
            stack.Push(op);
        }
        token = result.ContinuationToken;
    } while (token != null);

    // process one partition key at a time
    foreach (var key in pendingOperations.Keys)
    {
        log.Information($"Deleting:{key}");
        var rowStack = pendingOperations[key];
        int max = 100;
        int current = 0;
        while (rowStack.Count != 0)
        {
            // take operations off the stack in groups of up to 100
            while (current < max && rowStack.Count > 0)
            {
                var op = rowStack.Pop();
                batchops.Add(op);
                current++;
            }
            // execute and reset
            _ = await tableref.ExecuteBatchAsync(batchops);
            log.Information($"Deleted batch of size:{batchops.Count}");
            current = 0;
            batchops.Clear();
        }
    }
}
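It depends on the structure of your data, but if you can compose a query for all the records, you can add each one to a TableBatchOperation and execute them all at once. This only works as a single batch when the records share one partition key and there are no more than 100 of them.
Here is an example that just gets all the results inside the same partition key, adapted from "How to get started with Azure Table storage and Visual Studio connected services".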
// query all rows (assumes every row is in one partition and there are at most 100 rows)
CloudTable peopleTable = tableClient.GetTableReference("myTableName");
var query = new TableQuery<MyTableEntity>();
var result = await peopleTable.ExecuteQuerySegmentedAsync(query, null);

// Create the batch operation.
TableBatchOperation batchDeleteOperation = new TableBatchOperation();
foreach (var row in result)
{
    batchDeleteOperation.Delete(row);
}

// Execute the batch operation.
await peopleTable.ExecuteBatchAsync(batchDeleteOperation);
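I use the following function to first put all the partition keys into a queue, then loop through the keys and delete all their rows in batches of 100.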
Queue<string> queue = new Queue<string>();
queue.Enqueue("PartitionKeyTodelete1");
queue.Enqueue("PartitionKeyTodelete2");
queue.Enqueue("PartitionKeyTodelete3");

while (queue.Count > 0)
{
    string partitionToDelete = queue.Dequeue();

    // Fetch only PartitionKey and RowKey; nothing else is needed for a delete
    TableQuery<TableEntity> deleteQuery = new TableQuery<TableEntity>()
        .Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partitionToDelete))
        .Select(new string[] { "PartitionKey", "RowKey" });

    TableContinuationToken continuationToken = null;
    do
    {
        var tableQueryResult = await myTable.ExecuteQuerySegmentedAsync(deleteQuery, continuationToken);
        continuationToken = tableQueryResult.ContinuationToken;

        // Split into chunks of 100 for batching
        List<List<TableEntity>> rowsChunked = tableQueryResult.Select((x, index) => new { Index = index, Value = x })
            .Where(x => x.Value != null)
            .GroupBy(x => x.Index / 100)
            .Select(x => x.Select(v => v.Value).ToList())
            .ToList();

        // Delete each chunk of 100 in a batch
        foreach (List<TableEntity> rows in rowsChunked)
        {
            TableBatchOperation tableBatchOperation = new TableBatchOperation();
            rows.ForEach(x => tableBatchOperation.Add(TableOperation.Delete(x)));
            await myTable.ExecuteBatchAsync(tableBatchOperation);
        }
    }
    while (continuationToken != null);
}
I recently wrote a library that can do exactly that.
Source/docs: https://github.com/pflajszer/AzureTablesLifecycleManager
For your use case, the code would look something like this:
// inject ITableManager in the constructor:
private readonly ITableManager _api;

public MyClass(ITableManager api)
{
    _api = api;
}

/// <summary>
/// Delete all data from a single table
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="tableName"></param>
/// <returns></returns>
public Task<DataTransferResponse<T>> DeleteTableDataAsync<T>(string tableName) where T : class, ITableEntity, new()
{
    // this query will return a single table with a given name:
    Expression<Func<TableItem, bool>> tableQuery = x => x.Name == tableName;

    // this query will return all the data from the table:
    Expression<Func<T, bool>> dataQuery = x => true;

    // ... but you can use LINQ to filter results too, like:
    // Expression<Func<T, bool>> anotherExampleOfdataQuery = x => x.Timestamp < DateTime.Now.AddYears(-1);

    return _api.DeleteDataFromTablesAsync<T>(tableQuery, dataQuery);
}
...or, as Gaurav Mantri suggested, you can delete the table itself:
/// <summary>
/// Delete a single table
/// </summary>
/// <param name="tableName"></param>
/// <returns></returns>
public Task<DataTransferResponse<TableItem>> DeleteTableAsync(string tableName)
{
    // this query will return a single table with a given name:
    Expression<Func<TableItem, bool>> tableQuery = x => x.Name == tableName;

    return _api.DeleteTablesAsync(tableQuery);
}
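Here is my solution using the new(er) Azure.Data.Tables SDK, with the following enhancements: it fetches up to 1,000 rows per page, retrieves only the PartitionKey and RowKey of each row, groups the rows to delete by PartitionKey into batches of at most 100, and is written as an extension method on TableClient.
Note: I'm using the System.Linq.Async NuGet package to make the code a bit more readable.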
// Both methods need to be declared inside a static class for the extension method to compile.

/// <summary>
/// Deletes all rows from the table
/// </summary>
/// <param name="tableClient">The authenticated TableClient</param>
/// <returns></returns>
public static async Task DeleteAllEntitiesAsync(this TableClient tableClient)
{
    // Only the PartitionKey & RowKey fields are required for deletion
    AsyncPageable<TableEntity> entities = tableClient
        .QueryAsync<TableEntity>(select: new List<string>() { "PartitionKey", "RowKey" }, maxPerPage: 1000);

    await entities.AsPages().ForEachAwaitAsync(async page => {
        // Since we don't know how many rows the table has and the results are ordered by PartitionKey+RowKey
        // we'll delete each page immediately and not cache the whole table in memory
        await BatchManipulateEntities(tableClient, page.Values, TableTransactionActionType.Delete).ConfigureAwait(false);
    });
}

/// <summary>
/// Groups entities by PartitionKey into batches of max 100 for valid transactions
/// </summary>
/// <returns>List of Azure Responses for Transactions</returns>
public static async Task<List<Response<IReadOnlyList<Response>>>> BatchManipulateEntities<T>(TableClient tableClient, IEnumerable<T> entities, TableTransactionActionType tableTransactionActionType) where T : class, ITableEntity, new()
{
    var groups = entities.GroupBy(x => x.PartitionKey);
    var responses = new List<Response<IReadOnlyList<Response>>>();
    foreach (var group in groups)
    {
        List<TableTransactionAction> actions;
        var items = group.AsEnumerable();
        // Take up to 100 entities at a time until the partition group is exhausted
        while (items.Any())
        {
            var batch = items.Take(100);
            items = items.Skip(100);

            actions = new List<TableTransactionAction>();
            actions.AddRange(batch.Select(e => new TableTransactionAction(tableTransactionActionType, e)));
            var response = await tableClient.SubmitTransactionAsync(actions).ConfigureAwait(false);
            responses.Add(response);
        }
    }
    return responses;
}