Azure Blob container - improving the performance of fetching PDF files and file details from an Azure Blob container

Question

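Below is my code for fetching PDF files from an Azure Blob container. In most cases, the blob container holds up to 4,000 PDF files.

The problem I am facing: when the blob container holds more than 500 PDF files, fetching them becomes very slow.

I need input and help correcting my code to improve performance. Is there any way to improve the code below so that all files in the blob container, along with their details, are fetched within 5 seconds?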

SearchQuery.cs file:


    List<WorkflowDocumentListModel> documentListModel = new List<WorkflowDocumentListModel>();
    if (request.RequestModel.WorkflowStatusId == (byte)WorkflowStatus.RCV) {
        IList<PdfFileMetaDataResponse> RCVFileList = await _blobManager.GetAllFilesMetaDataAsync(UserContext.AzureBlobContainerWorkflowQueue, cancellationToken).ConfigureAwait(false);
        foreach (PdfFileMetaDataResponse item in RCVFileList) {
            documentListModel.Add(new WorkflowDocumentListModel {
                Id = 0,
                Name = item.FileName,
                AssignedTo = item.CustomUserName,
                AssignedToId = item.CustomUserId,
                AppDate = item.LastModified?.ClientTimeToUtc(UserContext.TimeZone),
                IsGrabbedByUser = userId == item.CustomUserId
            });
        }
        if (!string.IsNullOrEmpty(request.RequestModel.FullTextFilter?.Trim())) {
            documentListModel = documentListModel
                .Where(x => x.Name.Contains(request.RequestModel.FullTextFilter))
                .OrderBy(i => i.AppDate?.Date)
                .ThenBy(i => i.AppDate?.TimeOfDay).ToList();
        } else {
            documentListModel = documentListModel
                .OrderBy(i => i.AppDate?.Date)
                .ThenBy(i => i.AppDate?.TimeOfDay).ToList();
        }

        return new WorkflowDocumentSearchRepsonse { Data = documentListModel, RecordCount = documentListModel.Count };
    }

    
BlobManager.cs file method:
    
    
    public async Task<IList<PdfFileMetaDataResponse>> GetAllFilesMetaDataAsync(string containerName, CancellationToken cancellationToken) {
        BlobServiceClient _blobServiceClient = new BlobServiceClient(UserContext.AzureBlobStorageConnection);
        BlobContainerClient containerClient = _blobServiceClient.GetBlobContainerClient(containerName);
        List<PdfFileMetaDataResponse> responseList = new List<PdfFileMetaDataResponse>();
        await foreach (BlobItem file in containerClient.GetBlobsAsync(cancellationToken: cancellationToken))
        {
            BlobClient blobClient = containerClient.GetBlobClient(file.Name);
            BlobProperties blobProperties = await blobClient.GetPropertiesAsync(cancellationToken: cancellationToken).ConfigureAwait(false);
            DateTime? customLastModifiedDate = ConvertWebApiFileLastModifiedMillisecondsToDateTime(
                blobProperties.Metadata?.Where(i => i.Key.ToUpperInvariant() == CustomBlobMetadataLastModifiedMilliseconds.ToUpperInvariant())
                    .Select(i => i.Value).FirstOrDefault());
            string customUserName = blobProperties.Metadata?.Where(i => i.Key.ToUpperInvariant() == CustomBlobMetadataUserName.ToUpperInvariant())
                .Select(i => i.Value).FirstOrDefault();
            bool hasCustomUserID = int.TryParse(
                blobProperties.Metadata?.Where(i => i.Key.ToUpperInvariant() == CustomBlobMetadataUserId.ToUpperInvariant())
                    .Select(i => i.Value).FirstOrDefault(),
                out int customUserID);
            PdfFileMetaDataResponse response = new PdfFileMetaDataResponse() {
                FileName = file.Name,
                CustomUserName = customUserName,
                CustomUserId = hasCustomUserID ? customUserID : (int?)null,
                LastModified = customLastModifiedDate ?? file.Properties.LastModified?.UtcDateTime
            };
            responseList.Add(response);
        }
        return responseList;
    }

Thank you.

Answer 1

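When you call BlobContainerClient.GetBlobsAsync, the blob properties are fetched automatically as part of the listing. You therefore do not need to call GetPropertiesAsync for each blob, which is a separate request to the storage service every time. Dropping that call should improve the speed.

I also noticed that your code checks each blob's metadata (which is probably why you fetch each blob's properties again). However, you can get the blob metadata in the list operation itself. You only need to include BlobTraits.Metadata when listing the blobs.

So your blob listing code would look something like this:

    await foreach (BlobItem file in containerClient.GetBlobsAsync(traits: BlobTraits.Metadata, cancellationToken: cancellationToken))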
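
To make the change concrete, here is a rough sketch of how the whole BlobManager method from the question might look once the per-blob GetPropertiesAsync call is removed. It is a class-member fragment rather than a standalone program: it assumes the question's own PdfFileMetaDataResponse, UserContext, ConvertWebApiFileLastModifiedMillisecondsToDateTime and CustomBlobMetadata* constants exist as shown there. GetMetadataValue is a hypothetical helper introduced only for this illustration, and the sketch has not been run against a real container.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

// Sketch only: members of the question's BlobManager class.
public async Task<IList<PdfFileMetaDataResponse>> GetAllFilesMetaDataAsync(string containerName, CancellationToken cancellationToken)
{
    BlobServiceClient blobServiceClient = new BlobServiceClient(UserContext.AzureBlobStorageConnection);
    BlobContainerClient containerClient = blobServiceClient.GetBlobContainerClient(containerName);
    List<PdfFileMetaDataResponse> responseList = new List<PdfFileMetaDataResponse>();

    // BlobTraits.Metadata asks the listing itself to return each blob's metadata,
    // so no GetPropertiesAsync round trip is made inside the loop.
    await foreach (BlobItem file in containerClient.GetBlobsAsync(traits: BlobTraits.Metadata, cancellationToken: cancellationToken))
    {
        DateTime? customLastModifiedDate = ConvertWebApiFileLastModifiedMillisecondsToDateTime(
            GetMetadataValue(file.Metadata, CustomBlobMetadataLastModifiedMilliseconds));
        string customUserName = GetMetadataValue(file.Metadata, CustomBlobMetadataUserName);
        bool hasCustomUserId = int.TryParse(
            GetMetadataValue(file.Metadata, CustomBlobMetadataUserId), out int customUserId);

        responseList.Add(new PdfFileMetaDataResponse
        {
            FileName = file.Name,
            CustomUserName = customUserName,
            CustomUserId = hasCustomUserId ? customUserId : (int?)null,
            LastModified = customLastModifiedDate ?? file.Properties.LastModified?.UtcDateTime
        });
    }

    return responseList;
}

// Hypothetical helper (not part of the Azure SDK): case-insensitive metadata lookup
// that returns null when the dictionary is null or the key is missing.
private static string GetMetadataValue(IDictionary<string, string> metadata, string key) =>
    metadata?.FirstOrDefault(i => string.Equals(i.Key, key, StringComparison.OrdinalIgnoreCase)).Value;
```

The point of this shape is that the number of storage requests no longer grows with the number of blobs: the listing is served in pages (by default the List Blobs operation returns up to 5,000 results per page), whereas the original loop issued one additional request for every blob.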

Solution 1
0 2022-08-14 05:17:51
