
How to increase the ResponseBodySize of blob_client.download_blob() (Azure Blob Storage Python SDK)

Reviewing some Azure Log Analytics logs, I see that each time my Python Azure Function downloads a blob from Azure Storage there is an initial 32 MB GetBlob request, and all subsequent GetBlob requests are 4 MB chunks.

How can I increase this size to reduce the Function's execution time?

Example Python that downloads a blob from storage (inside the Azure Function):

import io

def load_blob_to_memory(blob_client):
    # readall() pulls the entire blob into memory as bytes
    blob_data = blob_client.download_blob().readall()
    blob_bytes = io.BytesIO(blob_data)
    return blob_bytes

Example Log Analytics query showing ResponseBodySize:

  • Query:
//==================================================//
// Assign variables
//==================================================//
let varStart = ago(2d);
let varEnd = now();
let varStorageAccount = 'stgtest';
let varIngressContainerName = 'cont-test';
let varFileName = 'test.csv';
let varSep = '/';
let varSampleUploadUri = strcat('https://', varStorageAccount, '.dfs.core.windows.net', varSep, varIngressContainerName, varSep, varFileName);
let varSampleDownloadUri = replace(@'%2F', @'/', replace(@'.dfs.', @'.blob.', tostring(varSampleUploadUri)));
//==================================================//
// Filter table
//==================================================//
StorageBlobLogs
| where TimeGenerated between (varStart .. varEnd)
  and AccountName == varStorageAccount
  //and StatusText == varStatus
  and (split(Uri, '?')[0] == varSampleUploadUri
       or split(Uri, '?')[0] == varSampleDownloadUri)
| summarize 
  count() by OperationName,
  TimeGenerated,
  UserAgent = tostring(split(UserAgentHeader, '(')[0]),
  FileName = tostring(split(tostring(parse_url(url_decode(Uri))['Path']), '/')[-1]),
  DownloadChunkSize = format_bytes(ResponseBodySize, 2, 'MB'),
  StatusCode,
  StatusText
| order by TimeGenerated asc
  • Output:
6/9/2021, 6:24:22.226 PM    GetBlob azsdk-python-storage-blob/12.8.1 Python/3.8.10  test.csv    32 MB   206 Success 1   
6/9/2021, 6:24:22.442 PM    GetBlob azsdk-python-storage-blob/12.8.1 Python/3.8.10  test.csv    4 MB    206 Success 1   
6/9/2021, 6:24:22.642 PM    GetBlob azsdk-python-storage-blob/12.8.1 Python/3.8.10  test.csv    4 MB    206 Success 1   
6/9/2021, 6:24:22.780 PM    GetBlob azsdk-python-storage-blob/12.8.1 Python/3.8.10  test.csv    4 MB    206 Success 1
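The pattern in the log matches the SDK defaults: one initial request of up to max_single_get_size (32 MB), then the remainder in max_chunk_get_size (4 MB) chunks. If the logged sizes are exact, the request count can be sanity-checked with plain arithmetic (this sketch does no SDK calls; the default values are the documented ones):

```python
import math

def expected_get_blob_requests(blob_size,
                               max_single_get_size=32 * 1024 * 1024,
                               max_chunk_get_size=4 * 1024 * 1024):
    """Rough number of GetBlob requests for a blob of blob_size bytes."""
    if blob_size <= max_single_get_size:
        return 1  # whole blob fits in the initial request
    remainder = blob_size - max_single_get_size
    return 1 + math.ceil(remainder / max_chunk_get_size)

# A ~44 MB test.csv gives the four logged requests: 32 MB + 3 x 4 MB
print(expected_get_blob_requests(44 * 1024 * 1024))  # -> 4
```

Raising max_single_get_size above the blob size would collapse this to a single request.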

The download_blob() method of the BlobClient class has a max_concurrency parameter, but I'm not sure whether using it requires a full async/await code rewrite.
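For what it's worth, max_concurrency is an ordinary keyword argument on the synchronous download_blob() call; the SDK fetches the chunk ranges in parallel threads internally, so no async/await rewrite is needed. A minimal sketch (the value 4 is an arbitrary illustration, not a recommendation):

```python
import io

def load_blob_to_memory(blob_client, max_concurrency=4):
    # max_concurrency downloads the chunk ranges in parallel threads;
    # the call itself is still synchronous -- no async/await required.
    downloader = blob_client.download_blob(max_concurrency=max_concurrency)
    return io.BytesIO(downloader.readall())
```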

Edit 1: Thanks @Guarav. This raises the defaults (single-call download to 64 MB, chunk size to 32 MB):

from azure.storage.blob import BlobClient

def create_blob_client(credentials):
    # event is the Azure Function's trigger payload carrying the blob URL
    blob_client = BlobClient.from_blob_url(
        event.get_json()["blobUrl"],
        credentials,
        max_single_get_size=64*1024*1024,  # 64 MB initial request
        max_chunk_get_size=32*1024*1024    # 32 MB per subsequent chunk
    )
    return blob_client

Please take a look at the max_single_get_size and max_chunk_get_size arguments of the BlobClient constructor. You can adjust these two to increase the amount of data downloaded in a single request.

From the documentation:

max_single_get_size

The maximum size for a blob to be downloaded in a single call; anything beyond this is downloaded in chunks (possibly in parallel). Defaults to 32*1024*1024, or 32 MB.

max_chunk_get_size

The maximum chunk size used for downloading a blob. Defaults to 4*1024*1024, or 4 MB.

