
Download all blobs within an Azure Storage container

I have managed to write a Python script that lists all the blobs in a container.

import azure
from azure.storage.blob import BlobService
from azure.storage import *

blob_service = BlobService(account_name='<ACCOUNT_NAME>', account_key='<ACCOUNT_KEY>')


blobs = []
marker = None
while True:
    batch = blob_service.list_blobs('<CONTAINER>', marker=marker)
    blobs.extend(batch)
    if not batch.next_marker:
        break
    marker = batch.next_marker
for blob in blobs:
    print(blob.name)

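As I said, this only lists the blobs that I want to download. I have moved on to the Azure CLI to see whether it can help me do what I want. I can download a single blob with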

azure storage blob download [container]

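It then prompts me to specify a blob, whose name I can get from the Python script. The only way I have to download all of these blobs is to copy each name and paste it into the prompt after running the command above. Is there a way I can either:

A. Write a bash script that iterates over the list of blobs by running the command and then pasting the next blob name into the prompt.

B. Specify downloading a whole container in the Python script or the Azure CLI. Is there something I am not seeing for downloading an entire container?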

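@gary-liu-msft's solution is correct. I made a few more changes to it, and the code can now iterate through the container and the folder structure in it (PS: there are no folders in a container, only paths), check whether the same directory structure exists on the client, create it if it does not, and download the blobs into those paths. It supports long paths with nested subdirectories.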

from azure.storage.blob import BlockBlobService
from azure.storage.blob import PublicAccess
import os

#name of your storage account and the access key from Settings->AccessKeys->key1
block_blob_service = BlockBlobService(account_name='storageaccountname', account_key='accountkey')

#name of the container
generator = block_blob_service.list_blobs('testcontainer')

#code below lists all the blobs in the container and downloads them one after another
for blob in generator:
    print(blob.name)
    #check if the path contains a folder structure, create the folder structure
    if "/" in "{}".format(blob.name):
        print("there is a path in this")
        #extract the folder path and check if that folder exists locally, and if not create it
        head, tail = os.path.split("{}".format(blob.name))
        print(head)
        print(tail)
        if (os.path.isdir(os.getcwd()+ "/" + head)):
            #download the files to this directory
            print("directory and sub directories exist")
            block_blob_service.get_blob_to_path('testcontainer',blob.name,os.getcwd()+ "/" + head + "/" + tail)
        else:
            #create the directory and download the file to it
            print("directory doesn't exist, creating it now")
            os.makedirs(os.getcwd()+ "/" + head, exist_ok=True)
            print("directory created, download initiated")
            block_blob_service.get_blob_to_path('testcontainer',blob.name,os.getcwd()+ "/" + head + "/" + tail)
    else:
        block_blob_service.get_blob_to_path('testcontainer',blob.name,blob.name)

The same code is also available here: https://gist.github.com/brijrajsingh/35cd591c2ca90916b27742d52a3cf6ba
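The path handling in the answer above can be isolated into one small helper. The following is only a sketch under assumptions: the name `local_path_for` and its `root` parameter are hypothetical and not part of the Azure SDK, and blob names are assumed to use "/" as the virtual directory separator. It also rejects names containing "..", which the original code does not do.

```python
import os


def local_path_for(blob_name, root):
    """Map a blob name such as 'a/b/c.txt' to a path under root.

    Hypothetical helper (not part of the Azure SDK): it creates the parent
    directories if they are missing and returns the destination path.
    """
    # Blob names use "/" regardless of the local OS path separator.
    parts = [p for p in blob_name.split("/") if p not in ("", ".")]
    if not parts or ".." in parts:
        raise ValueError("unsupported blob name: {!r}".format(blob_name))
    destination = os.path.join(root, *parts)
    parent = os.path.dirname(destination)
    if parent:
        # exist_ok=True makes a separate os.path.isdir() check unnecessary.
        os.makedirs(parent, exist_ok=True)
    return destination
```

With such a helper, the download call in the loop could be written as block_blob_service.get_blob_to_path('testcontainer', blob.name, local_path_for(blob.name, os.getcwd())), which would cover both the nested and the top-level case.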

Since @brij-raj-singh-msft's answer, Microsoft has released the Gen2 version of the Azure Storage Blob client library for Python. (The code below was tested with version 12.5.0.) This snippet was tested on 25 September 2020.

import os
from azure.storage.blob import BlobServiceClient,ContainerClient, BlobClient
import datetime

# Assumes the AZURE_STORAGE_CONNECTION_STRING environment variable is set.
# If not, create BlobServiceClient using the account URL & credentials.
#Example: https://docs.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob.blobserviceclient

connection_string = os.getenv("AZURE_STORAGE_CONNECTION_STRING")

blob_service_client = BlobServiceClient.from_connection_string(conn_str=connection_string) 
# create container client
container_name = 'test2'
container_client = blob_service_client.get_container_client(container_name)

#Check if there is a top level local folder exist for container.
#If not, create one
data_dir ='Z:/azure_storage'
data_dir = data_dir+ "/" + container_name
if not(os.path.isdir(data_dir)):
    print("[{}]:[INFO] : Creating local directory for container".format(datetime.datetime.utcnow()))
    os.makedirs(data_dir, exist_ok=True)
    
#helper is defined before the loop so the name exists when the loop calls it
def download_blob(blob_client, destination_file):
    print("[{}]:[INFO] : Downloading {} ...".format(datetime.datetime.utcnow(),destination_file))
    with open(destination_file, "wb") as my_blob:
        blob_data = blob_client.download_blob()
        blob_data.readinto(my_blob)
    print("[{}]:[INFO] : download finished".format(datetime.datetime.utcnow()))

#code below lists all the blobs in the container and downloads them one after another
blob_list = container_client.list_blobs()
for blob in blob_list:
    print("[{}]:[INFO] : Blob name: {}".format(datetime.datetime.utcnow(), blob.name))
    #check if the path contains a folder structure, create the folder structure
    if "/" in "{}".format(blob.name):
        #extract the folder path and check if that folder exists locally, and if not create it
        head, tail = os.path.split("{}".format(blob.name))
        if not (os.path.isdir(data_dir+ "/" + head)):
            #create the directory and download the file to it
            print("[{}]:[INFO] : {} directory doesn't exist, creating it now".format(datetime.datetime.utcnow(),data_dir+ "/" + head))
            os.makedirs(data_dir+ "/" + head, exist_ok=True)
    # Finally, download the blob
    blob_client = container_client.get_blob_client(blob.name)
    download_blob(blob_client,data_dir+ "/"+blob.name)

The same code is also available here: https://gist.github.com/allene/6bbb36ec3ed08b419206156567290b13
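The list-then-download loop from the answer above can also be wrapped in a function that receives the container client as an argument, which makes it easy to try out with a stand-in object. This is a minimal sketch, not the answer author's code: the function name `download_container` and the `prefix` parameter are hypothetical, and the client is only assumed to behave like the v12 `ContainerClient` (`list_blobs(name_starts_with=...)` yielding objects with a `.name`, and `download_blob(name)` returning an object with `readinto(stream)`).

```python
import os


def download_container(container_client, dest_root, prefix=None):
    """Download every blob listed by container_client into dest_root.

    container_client is assumed to behave like the v12 ContainerClient.
    Returns the list of local file paths that were written.
    """
    written = []
    for blob in container_client.list_blobs(name_starts_with=prefix):
        # Blob names use "/" as a virtual directory separator.
        parts = [p for p in blob.name.split("/") if p]
        if not parts or blob.name.endswith("/") or ".." in parts:
            continue  # skip empty, directory-like or unsafe names
        destination = os.path.join(dest_root, *parts)
        parent = os.path.dirname(destination)
        if parent:
            os.makedirs(parent, exist_ok=True)
        with open(destination, "wb") as handle:
            # readinto() streams into the file instead of holding
            # the whole blob in memory.
            container_client.download_blob(blob.name).readinto(handle)
        written.append(destination)
    return written
```

Under those assumptions, a call could look like download_container(blob_service_client.get_container_client('test2'), 'Z:/azure_storage/test2').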

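At the moment, it seems that we cannot download all the blobs in a container directly with a single API call. All the available blob operations are listed at https://msdn.microsoft.com/en-us/library/azure/dd179377.aspx.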

So we can get the ListGenerator of blobs and then download the blobs in a loop. E.g.:

result = blob_service.list_blobs(container)
for b in result.items:
    r = blob_service.get_blob_to_path(container,b.name,"folder/{}".format(b.name))

Update

When using azure-storage-python, import the block blob service:

from azure.storage.blob import BlockBlobService

I made a Python wrapper for the Azure CLI that lets us download/upload in batches. With it we can download a complete container or only certain files from a container.

Install:

pip install azurebatchload

import os
from azurebatchload.download import DownloadBatch

az_batch = DownloadBatch(
    destination='../pdfs',
    source='blobcontainername',
    pattern='*.pdf'
)
az_batch.download()
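To illustrate what a glob pattern such as '*.pdf' would select from a list of blob names, here is a standard-library sketch. It does not describe how azurebatchload implements its `pattern` argument (that is not shown in this answer); the function name `filter_blob_names` is hypothetical, and matching against the last path component only is an assumption made for the example.

```python
from fnmatch import fnmatchcase


def filter_blob_names(names, pattern):
    """Return the names whose final path component matches a glob pattern.

    Matching is case-sensitive and ignores any "/"-separated prefix.
    """
    return [n for n in names if fnmatchcase(n.rsplit("/", 1)[-1], pattern)]
```

A filter like this could be applied to the names returned by a listing call before downloading, so that only the matching blobs are fetched.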

Here is a simple script (PowerShell) that iterates over a single container and downloads everything in it to the $Destination you provide.

#
# Download All Blobs in a Container
#

# Connect to Azure Account
Connect-AzAccount

# Set Variables
$Destination = "C:\Software"
$ResourceGroupName = 'resource group name'
$ContainerName = 'container name'
$storageAccName = 'storage account name'

# Function to download all blob contents
Function DownloadBlobContents
{
    Write-Host -ForegroundColor Green "Download blob contents from storage container.."
    # Get the storage account
    $StorageAcc = Get-AzStorageAccount -ResourceGroupName $ResourceGroupName -Name $storageAccName
    # Get the storage account context
    $Ctx = $StorageAcc.Context
    Write-Host -ForegroundColor Magenta $ContainerName "Creating or checking folder presence"
    # Create or check folder presence
    New-Item -ItemType Directory -Path $Destination -Force
    # Get the blob contents from the container
    $BlobContents = Get-AzStorageBlob -Container $ContainerName -Context $Ctx
    # Loop through each blob and download each one until they are all complete
    foreach ($BlobContent in $BlobContents)
    {
        # Download the blob content
        Get-AzStorageBlobContent -Container $ContainerName -Context $Ctx -Blob $BlobContent.Name -Destination $Destination -Force
        Write-Host -ForegroundColor Green "Downloaded a blob content"
    }
}

DownloadBlobContents

What I would still like to learn is how to use the storage account key instead of Connect-AzAccount, so that I can run this without having to ...
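For key-based access, PowerShell has the New-AzStorageContext cmdlet, which accepts -StorageAccountName and -StorageAccountKey and may make the Connect-AzAccount step unnecessary for blob operations. The same idea in the Python v12 library used earlier in this thread can be sketched as below. The helper names `account_url` and `client_from_key` are hypothetical, and the endpoint format assumes the default public-cloud suffix `blob.core.windows.net`.

```python
def account_url(account_name):
    """Build the default public-cloud blob endpoint for a storage account."""
    return "https://{}.blob.core.windows.net".format(account_name)


def client_from_key(account_name, account_key):
    """Create a BlobServiceClient authenticated with an account key.

    Assumes the azure-storage-blob v12 package is installed.
    """
    # Imported lazily so account_url() is usable without the SDK installed.
    from azure.storage.blob import BlobServiceClient
    return BlobServiceClient(
        account_url=account_url(account_name),
        credential=account_key,
    )
```

The account key grants full access to the storage account, so reading it from an environment variable or a secret store is preferable to hard-coding it in the script.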

