Download all blobs within an Azure Storage container

I've managed to write a Python script that lists all the blobs within a container.

import azure
from azure.storage.blob import BlobService
from azure.storage import *

blob_service = BlobService(account_name='<ACCOUNT_NAME>', account_key='<ACCOUNT_KEY>')


blobs = []
marker = None
while True:
    batch = blob_service.list_blobs('<CONTAINER>', marker=marker)
    blobs.extend(batch)
    if not batch.next_marker:
        break
    marker = batch.next_marker
for blob in blobs:
    print(blob.name)

As I said, this only lists the blobs that I want to download. I've moved on to the Azure CLI to see if it could help with what I want to do. I'm able to download a single blob with

azure storage blob download [container]

It then prompts me to specify a blob, which I can grab from the Python script. To download all those blobs, I would have to copy and paste each name into the prompt after running the command above. Is there a way I can either:

A. Write a bash script that iterates through the list of blobs by executing the command, then pasting the next blob name into the prompt.

B. Specify that the whole container should be downloaded, in either the Python script or the Azure CLI. Is there something I'm not seeing that downloads the whole container?
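For option B, one possible route is sketched below. It assumes the newer `az` CLI (not the classic `azure` command shown above) is installed and that authentication is already taken care of, for example through `az login` or environment variables. The account name, container name and destination are placeholders, and the destination directory is assumed to exist. The first helper only builds the argument list, so the command can be inspected before anything is run.

```python
import subprocess


def build_download_batch_command(account_name, container, destination):
    """Build the argument list for `az storage blob download-batch`.

    Assumes the newer `az` CLI; every value is a placeholder from the caller.
    """
    return [
        "az", "storage", "blob", "download-batch",
        "--account-name", account_name,
        "--source", container,
        "--destination", destination,
    ]


def run_download_batch(account_name, container, destination):
    """Run the command; raises CalledProcessError if the CLI reports a failure."""
    command = build_download_batch_command(account_name, container, destination)
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Only prints the command; call run_download_batch() to actually execute it.
    print(" ".join(build_download_batch_command("<ACCOUNT_NAME>", "<CONTAINER>", ".")))
```

Passing the arguments as a list avoids shell quoting problems if a container or path name contains spaces.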

@gary-liu-msft's solution is correct. I made some more changes to it: the code can now iterate through the container and the folder structure in it (PS: there are no folders in containers, just paths), check whether the same directory structure exists on the client, create it if it does not, and download the blobs into those paths. It supports long paths with embedded subdirectories.

from azure.storage.blob import BlockBlobService
from azure.storage.blob import PublicAccess
import os

#name of your storage account and the access key from Settings->AccessKeys->key1
block_blob_service = BlockBlobService(account_name='storageaccountname', account_key='accountkey')

#name of the container
generator = block_blob_service.list_blobs('testcontainer')

#code below lists all the blobs in the container and downloads them one after another
for blob in generator:
    print(blob.name)
    print("{}".format(blob.name))
    #check if the path contains a folder structure, create the folder structure
    if "/" in "{}".format(blob.name):
        print("there is a path in this")
        #extract the folder path and check if that folder exists locally, and if not create it
        head, tail = os.path.split("{}".format(blob.name))
        print(head)
        print(tail)
        if (os.path.isdir(os.getcwd()+ "/" + head)):
            #download the files to this directory
            print("directory and sub directories exist")
            block_blob_service.get_blob_to_path('testcontainer',blob.name,os.getcwd()+ "/" + head + "/" + tail)
        else:
            #create the directory and download the file to it
            print("directory doesn't exist, creating it now")
            os.makedirs(os.getcwd()+ "/" + head, exist_ok=True)
            print("directory created, download initiated")
            block_blob_service.get_blob_to_path('testcontainer',blob.name,os.getcwd()+ "/" + head + "/" + tail)
    else:
        block_blob_service.get_blob_to_path('testcontainer',blob.name,blob.name)

The same code is also available here: https://gist.github.com/brijrajsingh/35cd591c2ca90916b27742d52a3cf6ba
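The directory handling above can probably be condensed into one helper. The following is a minimal sketch using only the standard library; `local_path_for_blob` and its `base_dir` parameter are hypothetical names, not part of any Azure SDK. It maps a blob name to a local path and creates the missing parent directories in one step.

```python
import os


def local_path_for_blob(blob_name, base_dir):
    """Map a blob name such as 'a/b/c.txt' to a path under base_dir,
    creating the intermediate directories if they are missing."""
    # Blob names always use '/' as the separator, whatever the local OS uses.
    # Empty, '.' and '..' segments are dropped so the result stays under base_dir.
    parts = [p for p in blob_name.split("/") if p not in ("", ".", "..")]
    target = os.path.join(base_dir, *parts)
    parent = os.path.dirname(target)
    if parent:
        os.makedirs(parent, exist_ok=True)
    return target
```

With the legacy SDK used in this answer, the loop body could then shrink to a single call such as `block_blob_service.get_blob_to_path('testcontainer', blob.name, local_path_for_blob(blob.name, os.getcwd()))`.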

Since @brij-raj-singh-msft's answer, Microsoft has released the Gen2 version of the Azure Storage Blobs client library for Python. The snippet below was tested with version 12.5.0 on 9/25/2020.

import os
from azure.storage.blob import BlobServiceClient,ContainerClient, BlobClient
import datetime

# Assumes the AZURE_STORAGE_CONNECTION_STRING environment variable is set.
# If not, create the BlobServiceClient using the account URL & credentials.
# Example: https://docs.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob.blobserviceclient

connection_string = os.getenv("AZURE_STORAGE_CONNECTION_STRING")

blob_service_client = BlobServiceClient.from_connection_string(conn_str=connection_string) 
# create container client
container_name = 'test2'
container_client = blob_service_client.get_container_client(container_name)

#Check whether a top-level local folder exists for the container.
#If not, create one
data_dir ='Z:/azure_storage'
data_dir = data_dir+ "/" + container_name
if not(os.path.isdir(data_dir)):
    print("[{}]:[INFO] : Creating local directory for container".format(datetime.datetime.utcnow()))
    os.makedirs(data_dir, exist_ok=True)
    
#helper that writes one blob to a local file
#(defined before the loop so that it exists when the loop calls it)
def download_blob(blob_client, destination_file):
    print("[{}]:[INFO] : Downloading {} ...".format(datetime.datetime.utcnow(),destination_file))
    with open(destination_file, "wb") as my_blob:
        blob_data = blob_client.download_blob()
        blob_data.readinto(my_blob)
    print("[{}]:[INFO] : download finished".format(datetime.datetime.utcnow()))

#code below lists all the blobs in the container and downloads them one after another
blob_list = container_client.list_blobs()
for blob in blob_list:
    print("[{}]:[INFO] : Blob name: {}".format(datetime.datetime.utcnow(), blob.name))
    #check if the path contains a folder structure, create the folder structure
    if "/" in "{}".format(blob.name):
        #extract the folder path and check if that folder exists locally, and if not create it
        head, tail = os.path.split("{}".format(blob.name))
        if not (os.path.isdir(data_dir+ "/" + head)):
            #create the directory and download the file to it
            print("[{}]:[INFO] : {} directory doesn't exist, creating it now".format(datetime.datetime.utcnow(),data_dir+ "/" + head))
            os.makedirs(data_dir+ "/" + head, exist_ok=True)
    # Finally, download the blob
    blob_client = container_client.get_blob_client(blob.name)
    download_blob(blob_client,data_dir+ "/"+blob.name)

The same code is also available here: https://gist.github.com/allene/6bbb36ec3ed08b419206156567290b13
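A more compact variant of the same idea is sketched below. It assumes the object passed in behaves like the v12 `ContainerClient`, that is, it offers `list_blobs(name_starts_with=...)` and `download_blob(name).readinto(stream)`. The function name `download_container` and its `prefix` parameter are illustrative, not part of the SDK. Blob names ending in `/` are skipped on the assumption that they are directory placeholders.

```python
import os


def download_container(container_client, destination_dir, prefix=None):
    """Download every blob listed by container_client into destination_dir,
    recreating the '/'-separated blob paths as local directories.

    container_client is assumed to behave like azure.storage.blob.ContainerClient.
    Returns the list of local file paths that were written.
    """
    downloaded = []
    for blob in container_client.list_blobs(name_starts_with=prefix):
        if blob.name.endswith("/"):
            # Assumed to be a directory placeholder, so there is nothing to write.
            continue
        local_path = os.path.join(destination_dir, *blob.name.split("/"))
        os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
        with open(local_path, "wb") as handle:
            container_client.download_blob(blob.name).readinto(handle)
        downloaded.append(local_path)
    return downloaded
```

With the `container_client` and `data_dir` from the snippet above, this would be called as `download_container(container_client, data_dir)`. Passing `prefix="some/path/"` restricts the download to one virtual folder.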

Currently, it seems we cannot download all the blobs in a container directly with a single API call. All the available blob operations are listed at https://msdn.microsoft.com/en-us/library/azure/dd179377.aspx.

So we can get the ListGenerator of blobs, then download the blobs in a loop. For example:

result = blob_service.list_blobs(container)
for b in result.items:
    r = blob_service.get_blob_to_path(container,b.name,"folder/{}".format(b.name))

Update

Import BlockBlobService when using azure-storage-python:

from azure.storage.blob import BlockBlobService

I made a Python wrapper for the Azure CLI which lets us download and upload in batches. This way we can download a complete container or only certain files from a container.

To install:

pip install azurebatchload

import os
from azurebatchload.download import DownloadBatch

az_batch = DownloadBatch(
    destination='../pdfs',
    source='blobcontainername',
    pattern='*.pdf'
)
az_batch.download()
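To get a feel for which blobs a pattern such as `'*.pdf'` would select, here is a small standard-library sketch. It assumes shell-style wildcards; the wrapper's own matching rules are not shown in this answer and may differ. `filter_blob_names` is a hypothetical helper, not part of `azurebatchload`.

```python
from fnmatch import fnmatchcase


def filter_blob_names(blob_names, pattern):
    """Return the blob names that match a shell-style wildcard pattern.

    fnmatchcase is case-sensitive on every platform, and '*' also matches '/',
    so '*.pdf' selects PDFs at any depth of the virtual folder structure.
    """
    return [name for name in blob_names if fnmatchcase(name, pattern)]
```

For instance, `filter_blob_names(["a.pdf", "docs/b.pdf", "c.txt"], "*.pdf")` keeps the two PDF names and drops `c.txt`.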

Here is a simple PowerShell script that iterates through a single container and downloads everything in it to the $Destination you provide:

#
# Download All Blobs in a Container
#

# Connect to Azure Account
Connect-AzAccount

# Set Variables
$Destination="C:\Software"
$ResourceGroupName = 'resource group name'
$ContainerName = 'container name'
$storageAccName = 'storage account name'

# Function to download all blob contents
Function DownloadBlobContents
{
    Write-Host -ForegroundColor Green "Download blob contents from storage container.."
    # Get the storage account
    $StorageAcc = Get-AzStorageAccount -ResourceGroupName $resourceGroupName -Name $storageAccName
    # Get the storage account context
    $Ctx = $StorageAcc.Context
    # Get all containers
    $Containers = Get-AzStorageContainer -Context $Ctx
    Write-Host -ForegroundColor Magenta $ContainerName "Creating or checking folder presence"
    # Create or check folder presence
    New-Item -ItemType Directory -Path $Destination -Force
    # Get the blob contents from the container
    $BlobContents = Get-AzStorageBlob -Container $ContainerName -Context $Ctx
    # Loop through each blob and download each one until they are all complete
    foreach($BlobContent in $BlobContents)
    {
        # Download the blob content
        Get-AzStorageBlobContent -Container $ContainerName -Context $Ctx -Blob $BlobContent.Name -Destination $Destination -Force
        Write-Host -ForegroundColor Green "Downloaded a blob content"
    }
}

DownloadBlobContents

Something I would like to learn is how to avoid Connect-AzAccount and instead use the storage account keys, so that I can run this without having to authenticate in
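The PowerShell side of that question is not answered here. As a rough Python analogue, the v12 `BlobServiceClient` can be constructed from an account URL and an account key, so no interactive login is needed. The sketch below assumes the standard public-cloud endpoint format; the helper names and the environment variable names `AZURE_STORAGE_ACCOUNT` and `AZURE_STORAGE_KEY` are illustrative choices, not something this script requires.

```python
import os


def account_url(account_name):
    """Blob endpoint for a storage account (public-cloud URL format assumed)."""
    return "https://{}.blob.core.windows.net".format(account_name)


def make_client_from_key(account_name, account_key):
    """Create a BlobServiceClient authenticated with the account key."""
    # Imported lazily so that account_url() can be used without the SDK installed.
    from azure.storage.blob import BlobServiceClient
    return BlobServiceClient(account_url=account_url(account_name),
                             credential=account_key)


if __name__ == "__main__":
    # Environment variable names are assumptions chosen for this example.
    client = make_client_from_key(os.environ["AZURE_STORAGE_ACCOUNT"],
                                  os.environ["AZURE_STORAGE_KEY"])
    print(client.account_name)
```

Reading the key from the environment keeps it out of the script itself, which matters if the script ends up in version control.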
