Download all blobs within an Azure Storage container
I've managed to write a Python script to list out all the blobs within a container.
from azure.storage.blob import BlobService

blob_service = BlobService(account_name='<ACCOUNT_NAME>', account_key='<ACCOUNT_KEY>')

blobs = []
marker = None
while True:
    batch = blob_service.list_blobs('<CONTAINER>', marker=marker)
    blobs.extend(batch)
    if not batch.next_marker:
        break
    marker = batch.next_marker
for blob in blobs:
    print(blob.name)
Like I said, this only lists the blobs that I want to download. I've moved on to the Azure CLI to see if that could aid in what I want to do. I'm able to download a single blob with

azure storage blob download [container]

It then prompts me to specify a blob, which I can grab from the Python script. The way I would have to download all those blobs is to copy and paste them into the prompt after the command above. Is there a way I can either:

A. Write a bash script to iterate through the list of blobs by executing the command, then pasting the next blob name into the prompt.

B. Specify to download the container in either the Python script or the Azure CLI.

Is there something I'm not seeing to download the whole container?
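For option A, one way to avoid the interactive prompt is to drive the CLI from a small script, one invocation per blob. The sketch below is a hypothetical helper, not a tested recipe: the argument order for the `azure storage blob download` command is an assumption based on the command shown above, so check the CLI's `--help` output before relying on it.

```python
import subprocess

def download_all(blob_names, container, cli=("azure", "storage", "blob", "download")):
    """Invoke the CLI once per blob. Passing the container and blob
    name as arguments (assumed positional order) means the interactive
    prompt is never reached."""
    for name in blob_names:
        subprocess.run([*cli, container, name], check=True)
```

The `cli` parameter is only there so the command can be swapped out; feed the function the blob names printed by the listing script above.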
@gary-liu-msft's solution is correct. I made some more changes to it: now the code can iterate through the containers and the folder structure within them (PS - there are no folders in containers, just paths), check whether the same directory structure exists on the client, create that directory structure if it doesn't, and download the blobs into those paths. It supports long paths with embedded subdirectories.
from azure.storage.blob import BlockBlobService
import os

# name of your storage account and the access key from Settings -> Access keys -> key1
block_blob_service = BlockBlobService(account_name='storageaccountname', account_key='accountkey')

# name of the container
generator = block_blob_service.list_blobs('testcontainer')

# code below lists all the blobs in the container and downloads them one after another
for blob in generator:
    print(blob.name)
    # check if the path contains a folder structure, create the folder structure
    if "/" in blob.name:
        print("there is a path in this")
        # extract the folder path and check if that folder exists locally, and if not create it
        head, tail = os.path.split(blob.name)
        print(head)
        print(tail)
        if os.path.isdir(os.path.join(os.getcwd(), head)):
            # download the file to this directory
            print("directory and sub directories exist")
        else:
            # create the directory and download the file to it
            print("directory doesn't exist, creating it now")
            os.makedirs(os.path.join(os.getcwd(), head), exist_ok=True)
            print("directory created, download initiated")
        block_blob_service.get_blob_to_path('testcontainer', blob.name, os.path.join(os.getcwd(), head, tail))
    else:
        block_blob_service.get_blob_to_path('testcontainer', blob.name, blob.name)
The same code is also available at https://gist.github.com/brijrajsingh/35cd591c2ca90916b27742d52a3cf6ba
Since @brij-raj-singh-msft's answer, Microsoft has released the Gen2 version of the Azure Storage Blobs client library for Python. (The code below is tested with version 12.5.0.) This snippet was tested on 9/25/2020.
import os
import datetime
from azure.storage.blob import BlobServiceClient

def download_blob(blob_client, destination_file):
    print("[{}]:[INFO] : Downloading {} ...".format(datetime.datetime.utcnow(), destination_file))
    with open(destination_file, "wb") as my_blob:
        blob_data = blob_client.download_blob()
        blob_data.readinto(my_blob)
    print("[{}]:[INFO] : download finished".format(datetime.datetime.utcnow()))

# Assuming your Azure connection string environment variable is set.
# If not, create the BlobServiceClient using the URL & credentials.
# Example: https://docs.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob.blobserviceclient
connection_string = os.getenv("AZURE_STORAGE_CONNECTION_STRING")
blob_service_client = BlobServiceClient.from_connection_string(conn_str=connection_string)

# create the container client
container_name = 'test2'
container_client = blob_service_client.get_container_client(container_name)

# Check whether a top-level local folder exists for the container.
# If not, create one.
data_dir = 'Z:/azure_storage'
data_dir = data_dir + "/" + container_name
if not os.path.isdir(data_dir):
    print("[{}]:[INFO] : Creating local directory for container".format(datetime.datetime.utcnow()))
    os.makedirs(data_dir, exist_ok=True)

# code below lists all the blobs in the container and downloads them one after another
blob_list = container_client.list_blobs()
for blob in blob_list:
    print("[{}]:[INFO] : Blob name: {}".format(datetime.datetime.utcnow(), blob.name))
    # check if the path contains a folder structure, create the folder structure
    if "/" in blob.name:
        # extract the folder path and check if that folder exists locally, and if not create it
        head, tail = os.path.split(blob.name)
        if not os.path.isdir(data_dir + "/" + head):
            # create the directory so the file can be downloaded into it
            print("[{}]:[INFO] : {} directory doesn't exist, creating it now".format(datetime.datetime.utcnow(), data_dir + "/" + head))
            os.makedirs(data_dir + "/" + head, exist_ok=True)
    # Finally, download the blob
    blob_client = container_client.get_blob_client(blob.name)
    download_blob(blob_client, data_dir + "/" + blob.name)
The same code is also available at https://gist.github.com/allene/6bbb36ec3ed08b419206156567290b13
Currently, it seems we cannot directly download all the blobs from a container with a single API. All the available blob operations are listed at https://msdn.microsoft.com/en-us/library/azure/dd179377.aspx.

So we can get the ListGenerator of blobs, then download the blobs in a loop. E.g.:
result = blob_service.list_blobs(container)
for b in result.items:
    r = blob_service.get_blob_to_path(container, b.name, "folder/{}".format(b.name))
Import BlockBlobService when using azure-storage-python:
from azure.storage.blob import BlockBlobService
I made a Python wrapper for the Azure CLI which enables us to do downloads and uploads in batches. This way we can download a complete container, or certain files from a container.

To install:
pip install azurebatchload
from azurebatchload.download import DownloadBatch

az_batch = DownloadBatch(
    destination='../pdfs',
    source='blobcontainername',
    pattern='*.pdf'
)
az_batch.download()
Here is a simple script (PowerShell) that will iterate through a single container and download everything in there to the $Destination you provide:
# Download All Blobs in a Container

# Connect to Azure Account
Connect-AzAccount

# Set Variables
$Destination = "C:\Software"
$ResourceGroupName = 'resource group name'
$ContainerName = 'container name'
$storageAccName = 'storage account name'

# Function to download all blob contents
Function DownloadBlobContents
{
    Write-Host -ForegroundColor Green "Download blob contents from storage container.."
    # Get the storage account
    $StorageAcc = Get-AzStorageAccount -ResourceGroupName $ResourceGroupName -Name $storageAccName
    # Get the storage account context
    $Ctx = $StorageAcc.Context
    Write-Host -ForegroundColor Magenta $ContainerName "Creating or checking folder presence"
    # Create or check folder presence
    New-Item -ItemType Directory -Path $Destination -Force
    # Get the blob contents from the container
    $BlobContents = Get-AzStorageBlob -Container $ContainerName -Context $Ctx
    # Loop through each blob and download each one until they are all complete
    foreach ($BlobContent in $BlobContents)
    {
        # Download the blob content
        Get-AzStorageBlobContent -Container $ContainerName -Context $Ctx -Blob $BlobContent.Name -Destination $Destination -Force
        Write-Host -ForegroundColor Green "Downloaded a blob content"
    }
}
DownloadBlobContents
Something I would like to learn is to not have to use Connect-AzAccount, but instead pass the storage account keys, so I can run this without having to authenticate interactively.
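For reference, one possible sketch of that approach, untested here and environment-dependent: the Az.Storage module's New-AzStorageContext cmdlet can build a storage context directly from an account name and key, which would replace the Connect-AzAccount / Get-AzStorageAccount steps above (the key value below is a placeholder):

```powershell
# Build a storage context straight from the account key,
# skipping interactive authentication entirely.
$storageAccName = 'storage account name'
$storageAccKey  = '<storage account key>'   # placeholder - keep real keys out of scripts
$Ctx = New-AzStorageContext -StorageAccountName $storageAccName -StorageAccountKey $storageAccKey
```

The rest of the function can then use $Ctx as before.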