Download specific blobs from multiple containers using Python (Azure)

Question

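I'm just looking for some help. I'm quite new to Python, but I'm trying to get something done. I need to download a specific blob (actually an .xlsx file) from multiple containers. That is, a process creates one container every day, and I want to download one file from each container. I tried the following: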

# download_blobs.py
# Python program to bulk download blob files from azure storage
# Uses latest python SDK() for Azure blob storage
# Requires python 3.6 or above
import os
from azure.storage.blob import BlobServiceClient, BlobClient
from azure.storage.blob import ContentSettings, ContainerClient
 
# IMPORTANT: Replace connection string with your storage account connection string
# Usually starts with DefaultEndpointsProtocol=https;...
MY_CONNECTION_STRING = "my_conection_string"
 
# Replace with blob container
MY_BLOB_CONTAINER = "^092022"
 
# Replace with the local folder where you want files to be downloaded
LOCAL_BLOB_PATH = "a_local_path"

# Replace with the blob to download

BLOB_NAME = "^xlsx'"

class AzureBlobFileDownloader:
  def __init__(self):
    print("Initializing AzureBlobFileDownloader")
 
    # Initialize the connection to Azure storage account
    self.blob_service_client =  BlobServiceClient.from_connection_string(MY_CONNECTION_STRING)
    self.my_container = self.blob_service_client.get_container_client(MY_BLOB_CONTAINER)
 
 
  def save_blob(self,file_name,file_content):
    # Get full path to the file
    download_file_path = os.path.join(LOCAL_BLOB_PATH, file_name)
 
    # for nested blobs, create local path as well!
    os.makedirs(os.path.dirname(download_file_path), exist_ok=True)
 
    with open(download_file_path, "wb") as file:
      file.write(file_content)
 
  def download_all_blobs_in_container(self):

    my_blobs = self.my_container.list_blobs(BLOB_NAME)
    for blob in my_blobs:
      print(blob.name)
      bytes = self.my_container.get_blob_client(blob).download_blob().readall()
      self.save_blob(blob.name, bytes)
 
# Initialize class and download files
azure_blob_file_downloader = AzureBlobFileDownloader()
azure_blob_file_downloader.download_all_blobs_in_container()

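The containers created each day are named as follows:

01092022 - 02092022 - 03092022 - ...

The blobs I want to download are:

p.01092022.xlsx - p.02092022.xlsx - p.03092022.xlsx - ...

How can I make Python go through each container and download the file that belongs to it, in the order given by the container names?

Thanks for your help!

Regards.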

Answer 1

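Is only one container, holding one file, created per day? Does it always follow this pattern (01092022 - 02092022 - 03092022)?

If you stick to that naming convention, you can use something like this: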

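# blob_service_client_instance is a BlobServiceClient, e.g. created with
# BlobServiceClient.from_connection_string(MY_CONNECTION_STRING)
containers = blob_service_client_instance.list_containers()
for c in containers:
    # Blobs are named p.<container name>.xlsx, e.g. p.01092022.xlsx
    blob_name = "p." + c.name + ".xlsx"
    blob_client_instance = blob_service_client_instance.get_blob_client(c.name, blob_name, snapshot=None)
    # Skip containers that do not hold the expected file
    if blob_client_instance.exists():
        blob_data = blob_client_instance.download_blob()
        data = blob_data.readall()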

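But this will start to take a long time to run once many containers have accumulated. You may want to add a line that restricts the search to a range, for example the last 6 months.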
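One possible way to restrict the range is to filter the container names before touching any blob. The sketch below assumes the names are dates in DDMMYYYY form, as in the question; `parse_container_date` and `recent_containers` are hypothetical helper names, and 183 days is only an approximation of "6 months". Because DDMMYYYY strings do not sort chronologically as plain text, the helper sorts by the parsed date instead.

```python
from datetime import date, datetime, timedelta


def parse_container_date(name):
    """Return the date encoded in a DDMMYYYY container name, or None."""
    if len(name) != 8 or not name.isdigit():
        return None
    try:
        return datetime.strptime(name, "%d%m%Y").date()
    except ValueError:
        # Eight digits that do not form a valid date, e.g. "99992022"
        return None


def recent_containers(names, today, days=183):
    """Keep names dated within the last `days` days, oldest first."""
    cutoff = today - timedelta(days=days)
    dated = []
    for name in names:
        d = parse_container_date(name)
        if d is not None and cutoff <= d <= today:
            dated.append((d, name))
    # Sort by the parsed date, not by the raw string
    return [name for _, name in sorted(dated)]
```

To use it with the loop above, you could pass `[c.name for c in blob_service_client_instance.list_containers()]` together with `date.today()`, then iterate over the returned names and build each blob name as `"p." + name + ".xlsx"`.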

Solution 1
0 2022-09-24 23:21:26
