
Download a specific blob (Azure) from multiple containers with Python

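I'm just looking for some help. I'm very new to Python, but I'm trying to get something done. I need to download a specific blob (actually an .xlsx file) from multiple containers. That is, the process creates a new container every day, and I want to download one file from each container. I tried the following: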

# download_blobs.py
# Python program to bulk download blob files from azure storage
# Uses latest python SDK() for Azure blob storage
# Requires python 3.6 or above
import os
from azure.storage.blob import BlobServiceClient, BlobClient
from azure.storage.blob import ContentSettings, ContainerClient
 
# IMPORTANT: Replace connection string with your storage account connection string
# Usually starts with DefaultEndpointsProtocol=https;...
MY_CONNECTION_STRING = "my_conection_string"
 
# Replace with blob container
MY_BLOB_CONTAINER = "^092022"
 
# Replace with the local folder where you want files to be downloaded
LOCAL_BLOB_PATH = "a_local_path"

# Replace with the blob to download

BLOB_NAME = "^xlsx'"

class AzureBlobFileDownloader:
  def __init__(self):
    print("Initializing AzureBlobFileDownloader")
 
    # Initialize the connection to Azure storage account
    self.blob_service_client =  BlobServiceClient.from_connection_string(MY_CONNECTION_STRING)
    self.my_container = self.blob_service_client.get_container_client(MY_BLOB_CONTAINER)
 
 
  def save_blob(self,file_name,file_content):
    # Get full path to the file
    download_file_path = os.path.join(LOCAL_BLOB_PATH, file_name)
 
    # for nested blobs, create local path as well!
    os.makedirs(os.path.dirname(download_file_path), exist_ok=True)
 
    with open(download_file_path, "wb") as file:
      file.write(file_content)
 
  def download_all_blobs_in_container(self):

    my_blobs = self.my_container.list_blobs(BLOB_NAME)
    for blob in my_blobs:
      print(blob.name)
      # Avoid shadowing the built-in name `bytes`
      blob_bytes = self.my_container.get_blob_client(blob).download_blob().readall()
      self.save_blob(blob.name, blob_bytes)
 
# Initialize class and download files
azure_blob_file_downloader = AzureBlobFileDownloader()
azure_blob_file_downloader.download_all_blobs_in_container()

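Each container created daily is named like this:

01092022 - 02092022 - 03092022 - ...

The blobs I want to download are:

p.01092022.xlsx - p.02092022.xlsx - p.03092022.xlsx - ...

How can I make Python go through each container and download the file associated with it, in order, based on these names?

Thanks for your help!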

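Great.

Does it create only one container with one file per day? Does it always follow this pattern (01092022 - 02092022 - 03092022)?

If you stick to the naming convention, you can use something like this: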

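from azure.storage.blob import BlobServiceClient

# Assumes MY_CONNECTION_STRING is defined as in the question
blob_service_client_instance = BlobServiceClient.from_connection_string(MY_CONNECTION_STRING)

containers = blob_service_client_instance.list_containers()
for c in containers:
    # Blobs are named p.<container name>.xlsx, e.g. p.01092022.xlsx
    blob_name = "p." + c.name + ".xlsx"
    blob_client_instance = blob_service_client_instance.get_blob_client(c.name, blob_name, snapshot=None)
    # Skip containers that do not hold the expected blob
    if blob_client_instance.exists():
        blob_data = blob_client_instance.download_blob()
        data = blob_data.readall()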

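But this will start to take a long time to run after a while. You may want to add a line that restricts the search to a range, e.g. the past 6 months.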
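
One way to restrict the range could look like the sketch below. It assumes container names are always DDMMYYYY and that "6 months" can be approximated as 180 days. The helper names (container_date, recent_container_names, download_recent) are made up for illustration. download_recent expects an already created BlobServiceClient and only uses list_containers(), get_blob_client(), exists() and download_blob().readall(), the same calls as above.

```python
from datetime import date, datetime, timedelta
from pathlib import Path


def container_date(name):
    """Parse a DDMMYYYY container name such as "01092022"; return None otherwise."""
    if len(name) != 8 or not name.isdigit():
        return None
    try:
        return datetime.strptime(name, "%d%m%Y").date()
    except ValueError:
        # Eight digits that do not form a real date, e.g. "32012022"
        return None


def recent_container_names(names, days=180, today=None):
    """Return the names dated within the last `days` days, oldest first."""
    today = today or date.today()
    cutoff = today - timedelta(days=days)
    dated = [(container_date(n), n) for n in names]
    recent = [(d, n) for d, n in dated if d is not None and cutoff <= d <= today]
    return [n for _, n in sorted(recent)]


def download_recent(service_client, local_dir, days=180, today=None):
    """Save p.<container>.xlsx from each recent container into local_dir."""
    names = [c.name for c in service_client.list_containers()]
    saved = []
    for name in recent_container_names(names, days, today):
        blob_name = f"p.{name}.xlsx"
        blob_client = service_client.get_blob_client(name, blob_name)
        # Skip days for which the expected file was never written
        if not blob_client.exists():
            continue
        target = Path(local_dir) / blob_name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(blob_client.download_blob().readall())
        saved.append(target)
    return saved
```

With the client from the snippet above, this would be called as download_recent(blob_service_client_instance, LOCAL_BLOB_PATH). Note that list_containers() still enumerates every container in the account, so the filter saves the per-container exists() and download calls rather than the listing itself.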

