
Download one day's Azure blob files with Python

Requirement:

Files are uploaded into an Azure blob container from various machines. I need a Python script, scheduled to run daily, that downloads one day's files from the container.

Code:

import datetime
import os

from azure.storage.blob import ContainerClient

container_connection_string = "CONNECTION_STRING"
container_client = ContainerClient.from_connection_string(
    conn_str=container_connection_string, container_name="CONTAINER_NAME"
)

# start_time is not defined in the original snippet; assume the run starts now.
# Use an aware UTC datetime because blob.last_modified is timezone-aware (UTC).
start_time = datetime.datetime.now(datetime.timezone.utc)
date_folder = start_time.strftime("%d-%m-%Y")
base_path = r"DOWNLOAD_PATH"
count = 0
threshold_time = start_time - datetime.timedelta(days=1)

# os.path.join builds the path portably instead of hand-formatting backslashes
download_dir = os.path.join(base_path, date_folder)
os.makedirs(download_dir, exist_ok=True)
print("Starting")

for blob in container_client.list_blobs():
    if threshold_time < blob.last_modified:
        count += 1
        print(count, blob.name)
        # Blob names may contain "/" (virtual folders), so create local subfolders too
        local_path = os.path.join(download_dir, *blob.name.split("/"))
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        # Reuse the container client instead of creating a new BlobClient per blob
        with open(local_path, "wb") as my_blob:
            container_client.download_blob(blob.name).readinto(my_blob)
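One thing worth noting about the script: the threshold is a rolling 24 hours before start_time, so if a scheduled run starts a little earlier or later than the previous one, consecutive runs can overlap or miss blobs. A minimal sketch of an alternative, assuming "one day's files" means the previous calendar day in UTC, computes a fixed [start, end) window instead; `previous_utc_day` is a hypothetical helper name.

```python
from datetime import datetime, time, timedelta, timezone


def previous_utc_day(now: datetime) -> tuple[datetime, datetime]:
    """Return (start, end) covering the whole calendar day before `now`, in UTC.

    start is inclusive and end is exclusive, so back-to-back daily runs neither
    overlap nor leave gaps, whatever time of day the job actually starts.
    """
    now_utc = now.astimezone(timezone.utc)
    # Midnight UTC at the start of "today" is the exclusive end of "yesterday"
    end = datetime.combine(now_utc.date(), time.min, tzinfo=timezone.utc)
    start = end - timedelta(days=1)
    return start, end
```

In the loop above, the filter would then become `start <= blob.last_modified < end`, and `date_folder` could be derived from `start` so that the folder name matches the day actually downloaded.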

Problem:

The script above iterates through every blob in the container, checks whether each blob was modified within the last day, and downloads it if so. Since more than 15,000 files are uploaded every day, traversing all the blobs to find the current day's files is very time-consuming, and downloading the blobs also takes a lot of time.
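On the download-time part of the problem: the loop fetches one blob at a time, and each download spends most of its time waiting on the network. A sketch, assuming the files go to a local disk and a single container client is shared by all workers, runs the downloads on a thread pool. `download_concurrently`, `local_path_for` and the `fetch` callback are hypothetical names, and `max_workers=16` is an arbitrary starting point to tune. To my understanding, Azure SDK for Python client objects are safe to share across threads.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def local_path_for(download_dir, blob_name):
    """Map a blob name like 'machine1/file.csv' to a local path, treating '/' as a folder separator."""
    return os.path.join(download_dir, *blob_name.split("/"))


def download_concurrently(blob_names, download_dir, fetch, max_workers=16):
    """Download blobs on a thread pool and return the local paths, in input order.

    fetch(blob_name, file_obj) must write the blob's bytes into file_obj; with the
    Azure SDK it could be:
        lambda name, f: container_client.download_blob(name).readinto(f)
    """

    def download_one(blob_name):
        path = local_path_for(download_dir, blob_name)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as file_obj:
            fetch(blob_name, file_obj)
        return path

    # Each download is mostly network wait, so threads overlap well despite the GIL.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() keeps input order; list() re-raises an exception from a failed download.
        return list(pool.map(download_one, blob_names))
```

This only shortens the download phase; the listing and filtering still have to walk every blob in the container.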

Answer:

With the current approach, I believe there is no way other than enumerating the blobs and filtering them on the client side to find the matching ones.
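To make the client-side filtering explicit, here is a small sketch that works on the items yielded by `ContainerClient.list_blobs()` (anything with `.name` and a timezone-aware `.last_modified`); `blobs_in_window` is a hypothetical helper name. It also shows why the cost does not shrink: every blob has to be listed before it can be rejected.

```python
from datetime import datetime
from typing import Iterable, Iterator


def blobs_in_window(blobs: Iterable, start: datetime, end: datetime) -> Iterator[str]:
    """Yield the names of blobs whose last_modified satisfies start <= t < end.

    `blobs` can be ContainerClient.list_blobs(): each item has .name and a
    timezone-aware .last_modified, so start and end must be timezone-aware too.
    The filter runs on the client, so every blob in the container is still
    listed (page by page) even when only a few of them match.
    """
    for blob in blobs:
        if start <= blob.last_modified < end:
            yield blob.name
```

As far as I know, the only server-side narrowing `list_blobs()` offers is `name_starts_with`, which helps only if the uploading machines put the date into the blob name (for example a `2024-05-01/` prefix, a hypothetical naming scheme).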


However, I do have an alternative solution. It is a bit convoluted, but I thought I would propose it nonetheless. :)

Essentially, the solution uses Azure Event Grid to invoke an Azure Function on the Microsoft.Storage.BlobCreated event, which fires when a blob is created or replaced. The function copies the new blob into a different blob container. A new container is created for each day, so each container holds only that day's blobs. This makes iterating over the blobs much easier, because the daily script only has to list a single day's container.
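A rough sketch of what the function body could do, assuming the source container and the daily containers live in the same storage account and the function has that account's connection string. The `daily-YYYY-MM-DD` container naming and all function names here are hypothetical. Container names must be 3 to 63 characters of lowercase letters, digits and single hyphens, which this format satisfies. The SDK imports sit inside `copy_to_daily_container` only so that the two naming helpers can be tried without the Azure packages installed.

```python
from datetime import datetime, timezone
from urllib.parse import unquote, urlparse


def daily_container_name(when, prefix="daily"):
    """Hypothetical naming scheme: one container per UTC day, e.g. 'daily-2024-05-01'."""
    return f"{prefix}-{when:%Y-%m-%d}"


def split_blob_url(url):
    """Split 'https://<account>.blob.core.windows.net/<container>/<blob>' into (container, blob_name)."""
    path = unquote(urlparse(url).path).lstrip("/")
    container, _, blob_name = path.partition("/")
    return container, blob_name


def copy_to_daily_container(conn_str, source_url, when=None):
    """Copy the blob at source_url into the container for `when` (default: now, UTC)."""
    from azure.core.exceptions import ResourceExistsError
    from azure.storage.blob import BlobServiceClient

    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    service = BlobServiceClient.from_connection_string(conn_str)
    target = service.get_container_client(daily_container_name(when))
    try:
        target.create_container()
    except ResourceExistsError:
        pass  # an earlier invocation already created this day's container
    _, blob_name = split_blob_url(source_url)
    # Server-side copy. Assumption: the source is in the same storage account, so the
    # connection string's shared key should authorize it; otherwise the URL needs a SAS.
    target.get_blob_client(blob_name).start_copy_from_url(source_url)
    return target.container_name, blob_name
```

To wire it up, an Event Grid-triggered function (for example one declared with `@app.event_grid_trigger(arg_name="event")` in the Python v2 programming model) could call `copy_to_daily_container(conn_str, event.get_json()["url"], event.event_time)`, since the BlobCreated event's data carries the new blob's `url`. Passing the event time rather than the copy time keeps a blob uploaded just before midnight in that day's container. The daily download script then only needs `service.get_container_client(daily_container_name(day)).list_blobs()`.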
