I need to list all files in an Azure blob using python. Currently I use the code below. this worked well when there were few files. But now I have a large number of files and the script runs more than an hour. The time-consuming part is the for loop. How can this be done faster?
import os, uuid
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, __version__
import pandas as pd
connect_str = "************"
blob_service_client = BlobServiceCliaent.from_connection_string(connect_str)
blob_service_client.get_account_information()
c = blob_service_client.list_containers()
container_client = blob_service_client.get_container_client("blobName")
l = []
for blob in container_client.list_blobs():
l.append(blob.name)
I could able to achieve this using list_blobs
method of BlockBlobService
. After reproducing from my end, I have observed that the list_blobs method of BlobServiceClient
returns all the properties of blob which is taking more time to proocess whereas BlockBlobService
returns objects. Below is the code that was working for me.
import os
from azure.storage.blob import BlockBlobService
import datetime
ACCOUNT_NAME = "<YOUR_ACCOUNT_NAME>"
CONTAINER_NAME = "<YOUR_CONTAINER_NAME>"
SAS_TOKEN='<YOUR_SAS_TOKEN>'
block_blob_service = BlockBlobService(account_name=ACCOUNT_NAME,account_key=None,sas_token=SAS_TOKEN)
# Lists All Blobs
l =[]
print("\nList blobs in the container")
generator = block_blob_service.list_blobs(CONTAINER_NAME)
for blob in generator:
print("a"+str(datetime.datetime.now()))
blobname=blob
l.append(blob.name)
print(l)
print("b"+str(datetime.datetime.now()))
OUTPUT:
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.