
How to efficiently list all files in an Azure blob using python?

I need to list all files in an Azure blob container using Python. Currently I use the code below. This worked well when there were only a few files, but now I have a large number of files and the script runs for more than an hour. The time-consuming part is the for loop. How can this be done faster?

import os, uuid
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, __version__
import pandas as pd

connect_str = "************"

blob_service_client = BlobServiceClient.from_connection_string(connect_str)

blob_service_client.get_account_information()
c = blob_service_client.list_containers()

container_client = blob_service_client.get_container_client("blobName")

l = []
for blob in container_client.list_blobs():
    l.append(blob.name)

I was able to achieve this using the list_blobs method of BlockBlobService. After reproducing this on my end, I observed that the list_blobs method of the container client obtained from BlobServiceClient returns all the properties of each blob, which takes more time to process, whereas BlockBlobService returns objects. Below is the code that worked for me.

# BlockBlobService exists only in the legacy SDK (azure-storage-blob 2.1.0 or earlier)
from azure.storage.blob import BlockBlobService
import datetime

ACCOUNT_NAME = "<YOUR_ACCOUNT_NAME>"
CONTAINER_NAME = "<YOUR_CONTAINER_NAME>"
SAS_TOKEN = "<YOUR_SAS_TOKEN>"

block_blob_service = BlockBlobService(account_name=ACCOUNT_NAME, account_key=None, sas_token=SAS_TOKEN)

# List all blobs in the container and collect their names
l = []
print("\nList blobs in the container")
generator = block_blob_service.list_blobs(CONTAINER_NAME)
for blob in generator:
    # Timestamp per blob, printed only to show how fast the loop advances
    print("a" + str(datetime.datetime.now()))
    l.append(blob.name)

print(l)

# Timestamp after the whole listing has finished
print("b" + str(datetime.datetime.now()))
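
For anyone on the current v12 SDK, where BlockBlobService is no longer available, a names-only listing may be worth trying. The sketch below assumes azure-storage-blob 12.14.0 or later, where ContainerClient has a list_blob_names method. collect_blob_names is a hypothetical helper name, not part of the SDK, and it falls back to list_blobs on older versions. Whether this is actually faster for a given container is something to measure rather than assume.

```python
def collect_blob_names(container_client, prefix=None):
    """Return a list of blob names from a ContainerClient-like object.

    Assumption: on azure-storage-blob >= 12.14.0 the client exposes
    list_blob_names(), which yields plain strings instead of full
    BlobProperties objects. Older clients fall back to list_blobs().
    """
    # Only pass the prefix filter when the caller actually supplied one.
    kwargs = {}
    if prefix is not None:
        kwargs["name_starts_with"] = prefix

    if hasattr(container_client, "list_blob_names"):
        # Names-only listing: each item is already a str.
        return list(container_client.list_blob_names(**kwargs))

    # Fallback: full listing, keeping only the name of each blob.
    return [blob.name for blob in container_client.list_blobs(**kwargs)]


# Usage with the question's setup (requires the azure-storage-blob package):
#
#   from azure.storage.blob import BlobServiceClient
#   service = BlobServiceClient.from_connection_string(connect_str)
#   container_client = service.get_container_client("blobName")
#   names = collect_blob_names(container_client)
```

Passing a prefix narrows the listing on the server side, so if only part of the container is needed, that alone can cut the runtime.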

