How to efficiently list all files in an Azure blob using Python?
I need to list all files in an Azure blob container using Python. Currently I use the code below. This worked well when there were few files, but now I have a large number of files and the script runs for more than an hour. The time-consuming part is the for loop. How can this be done faster?
import os, uuid
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, __version__
import pandas as pd
connect_str = "************"
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
blob_service_client.get_account_information()
c = blob_service_client.list_containers()
container_client = blob_service_client.get_container_client("blobName")
l = []
# Collect the name of every blob in the container
for blob in container_client.list_blobs():
    l.append(blob.name)
I was able to achieve this using the list_blobs method of BlockBlobService. After reproducing this on my end, I observed that the list_blobs method used with BlobServiceClient returns all the properties of each blob, which takes more time to process, whereas BlockBlobService returns objects. Below is the code that worked for me.
import datetime
# BlockBlobService is only available in the legacy SDK (azure-storage-blob 2.x and earlier)
from azure.storage.blob import BlockBlobService
ACCOUNT_NAME = "<YOUR_ACCOUNT_NAME>"
CONTAINER_NAME = "<YOUR_CONTAINER_NAME>"
SAS_TOKEN = "<YOUR_SAS_TOKEN>"
block_blob_service = BlockBlobService(account_name=ACCOUNT_NAME, account_key=None, sas_token=SAS_TOKEN)
# List all blobs in the container
l = []
print("\nList blobs in the container")
generator = block_blob_service.list_blobs(CONTAINER_NAME)
for blob in generator:
    # Timestamp printed as each blob is read
    print("a" + str(datetime.datetime.now()))
    l.append(blob.name)
print(l)
# Timestamp after the listing has finished
print("b" + str(datetime.datetime.now()))
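The code above depends on BlockBlobService, which is not part of the current v12 SDK. For readers who stay on BlobServiceClient, here is a minimal sketch of a names-only listing. It assumes azure-storage-blob 12.14 or later, where ContainerClient offers list_blob_names, and falls back to list_blobs otherwise. The helper names collect_blob_names and list_names_from_connection_string are hypothetical, and whether this is actually faster for a given container has not been measured here.

```python
from typing import List, Optional


def collect_blob_names(container_client, prefix: Optional[str] = None) -> List[str]:
    """Return the names of all blobs in a container.

    `container_client` is expected to behave like a v12 ContainerClient.
    """
    # Newer v12 releases expose list_blob_names, which yields plain strings
    # and does not return the other blob properties.
    if hasattr(container_client, "list_blob_names"):
        return list(container_client.list_blob_names(name_starts_with=prefix))
    # Fallback: list_blobs yields property objects; keep only the name.
    return [blob.name for blob in container_client.list_blobs(name_starts_with=prefix)]


def list_names_from_connection_string(connect_str: str, container_name: str) -> List[str]:
    """Hypothetical convenience wrapper; requires azure-storage-blob to be installed."""
    # Imported here so the helper above can be used without the SDK present.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(connect_str)
    return collect_blob_names(service.get_container_client(container_name))
```

Calling list_names_from_connection_string(connect_str, "blobName") with the values from the question would return the same kind of list as the original loop. Passing a prefix limits the listing to blobs whose names start with it, which can cut the work further when only part of the container is needed.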