Python - List all the files and blobs inside an Azure Storage Container
This is my first post here on StackOverflow; I hope it respects the guidelines of this community.
I'm trying to accomplish a simple task in Python because, even though I'm really new to it, I found it very easy to use. I have a storage account on Azure, with a lot of containers inside. Each container contains some random files and/or blobs.
What I'm trying to do is get the names of all these files and/or blobs and put them in a file.
For now, I got here:
import os, uuid
import sys
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, __version__

connection_string = "my_connection_string"
blob_svc = BlobServiceClient.from_connection_string(conn_str=connection_string)
try:
    print("Azure Blob Storage v" + __version__ + " - Python quickstart sample")
    print("\nListing blobs...")
    containers = blob_svc.list_containers()
    list_of_blobs = []
    for c in containers:
        container_client = blob_svc.get_container_client(c)
        blob_list = container_client.list_blobs()
        for blob in blob_list:
            list_of_blobs.append(blob.name)
    file_path = 'C:/my/path/to/file/randomfile.txt'
    sys.stdout = open(file_path, "w")
    print(list_of_blobs)
except Exception as ex:
    print('Exception:')
    print(ex)
But I'm having 3 problems:
1. I'm getting <name_of_the_blob>/<name_of_the_file_inside>: I would like to have just the name of the file inside the blob.
2. If a container holds a blob (or more than one blob) plus a random file, this script prints only the name of the blob plus the name of the file inside, skipping the other files outside the blobs.
3. I would like to put all the names of the blobs/files in a .csv file.
But I'm not sure how to do point 3, or how to resolve points 1 and 2.
Could someone maybe help with this?
Thanks!
Edit:
I'm adding an image here just to clarify a little what I mean when I talk about blobs/files.
Just to clarify: Blob Storage does not have two separate things called files and blobs. The files inside Blob Storage are called blobs. Below is the hierarchy that you can observe in Blob Storage.
Blob Storage > Containers > Directories/Virtual Folders > Blobs
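To make the hierarchy above concrete: a virtual folder is only a prefix of the blob name, conventionally separated by '/'. The following is a minimal sketch, using made-up blob names and a hypothetical helper called group_by_folder, of how a flat list of names maps onto folders. It does not call Azure at all.

```python
from collections import defaultdict


def group_by_folder(blob_names):
    """Group flat blob names by their virtual folder prefix ('' = container root)."""
    groups = defaultdict(list)
    for name in blob_names:
        # Everything before the last '/' is treated as the virtual folder path.
        folder, _, filename = name.rpartition("/")
        groups[folder].append(filename)
    return dict(groups)


# Hypothetical blob names, in the form blob.name would return them.
names = ["report.txt", "images/logo.png", "images/icons/home.svg"]
print(group_by_folder(names))
```

Blobs stored directly in the container end up under the empty-string key, so they are listed alongside the ones inside folders.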
I'm getting <name_of_the_blob>/<name_of_the_file_inside>: I would like to have just the name of the file inside the blob
For this, you can iterate through your container using list_blobs(<Container_Name>), taking only the names of the blobs, i.e. blob.name. Here is how the code goes when you are trying to list all the blob names inside a container.
generator = blob_service.list_blobs(CONTAINER_NAME)
for blob in generator:
    print("\t Blob name: " + blob.name)
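Note that blob.name still includes any virtual folder prefix. If only the last segment is wanted, one possible approach is to strip the prefix from the name. This is a small sketch with a hypothetical helper, file_name_only, and made-up names. It assumes '/' is the folder delimiter.

```python
import posixpath


def file_name_only(blob_name):
    """Return the last path segment of a blob name, dropping any virtual folders."""
    # posixpath always splits on '/', so the result is the same on every OS.
    return posixpath.basename(blob_name)


print(file_name_only("folder/sub/data.csv"))  # data.csv
print(file_name_only("data.csv"))             # data.csv
```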
If in a container there is a blob (or more than 1 blob) + a random file, this script prints only the name of the blob + the name of the file inside, skipping the other files outside the blobs.
You can iterate over the containers using list_containers(), then use list_blobs(<Container_Name>) to iterate over the blob names, and finally write the blob names to a local file.
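The two nested loops described above can be sketched as a generator. This version uses the method names from the question's code (list_containers, get_container_client, list_blobs on the v12 client). The function name iter_blob_names is hypothetical, and the client is passed in so the function itself needs no Azure import.

```python
def iter_blob_names(service_client):
    """Yield (container_name, blob_name) for every blob in the storage account."""
    for container in service_client.list_containers():
        # Pass the container's name rather than the properties object.
        container_client = service_client.get_container_client(container.name)
        for blob in container_client.list_blobs():
            yield container.name, blob.name


# Usage with the v12 SDK (assumes azure-storage-blob is installed and
# "<connection_string>" is replaced with a real value):
#
#   from azure.storage.blob import BlobServiceClient
#   svc = BlobServiceClient.from_connection_string("<connection_string>")
#   for container_name, blob_name in iter_blob_names(svc):
#       print(container_name + "/" + blob_name)
```

Keeping the listing separate from the writing makes it easy to send the same pairs to the console, a text file, or a CSV.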
I would like to put all the names of the blobs/files in a .csv file.
A simple write inside with open('<filename>.csv', 'w') as f: will do. Below is the sample code:
with open('BlobsNames.csv', 'w') as f:
    f.write(<statements>)
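If proper CSV quoting matters (for example, blob names that contain commas), Python's built-in csv module is an alternative to plain f.write. This is only a sketch: write_blob_names_csv, the column headers, and the sample rows are made up for illustration.

```python
import csv


def write_blob_names_csv(rows, path):
    """Write (container, blob) pairs to a CSV file with a header row."""
    # newline="" is recommended by the csv module docs to avoid blank lines on Windows.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["container", "blob"])
        writer.writerows(rows)


# Hypothetical sample data; in practice the pairs come from listing the blobs.
write_blob_names_csv([("c1", "a.txt"), ("c1", "dir/b.txt")], "BlobsNames.csv")
```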
Here is the complete sample code that worked for us, where each blob from every folder will be listed.
import os
from azure.storage.blob import BlockBlobService  # legacy SDK class (azure-storage-blob 2.1.0 or earlier)

ACCOUNT_NAME = "<ACCOUNT_NAME>"
SAS_TOKEN = '<YOUR_SAS_TOKEN>'
blob_service = BlockBlobService(account_name=ACCOUNT_NAME, account_key=None, sas_token=SAS_TOKEN)

print("\nList blobs in the container")
with open('BlobsNames.txt', 'w') as f:
    containers = blob_service.list_containers()
    for c in containers:
        generator = blob_service.list_blobs(c.name)
        for blob in generator:
            print("\t Blob name: " + c.name + '/' + blob.name)
            f.write(c.name + '/' + blob.name)
            f.write('\n')
This works even when there are folders in containers.
NOTE: You can just remove c.name when writing the blob to the file if your requirement is to just pull out the blob names.
Thanks all for your replies. In the end, I took what SwethaKandikonda-MT wrote and changed it a little bit to fix the connection problem that I had.
Here is what I came up with:
import os, uuid
import sys
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, __version__
import csv

connection_string = "my_account_storage_connection_string"
blob_svc = BlobServiceClient.from_connection_string(conn_str=connection_string)
list_of_blobs = []
print("\nList blobs in the container")
with open('My_path/to/the/file.csv', 'w') as f:
    containers = blob_svc.list_containers()
    for c in containers:
        container_client = blob_svc.get_container_client(c.name)
        blob_list = container_client.list_blobs()
        for blob in blob_list:
            print("\t Blob name: " + c.name + '/' + blob.name)  # this will print on the console
            f.write('/' + blob.name)  # this writes '/' followed by the blob name to the csv file
            f.write('\n')