
Python - List all the files and blobs inside an Azure Storage Container

This is my first post here on StackOverflow; I hope it respects the guidelines of this community.

I'm trying to accomplish a simple task in Python because, even though I'm really new to it, I find it very easy to use. I have a storage account on Azure with a lot of containers inside. Each container contains some random files and/or blobs.

What I'm trying to do is get the names of all these files and/or blobs and write them to a file.

So far, I have this:

import os, uuid
import sys
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, __version__
connection_string = "my_connection_string"
blob_svc = BlobServiceClient.from_connection_string(conn_str=connection_string)


try:

    print("Azure Blob Storage v" + __version__ + " - Python quickstart sample")
    print("\nListing blobs...")
    containers = blob_svc.list_containers()
    list_of_blobs = []


    for c in containers:
      container_client = blob_svc.get_container_client(c)
      blob_list = container_client.list_blobs()
      for blob in blob_list:
        list_of_blobs.append(blob.name)
      file_path = 'C:/my/path/to/file/randomfile.txt'
      sys.stdout = open(file_path, "w")
      print(list_of_blobs)

except Exception as ex:
    print('Exception:')
    print(ex) 

But I'm having 3 problems:

  1. I'm getting <name_of_the_blob>/<name_of_the_file_inside>: I would like to have just the name of the file inside the blob.

  2. If a container holds a blob (or more than one blob) plus a random file, this script prints only the name of the blob + the name of the file inside, skipping the other files outside the blobs.

  3. I would like to put all the names of the blobs/files in a .csv file.

I'm not sure how to do point 3, or how to resolve points 1 and 2.

Could someone help with this?

Thanks!

Edit:

I'm adding an image here just to clarify a little what I mean when I talk about blobs/files:

[Image: example of a container inside an Azure storage account]

Just to clarify: Blob Storage does not have two separate kinds of things called files and blobs. The files inside Blob Storage are called blobs. Below is the hierarchy you can observe in Blob Storage:

Blob Storage > Containers > Directories/Virtual Folders > Blobs
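Because the folders in this hierarchy are virtual, a blob's name is simply its full path inside the container, with the folder as a prefix. The following is a minimal sketch, assuming `/` is the delimiter, of splitting such a name into its folder part and its file name using only the standard library. The helper name `split_blob_name` and the sample names are hypothetical.

```python
import posixpath


def split_blob_name(blob_name):
    """Split a blob name into (virtual folder, file name)."""
    # posixpath always treats "/" as the separator, regardless of the OS.
    folder, file_name = posixpath.split(blob_name)
    return folder, file_name


# Hypothetical blob names: one inside virtual folders, one at the container root.
for name in ["reports/2023/summary.txt", "readme.md"]:
    folder, file_name = split_blob_name(name)
    print(repr(folder), repr(file_name))
```

A blob at the container root has an empty folder part, so the same helper handles both "files in folders" and "files outside folders".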

I'm getting <name_of_the_blob>/<name_of_the_file_inside>: I would like to have just the name of the file inside the blob.

For this, you can iterate through your container using list_blobs(<Container_Name>) and take only the names of the blobs, i.e. blob.name. Here is how the code looks when you list all the blob names inside a container:

# blob_service is a BlockBlobService instance (legacy azure-storage-blob SDK, v2.x)
generator = blob_service.list_blobs(CONTAINER_NAME)
for blob in generator:
    print("\t Blob name: " + blob.name)

If a container holds a blob (or more than one blob) plus a random file, this script prints only the name of the blob + the name of the file inside, skipping the other files outside the blobs.

You can iterate over the containers using list_containers(), then use list_blobs(<Container_Name>) to iterate over the blob names, and finally write the blob names to a local file.
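The two nested loops described above can be sketched as a small function. This is only an illustration, assuming a client object that offers `list_containers()`, `get_container_client(name)` and `list_blobs()`, as `BlobServiceClient` and `ContainerClient` in azure-storage-blob v12 do. The function name `collect_blob_names` is hypothetical.

```python
def collect_blob_names(service_client):
    """Return (container name, blob name) pairs for every blob in the account."""
    pairs = []
    for container in service_client.list_containers():
        container_client = service_client.get_container_client(container.name)
        # list_blobs() yields every blob, including those under virtual folders.
        for blob in container_client.list_blobs():
            pairs.append((container.name, blob.name))
    return pairs


# Usage (needs the azure-storage-blob v12 package and a real connection string):
# from azure.storage.blob import BlobServiceClient
# svc = BlobServiceClient.from_connection_string("<connection_string>")
# for container_name, blob_name in collect_blob_names(svc):
#     print(container_name, blob_name)
```

Collecting the pairs first and writing them out afterwards keeps the listing logic separate from the file handling.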

I would like to put all the names of the blobs/files in a .csv file.

A simple with open('<filename>.csv', 'w') as f followed by f.write(...) is enough. Below is the sample code:

with open('BlobsNames.csv', 'w') as f:
     f.write(<statements>)
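As an alternative to calling f.write directly, the standard csv module takes care of quoting names that contain commas. This is a minimal sketch with hypothetical sample rows and a hypothetical helper name `write_blob_csv`. It writes to an in-memory buffer so that it runs without any Azure account.

```python
import csv
import io


def write_blob_csv(rows, file_obj):
    """Write (container, blob name) rows to file_obj as CSV with a header row."""
    writer = csv.writer(file_obj)
    writer.writerow(["container", "blob_name"])
    writer.writerows(rows)


# Demo with an in-memory buffer and made-up names. For a real file, open it
# with open("BlobsNames.csv", "w", newline="") as the csv documentation advises.
buffer = io.StringIO()
write_blob_csv([("images", "logo.png"), ("docs", "a,b.txt")], buffer)
print(buffer.getvalue())
```

The second sample name contains a comma, so the writer wraps it in double quotes and the file still has two columns per row.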

Here is the complete sample code that worked for us; it lists every blob from every folder.

# BlockBlobService belongs to the legacy azure-storage-blob SDK (v2.x);
# it is not available in v12 and later.
from azure.storage.blob import BlockBlobService

ACCOUNT_NAME = "<ACCOUNT_NAME>"
SAS_TOKEN = '<YOUR_SAS_TOKEN>'

blob_service = BlockBlobService(account_name=ACCOUNT_NAME, account_key=None, sas_token=SAS_TOKEN)

print("\nList blobs in the container")
with open('BlobsNames.txt', 'w') as f:
    containers = blob_service.list_containers()
    for c in containers:
        # List every blob in this container, including those in virtual folders
        generator = blob_service.list_blobs(c.name)
        for blob in generator:
            print("\t Blob name: " + c.name + '/' + blob.name)
            # One line per blob: <container>/<blob name>
            f.write(c.name + '/' + blob.name)
            f.write('\n')

This works even when there are folders in the containers.

RESULT:

[Image]

NOTE: If your requirement is just to pull out the blob names, you can remove c.name when writing the blob to the file.

Thanks all for your replies.

In the end, I took what SwethaKandikonda-MT wrote and changed it a little to work around the connection problem that I had.

Here is what I came up with:


from azure.storage.blob import BlobServiceClient

connection_string = "my_account_storage_connection_string"
blob_svc = BlobServiceClient.from_connection_string(conn_str=connection_string)

print("\nList blobs in the container")
with open('My_path/to/the/file.csv', 'w') as f:
    containers = blob_svc.list_containers()

    for c in containers:
        container_client = blob_svc.get_container_client(c.name)
        blob_list = container_client.list_blobs()
        for blob in blob_list:
            print("\t Blob name: " + c.name + '/' + blob.name)  # printed on the console
            f.write(blob.name)  # only the blob name goes to the csv file
            f.write('\n')
