
How to list all files and subdirectories inside another Azure Data Lake gen2 storage account which is in a different subscription

I am trying to get all the files and their subdirectories from a container in an Azure storage account in a different subscription, and the business requirement is to use the abfss URL: abfss://&lt;container&gt;@&lt;account&gt;.dfs.core.windows.net/&lt;path&gt;/. I imported the Spark config for that subscription and used the code below to return the file list, yet it failed.

import os
from fnmatch import fnmatch

root_list = "abfss://staging@bdoibgedpsadlssandbox.dfs.core.windows.net/staging/"
files_list = []
pattern = "*.*"
print(pattern)
for path, subdirs, files in os.walk(root_list):
    for name in files:
        if fnmatch(name.upper(), pattern.upper()):
            files_list.append(path + "/" + name)
print(files_list)

This prints "[]", an empty list.
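The list comes back empty because os.walk only traverses the driver's local filesystem; it cannot resolve abfss:// URLs. If this runs on Databricks, one option is to authenticate to the remote account and walk the tree with dbutils.fs.ls instead. Below is a minimal sketch, assuming a Databricks notebook (where spark and dbutils are predefined) and account-key auth; the account and container names are taken from the question, and the key is a placeholder.

# Authenticate to the storage account in the other subscription
# (account-key auth shown as an assumption; a service principal
# configured via spark.conf works the same way).
spark.conf.set(
    "fs.azure.account.key.bdoibgedpsadlssandbox.dfs.core.windows.net",
    "<storage-account-key>")

def list_files_recursive(path):
    # Recursively collect file paths under an abfss:// directory.
    files = []
    for entry in dbutils.fs.ls(path):
        if entry.isDir():
            files.extend(list_files_recursive(entry.path))
        else:
            files.append(entry.path)
    return files

files_list = list_files_recursive(
    "abfss://staging@bdoibgedpsadlssandbox.dfs.core.windows.net/staging/")
print(files_list)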

You can use the code below for this use case.

from azure.storage.blob import BlockBlobService  # legacy azure-storage SDK (pre-12.x)

account_name = 'accountname'
container_name = 'container_name'
directory_name = 'data'  # top-level directory (blob prefix) to list
account_key = 'storage-account-key'
prefix_val = directory_name + '/'

block_blob_service = BlockBlobService(account_name=account_name, account_key=account_key)

# A flat blob listing under the prefix already covers every subdirectory.
generator = block_blob_service.list_blobs(container_name, prefix=prefix_val)

target_path = "/target2/%s.csv" % container_name
print(target_path)

with open(target_path, 'w') as target_file:
    for blob in generator:
        # Fetch the blob properties once instead of once per field.
        props = block_blob_service.get_blob_properties(container_name, blob.name).properties
        if props.content_length != 0:
            line = account_name + '|' + container_name + '|' + blob.name + '|' + \
                   str(props.content_length) + '|' + str(props.last_modified)[:10] + '|'
            print(line)
            target_file.write(line + '\n')
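If you can take a new dependency, the Gen2-native azure-storage-file-datalake SDK does the recursive listing in a single call. A minimal sketch, reusing the placeholder account_name, account_key, container_name, and prefix_val values from the code above:

from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://%s.dfs.core.windows.net" % account_name,
    credential=account_key)
file_system = service.get_file_system_client(container_name)

# get_paths(recursive=True) walks every file and subdirectory under the prefix.
for path in file_system.get_paths(path=prefix_val, recursive=True):
    if not path.is_directory:
        print(path.name, path.content_length, path.last_modified)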
