How to list all files and subdirectories inside another Azure Data Lake Gen2 storage account which is in a different subscription
I am trying to list all the files and subdirectories in a container of an Azure storage account that is in a different subscription. The business requirement is to use the abfss URL: abfss://@.dfs.core.windows.net//.
I imported the Spark config for the subscription and used the code below to return the file list, but it failed.
import os
from fnmatch import fnmatch

root_list = "abfss://staging@bdoibgedpsadlssandbox.dfs.core.windows.net/staging/"
files_list = []
pattern = "*.*"
print(pattern)
for path, subdirs, files in os.walk(root_list):
    for name in files:
        if fnmatch(name.upper(), pattern.upper()):
            files_list.append(path + "/" + name)
This prints "[]", an empty list.
You can use the code below for this use case.
# Note: BlockBlobService comes from the legacy azure-storage-blob SDK (pre-v12).
from azure.storage.blob import BlockBlobService

account_name = 'accountname'
container_name = 'container_name'
second_container_name = 'data'
account_key = 'storage-account-key'
# Only blobs whose names start with this prefix are listed.
prefix_val = second_container_name + '/'

block_blob_service = BlockBlobService(account_name=account_name, account_key=account_key)
# block_blob_service.create_container(container_name)
generator = block_blob_service.list_blobs(container_name, prefix=prefix_val)

target_path = "/target2/%s.csv" % container_name
print(target_path)
# Write one pipe-delimited line per non-empty blob; the file is closed on exit.
with open(target_path, 'w') as target_file:
    for blob in generator:
        # Fetch the properties once per blob instead of once per field.
        props = block_blob_service.get_blob_properties(container_name, blob.name).properties
        last_modified = props.last_modified
        file_size = props.content_length
        if file_size != 0:
            line = account_name + '|' + container_name + '|' + blob.name + '|' + str(file_size) + '|' + str(last_modified)[:10] + '|'
            print(line)
            target_file.write(line + '\n')
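If the abfss URL has to be used directly, note that os.walk only looks at the local filesystem, so it yields nothing for an abfss:// path, which is why the list in the question stays empty. Below is a minimal sketch of a recursive listing that keeps the same fnmatch filter. It assumes the code runs in a Databricks notebook, where dbutils.fs.ls returns entries with .path, .name and .isDir(), and that the Spark configuration already grants access to the storage account in the other subscription. The function name list_files_recursive and its ls parameter are hypothetical, introduced here only for illustration.

```python
from fnmatch import fnmatch


def list_files_recursive(root, ls, pattern="*"):
    """Return the sorted paths of all files under `root`, including subdirectories.

    `ls` is any callable that takes a path and returns entries exposing
    `.path`, `.name` and `.isDir()`. Assumption: on Databricks,
    `dbutils.fs.ls` can be passed in for this argument.
    """
    found = []
    pending = [root]  # directories that still have to be listed
    while pending:
        current = pending.pop()
        for entry in ls(current):
            if entry.isDir():
                # Descend into the subdirectory on a later iteration.
                pending.append(entry.path)
            elif fnmatch(entry.name.upper(), pattern.upper()):
                # Same case-insensitive pattern match as in the question.
                found.append(entry.path)
    return sorted(found)
```

In a notebook this could be called as files_list = list_files_recursive("abfss://staging@bdoibgedpsadlssandbox.dfs.core.windows.net/staging/", dbutils.fs.ls, "*.*"). Passing the listing function as an argument keeps the walker independent of Databricks, so it can be tried out against an in-memory stand-in first.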