
Can't list blobs from a Synapse notebook

I am having a bit of trouble with Synapse notebooks. I want to get a list of blobs via a PySpark script so that I can dynamically decide which files to integrate. I cannot make this work in Synapse; in other environments, such as a Jupyter notebook, the code works as expected.

from azure.storage.blob import ContainerClient, BlobServiceClient, AccountSasPermissions, ResourceTypes
from azure.storage.blob._shared_access_signature import SharedAccessSignature, BlobSharedAccessSignature

# The value must begin with '?' so the concatenation below forms a valid URL.
sas_token = 'hardcoded_value'

account_url1 = 'https://<storage_account>.blob.core.windows.net/<container>' + sas_token

print(account_url1)
container_client = ContainerClient.from_container_url(container_url=account_url1)
source_blob_list = container_client.list_blobs()
for blob in source_blob_list:
    print(blob.name + '\n')
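Since the container URL is assembled by string concatenation, a small helper can rule out stray spaces or a missing '?' before the SAS token. This is only a sketch: build_container_url and the account, container, and token values are hypothetical and not part of the original post.

```python
def build_container_url(account: str, container: str, sas_token: str) -> str:
    """Join account, container and SAS token into one container URL."""
    # Strip stray whitespace, which would otherwise end up in the host name.
    account = account.strip()
    container = container.strip().strip("/")
    # Accept the token with or without its leading '?'.
    token = sas_token.strip().lstrip("?")
    return f"https://{account}.blob.core.windows.net/{container}?{token}"


# Hypothetical values, for illustration only.
url = build_container_url(" mystorageaccount ", "mycontainer", "?sv=2022-11-02&sig=abc")
print(url)
```

The resulting string can then be passed to ContainerClient.from_container_url as in the code above.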

The output from the code above in Synapse is:

ServiceRequestError: <urllib3.connection.HTTPSConnection object at 0x7f282242e130>: Failed to establish a new connection: [Errno -2] Name or service not known
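"Name or service not known" is a name-resolution failure, so one way to narrow this down is to check whether the storage host name resolves at all from inside the notebook session. The following is a minimal sketch using only the standard library; host_of and can_resolve are hypothetical helper names.

```python
import socket
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the host name part of a URL ('' if there is none)."""
    return urlsplit(url).hostname or ""


def can_resolve(host: str) -> bool:
    """Return True if the host name resolves from this machine."""
    if not host:
        return False
    try:
        # Port 443 because the Blob endpoint is reached over HTTPS.
        socket.getaddrinfo(host, 443)
        return True
    except (socket.gaierror, UnicodeError):
        return False
```

Calling can_resolve(host_of(account_url1)) in both Synapse and Jupyter would show whether the two environments differ at the DNS or network level rather than in the code itself.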

The output from the code above in a Jupyter notebook is as expected:

snip

I have the Storage Blob Data Contributor role assigned to my user and to the Synapse user as well.

The above error is mainly caused by invalid URL syntax.

Use the syntax below to get the list of blob files:

mssparkutils.fs.ls('wasbs://<container_name>@<Storage_account_name>.blob.core.windows.net/')
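To decide dynamically which files to integrate, the listing can be filtered by name. The sketch below assumes each entry exposes name and isDir attributes, as the objects returned by mssparkutils.fs.ls are expected to. select_files and the sample file names are hypothetical, and stand-in objects are used so the code also runs outside Synapse.

```python
from types import SimpleNamespace


def select_files(entries, suffix=".csv"):
    """Return the names of non-directory entries ending in `suffix`."""
    return [e.name for e in entries if not e.isDir and e.name.endswith(suffix)]


# Stand-in entries so the sketch runs outside Synapse; in a notebook,
# pass the result of mssparkutils.fs.ls(...) instead.
sample = [
    SimpleNamespace(name="sales_2023.csv", isDir=False),
    SimpleNamespace(name="archive", isDir=True),
    SimpleNamespace(name="notes.txt", isDir=False),
]
print(select_files(sample))
```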


For more information, refer to this MS document.

In the end, the problem was the permissions of the Synapse managed identity. As I stated, the code above was working outside of Synapse. Once we added permissions to the managed private endpoint of Synapse, everything worked. Thank you!


 