I am having a bit of trouble with Synapse notebooks. I want to get a list of blobs via a PySpark script so that I can dynamically decide which files to integrate. I cannot make this work in Synapse, although in other environments, such as a Jupyter notebook, the code works as expected.
from azure.storage.blob import ContainerClient

# SAS token for the container (hardcoded here), without the leading '?'
sas_token = 'hardcoded_value'
# Container URL followed by '?' and the SAS token; replace the placeholders
account_url1 = 'https://<storage_account>.blob.core.windows.net/<container>?' + sas_token

print(account_url1)
container_client = ContainerClient.from_container_url(container_url=account_url1)
source_blob_list = container_client.list_blobs()
for blob in source_blob_list:
    print(blob.name)
The output from the code above in Synapse is:
ServiceRequestError: <urllib3.connection.HTTPSConnection object at 0x7f282242e130>: Failed to establish a new connection: [Errno -2] Name or service not known
The output from the code above in a Jupyter notebook is as expected.
I have the Storage Blob Data Contributor role assigned to my user and to the Synapse user as well.
The error above is mainly caused by invalid URL syntax; [Errno -2] Name or service not known means that the host name in the URL could not be resolved.
Use the syntax below to get the list of blob files:
mssparkutils.fs.ls('wasbs://<container_name>@<Storage_account_name>.blob.core.windows.net/')
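Since the goal in the question is to decide dynamically which files to integrate, the listing can be filtered before anything is loaded. This is only a sketch under stated assumptions: wasbs_path and select_files are hypothetical helpers, and FileInfoStub is a hypothetical stand-in for the entries returned by mssparkutils.fs.ls, assumed to expose name and path attributes.

```python
from collections import namedtuple

# Hypothetical stand-in for the entries returned by mssparkutils.fs.ls;
# only the name and path attributes are used below (assumption).
FileInfoStub = namedtuple("FileInfoStub", ["name", "path"])


def wasbs_path(container, account, folder=""):
    """Build a wasbs:// path, optionally pointing at a sub-folder."""
    folder = folder.strip("/")
    base = f"wasbs://{container}@{account}.blob.core.windows.net/"
    return base + (folder + "/" if folder else "")


def select_files(entries, suffix):
    """Return the paths of entries whose name ends with suffix (case-insensitive)."""
    suffix = suffix.lower()
    return [e.path for e in entries if e.name.lower().endswith(suffix)]


# Offline example with stub entries; in Synapse you would pass
# mssparkutils.fs.ls(wasbs_path(...)) instead of the stub list.
root = wasbs_path("raw", "mystorage", "/2023/01/")
entries = [
    FileInfoStub("a.csv", root + "a.csv"),
    FileInfoStub("b.json", root + "b.json"),
    FileInfoStub("C.CSV", root + "C.CSV"),
]
print(select_files(entries, ".csv"))
```

Filtering on the name keeps the decision logic testable outside Synapse; the container and account names above are placeholders.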
For more information, refer to this Microsoft document.
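If you keep the Azure SDK approach from the question instead, one way to rule out URL syntax problems is to assemble the container URL from its parts and reject malformed names before calling the SDK. This is a sketch: build_container_sas_url is a hypothetical helper, and the name checks are a simplified subset of the Azure naming rules (storage account: 3-24 lowercase letters or digits; container: 3-63 lowercase letters, digits or hyphens, without the finer hyphen rules).

```python
import re

# Simplified subset of the Azure naming rules (assumption: finer rules such as
# "no consecutive hyphens" in container names are not checked here).
ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")
CONTAINER_RE = re.compile(r"[a-z0-9-]{3,63}")


def build_container_sas_url(account, container, sas_token):
    """Return https://<account>.blob.core.windows.net/<container>?<sas_token>."""
    if not ACCOUNT_RE.fullmatch(account):
        raise ValueError(f"invalid storage account name: {account!r}")
    if not CONTAINER_RE.fullmatch(container):
        raise ValueError(f"invalid container name: {container!r}")
    # Accept a token copied with or without its leading '?'.
    token = sas_token.lstrip("?")
    return f"https://{account}.blob.core.windows.net/{container}?{token}"


print(build_container_sas_url("mystorage", "raw-data", "?sv=x&sig=y"))
```

The returned string can then be passed to ContainerClient.from_container_url(container_url=...) as in the question. A name with stray spaces, such as " storage_account ", raises ValueError instead of reaching DNS.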
In the end, it was the permissions of the Synapse managed identity. As I stated, the code in my question was working outside of Synapse. Now that we have added permissions to the Synapse managed private endpoint, everything is working. Thank you!
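Because the fix turned out to be network access rather than code, a quick check is to resolve the storage host name from inside the Synapse notebook. This is a hedged sketch: resolve_addresses and all_private are hypothetical helpers, and the check assumes that when traffic goes through the managed private endpoint, the storage host name resolves to private IP addresses.

```python
import ipaddress
import socket


def resolve_addresses(host, port=443):
    """Return the sorted IP addresses host resolves to, or [] if lookup fails."""
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except (OSError, UnicodeError):
        # socket.gaierror ("Name or service not known") is a subclass of OSError;
        # malformed names can also fail IDNA encoding with UnicodeError.
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def all_private(addresses):
    """True only if there is at least one address and every one is private."""
    return bool(addresses) and all(
        ipaddress.ip_address(a).is_private for a in addresses
    )


# Usage in a notebook cell (placeholder account name):
#   addrs = resolve_addresses("<storage_account>.blob.core.windows.net")
#   print(addrs, all_private(addrs))
```

An empty list corresponds to the Name or service not known error from the question; public addresses suggest the request is not going through the private endpoint.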