
Read Azure Datalake Gen2 images from Azure Databricks

I am working with .tif files stored in Azure Data Lake Gen2 and want to open these files with rasterio from Azure Databricks.

Example:

Reading the image file from the Data Lake with spark.read.format("image").load(filepath) works fine.


But when I try to open the same file with rasterio:

import rasterio

with rasterio.open(filepath) as src:
    print(src.profile)

I get this error:

RasterioIOError: wasbs://xxxxx.blob.core.windows.net/xxxx_2016/xxxx_2016.tif: No such file or directory

Any clues about what I am doing wrong?

Update:

As suggested by Axel R, I mounted the files on the Databricks file system, but I still get the same error: rasterio cannot open the file, although I can read it as a DataFrame.


I also tried creating a shared access signature for the file in the Data Lake and accessing the file through its URI. Now I get the error below:

CURL error: error setting certificate verify locations:   CAfile: /etc/pki/tls/certs/ca-bundle.crt   CApath: none

To test further, I tried opening a sample file from the web, which works fine:

filepath = 'http://landsat-pds.s3.amazonaws.com/c1/L8/042/034/LC08_L1TP_042034_20170616_20170629_01_T1/LC08_L1TP_042034_20170616_20170629_01_T1_B4.TIF'

I believe this is because rasterio uses the local file APIs and can only read from a path that starts with /dbfs/.

Is it possible for you to mount the blob storage? That would allow you to access it with rasterio using a path starting with /dbfs/mnt/.
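As a rough sketch of that suggestion, the helper below turns a DBFS-style path into the local /dbfs/ form before handing it to rasterio. It assumes the storage is already mounted and that the cluster exposes the mount under /dbfs/. The function names and the example mount point /mnt/rasters are hypothetical and do not come from the question.

```python
def to_local_dbfs_path(path: str) -> str:
    """Convert a DBFS path to the local path form that starts with /dbfs/.

    Accepts 'dbfs:/mnt/...', '/mnt/...' or an already local '/dbfs/...' path.
    Remote URIs such as wasbs:// are rejected, since local file APIs
    cannot open them.
    """
    if path.startswith("/dbfs/"):
        return path  # already in local form
    if path.startswith("dbfs:/"):
        return "/dbfs/" + path[len("dbfs:/"):].lstrip("/")
    if path.startswith("/"):
        return "/dbfs" + path  # e.g. /mnt/... as used by Spark readers
    raise ValueError(f"not a DBFS path: {path!r}")


def print_profile(path: str) -> None:
    """Open a mounted raster through its local path and print its profile."""
    import rasterio  # imported here so the path helper works without rasterio

    with rasterio.open(to_local_dbfs_path(path)) as src:
        print(src.profile)


# Hypothetical usage on a cluster where the container is mounted at /mnt/rasters:
# print_profile("dbfs:/mnt/rasters/xxxx_2016.tif")
```

The same mounted file would then be read by Spark as dbfs:/mnt/... and by rasterio as /dbfs/mnt/..., which may explain why one call succeeds while the other fails on an identical path string.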
