
Read Azure Datalake Gen2 images from Azure Databricks

I am working with .tif files stored in Azure Data Lake Gen2, and I want to open these files with rasterio from Azure Databricks.

Example:

Reading the image file from the Data Lake with spark.read.format("image").load(filepath) works fine.


But when trying to open the same file with

with rasterio.open(filepath) as src:
    print(src.profile)

I get the error:

RasterioIOError: wasbs://xxxxx.blob.core.windows.net/xxxx_2016/xxxx_2016.tif: No such file or directory

Any clues as to what I am doing wrong?

Update:

As suggested by Axel R, I mounted the files on the Databricks file system, but I am still getting the same issue and cannot open the file from rasterio, although I can read it as a DataFrame.
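A quick diagnostic for this situation: rasterio uses local file APIs, so after mounting, the file must be reachable through the /dbfs/ FUSE path rather than the wasbs:// URL. A minimal check (the mount point and file name below are assumptions, substitute your own):

```python
import os

# rasterio cannot resolve wasbs:// URLs; after mounting, the same file should
# be visible through the /dbfs/ FUSE view of the mount.
# Hypothetical mount point and file name — adjust to your own setup.
local_path = "/dbfs/mnt/xxxx_2016/xxxx_2016.tif"
print(os.path.exists(local_path))  # True only if the mount is in place and the path is correct
```

If this prints False, rasterio will raise the same "No such file or directory" error regardless of the mount.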


I also tried creating a shared access signature for the file in the Data Lake and accessing the file through the URI. Now I am getting the error below:

CURL error: error setting certificate verify locations:   CAfile: /etc/pki/tls/certs/ca-bundle.crt   CApath: none
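One possible explanation for this CURL error: the path in the message (/etc/pki/tls/certs/ca-bundle.crt) is the Red Hat CA-bundle location, which does not exist on Ubuntu-based Databricks nodes. GDAL/curl honour the CURL_CA_BUNDLE environment variable, so pointing it at the Debian/Ubuntu bundle may help (the bundle path below is an assumption — verify it exists on your cluster):

```python
import os

# Workaround sketch: tell curl (used by GDAL under rasterio) where the
# CA bundle actually lives on Debian/Ubuntu-based cluster nodes.
# The path is an assumption — check your cluster before relying on it.
os.environ["CURL_CA_BUNDLE"] = "/etc/ssl/certs/ca-certificates.crt"
# then retry rasterio.open() on the SAS URL
```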

To test further, I tried to open a sample file from the web:

filepath = 'http://landsat-pds.s3.amazonaws.com/c1/L8/042/034/LC08_L1TP_042034_20170616_20170629_01_T1/LC08_L1TP_042034_20170616_20170629_01_T1_B4.TIF'

and it works fine.

I believe it is because rasterio uses the local file APIs and can only read from a path that starts with /dbfs/.

Is it possible for you to mount the blob storage? That would allow you to access it with rasterio using a path starting with /dbfs/mnt/.
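As a sketch of what that looks like (the mount source, mount point, and account key below are placeholders, not values from the question):

```python
# Hypothetical mount — run once per workspace; names are placeholders:
# dbutils.fs.mount(
#     source="wasbs://<container>@<account>.blob.core.windows.net",
#     mount_point="/mnt/images",
#     extra_configs={
#         "fs.azure.account.key.<account>.blob.core.windows.net": "<key>"})

def to_local_path(dbfs_path: str) -> str:
    """Translate a Databricks path (dbfs:/...) into the /dbfs/ FUSE path
    that local-API libraries such as rasterio can open."""
    if dbfs_path.startswith("dbfs:/"):
        return "/dbfs/" + dbfs_path[len("dbfs:/"):]
    if dbfs_path.startswith("/dbfs/"):
        return dbfs_path
    raise ValueError(f"not a DBFS path: {dbfs_path}")

# with rasterio.open(to_local_path("dbfs:/mnt/images/scene.tif")) as src:
#     print(src.profile)
```

Spark reads the same data through dbfs:/mnt/images/..., while rasterio goes through the /dbfs/mnt/images/... FUSE view — both point at the mounted blob storage.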


 