
Problems with Azure Databricks opening a file on the Blob Storage

With Azure Databricks I'm able to list the files in the blob storage and get them into an array. But when I try to open one of the files I get an error, probably due to the special syntax.

# Storage account credentials and source details
storage_account_name = "tesb"
storage_container_name = "rttracking-in"
storage_account_access_key = "xyz"
file_location = "wasbs://rttracking-in"
file_type = "xml"

# Make the account key available to Spark's wasbs driver
spark.conf.set(
  "fs.azure.account.key."+storage_account_name+".blob.core.windows.net",
  storage_account_access_key)

# List all blobs in the container
xmlfiles = dbutils.fs.ls("wasbs://"+storage_container_name+"@"+storage_account_name+".blob.core.windows.net/")

import pandas as pd
import xml.etree.ElementTree as ET
import re
import os

# Try to parse the first file -- this is where the error occurs
firstfile = xmlfiles[0].path
root = ET.parse(firstfile).getroot()

The error is:

IOError: [Errno 2] No such file or directory: u'wasbs://rttracking-in@tstoweuyptoesb.blob.core.windows.net/rtTracking_00001.xml'

My guess is that ET.parse() does not know about the Spark context in which you have set up the connection to the Storage Account. Alternatively, you can try mounting the storage; then you can access files through native paths as if they were local.

See here: https://docs.databricks.com/spark/latest/data-sources/azure/azure-storage.html#mount-an-azure-blob-storage-container
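A mount might look like the following sketch, reusing the variables from the question. This only runs inside a Databricks notebook (where `dbutils` exists), and the mount point name `/mnt/rttracking-in` is an arbitrary choice for this example:

```python
# Sketch: mount the blob container into DBFS (Databricks notebook only).
# Assumes storage_account_name, storage_container_name and
# storage_account_access_key are defined as in the question;
# "/mnt/rttracking-in" is a hypothetical mount point.
dbutils.fs.mount(
  source = "wasbs://" + storage_container_name + "@" + storage_account_name + ".blob.core.windows.net",
  mount_point = "/mnt/rttracking-in",
  extra_configs = {
    "fs.azure.account.key." + storage_account_name + ".blob.core.windows.net":
      storage_account_access_key
  }
)
```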

This should work then:

root = ET.parse("/mnt/<mount-name>/...")

I did mount the Storage, and then this does the trick:

firstfile = xmlfiles[0].path.replace('dbfs:', '/dbfs')
root = ET.parse(firstfile).getroot()
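The replace works because DBFS is also exposed as a local FUSE mount at `/dbfs` on the driver, so a `dbfs:/...` URI maps to an ordinary filesystem path that `ET.parse()` can open. A minimal sketch of just the path translation (the file name is a hypothetical example):

```python
# A dbfs: URI as returned by dbutils.fs.ls() (hypothetical example path)
uri = "dbfs:/mnt/rttracking-in/rtTracking_00001.xml"

# DBFS is mirrored under /dbfs on the driver's local filesystem, so
# swapping the scheme for that prefix yields a path open() understands.
local_path = uri.replace("dbfs:", "/dbfs", 1)
print(local_path)  # /dbfs/mnt/rttracking-in/rtTracking_00001.xml

# ET.parse(local_path) would then read the file like any local file.
```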

