
Azure Databricks - not able to read .csv files using Spark jobs from Data Lake Storage Gen2 service

I have a Databricks cluster running fine. Using the following code, I can mount my Data Lake Storage Gen2 account as well. I am mounting everything on /mnt/data1.

// OAuth (service principal) configuration for ADLS Gen2 over ABFS
val configs = Map(
  "fs.azure.account.auth.type" -> "OAuth",
  "fs.azure.account.oauth.provider.type" -> "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id" -> appID,
  "fs.azure.account.oauth2.client.secret" -> password,
  "fs.azure.account.oauth2.client.endpoint" -> ("https://login.microsoftonline.com/" + tenantID + "/oauth2/token"),
  "fs.azure.createRemoteFileSystemDuringInitialization" -> "true")

// Mount the container (fileSystemName) at /mnt/data1
dbutils.fs.mount(
  source = "abfss://" + fileSystemName + "@" + storageAccountName + ".dfs.core.windows.net/",
  mountPoint = "/mnt/data1",
  extraConfigs = configs)
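As a quick sanity check (not part of the original post), a minimal sketch of how the mount can be inspected before reading from it; dbutils.fs.ls shows what actually sits at the mount root, which also reveals whether paths should include the container name:

// List the mount root to see which files/folders are visible there
display(dbutils.fs.ls("/mnt/data1"))

// Inspect all active mounts and the sources they point at
display(dbutils.fs.mounts())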

Up to this point everything is fine and working. But when I try to access one file from the mount location with the following command

val df = spark.read.csv("/mnt/data1/creodemocontainer/movies.csv")

I get the following error

java.io.FileNotFoundException: dbfs:/mnt/data1/creodemocontainer2/movies.csv
    at com.databricks.backend.daemon.data.client.DatabricksFileSystemV2.$anonfun$getFileStatus$2(DatabricksFileSystemV2.scala:775)

Though I can connect to and load those files in Power BI without any issue. I haven't found any clue in the last 2 days, so any help will be really appreciated.

Thanks in advance.

Sharing the answer as per the comment by the original poster:

I'm not supposed to add the container name while reading.

val df = spark.read.csv("/mnt/data1/creodemocontainer/movies.csv")

Removed the container name, since it is already part of the mount: the mount source abfss://fileSystemName@... already points at the container root, so a path like /mnt/data1/creodemocontainer/movies.csv would look for a creodemocontainer folder inside the container itself, which does not exist. Everything is working fine now

val df = spark.read.csv("/mnt/data1/movies.csv")
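As an optional refinement (an assumption, since the original post doesn't show the file's contents): if movies.csv has a header row, Spark can be told to use it and to infer column types, giving named, typed columns instead of _c0, _c1, ...:

// Read via the mount-relative path; header/inferSchema are assumptions
val movies = spark.read
  .option("header", "true")      // assume the first line holds column names
  .option("inferSchema", "true") // let Spark guess column types
  .csv("/mnt/data1/movies.csv")

movies.show(5)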

