
Not able to read Delta parquet files inside a container from a storage account in Azure Databricks

There is a Spark command in a Python notebook that writes the output DataFrame in Delta format into a container named omega.

When I try to read a Delta file from this omega container using Spark, it throws the error below:

omega_2022_06_06_path = 'dbfs:/mnt/omega/'  + 'part-00000-234567-c000.snappy.parquet'

omega_2022_06_07_path = 'dbfs:/mnt/omega/'  + 'part-00000-987898-c000.snappy.parquet'


omega_06_06_DF = spark.read.format("delta").load(omega_2022_06_06_path)
omega_06_07_DF = spark.read.format("delta").load(omega_2022_06_07_path)



 AnalysisException: A partition path fragment should be the form like `part1=foo/part2=bar`. The partition path:part-00000-234567-c000.snappy.parquet

I am not sure what a partition fragment is here. This omega container simply has some Delta files; basically there is no directory inside the omega container.

Can someone help me resolve this issue?

If you need to read only specific files, then you need to read them using the parquet format, not delta. The delta format represents a table as a whole (all data files plus metadata), not specific pieces. If you need to extract specific data from a Delta table, you usually do spark.read.load on the table's root directory and then use .filter to limit the scope to the necessary data.
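A minimal sketch of both approaches, reusing the mount point and file names from the question; the Delta table root path and the load_date column used in the filter are assumptions for illustration only:

# Option 1: read an individual data file with the parquet reader
# (file name taken from the question)
omega_2022_06_06_path = 'dbfs:/mnt/omega/part-00000-234567-c000.snappy.parquet'
omega_06_06_DF = spark.read.format("parquet").load(omega_2022_06_06_path)

# Option 2: read the whole Delta table (the directory containing _delta_log),
# then filter down to the rows you need.
# The column name load_date is hypothetical; replace it with a real column.
omega_DF = spark.read.format("delta").load("dbfs:/mnt/omega/")
omega_06_06_DF = omega_DF.filter("load_date = '2022-06-06'")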
