Not able to read Delta parquet files inside a container from a storage account in Azure Databricks
There is a spark command which writes the output dataframe in delta format inside a container omega from a python notebook.
When I try to read a delta file from this omega container using spark, it throws the error below:
omega_2022_06_06_path = 'dbfs:/mnt/omega/' + 'part-00000-234567-c000.snappy.parquet'
omega_2022_06_07_path = 'dbfs:/mnt/omega/' + 'part-00000-987898-c000.snappy.parquet'
omega_06_06_DF = spark.read.format("delta").load(omega_2022_06_06_path)
omega_06_07_DF = spark.read.format("delta").load(omega_2022_06_07_path)
AnalysisException: A partition path fragment should be the form like `part1=foo/part2=bar`. The partition path:part-00000-234567-c000.snappy.parquet
I am not sure what a partition path fragment is here. This omega container simply has some delta files; there is basically no directory inside the omega container.
Can someone help me resolve this issue?
If you need to read only specific files, then you need to read them using the parquet format, not delta. The delta format represents a table as a whole (all data files and metadata), not specific pieces. If you need to extract specific data from a Delta table, you usually do spark.read.load and then use .filter to limit the scope to the necessary data.
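For concreteness, a minimal sketch of both approaches, assuming the dbfs:/mnt/omega/ mount point and the file names from the question (the filter column event_date is hypothetical; substitute a real column from your table):

# Read one specific data file: use the parquet reader, because a single
# part file is not a Delta table by itself.
omega_2022_06_06_path = 'dbfs:/mnt/omega/part-00000-234567-c000.snappy.parquet'
omega_06_06_DF = spark.read.format("parquet").load(omega_2022_06_06_path)

# Read the Delta table as a whole: point the delta reader at the table's
# root directory (not at an individual part file), then filter down to
# the rows you need. `event_date` is a hypothetical column name.
omega_DF = spark.read.format("delta").load('dbfs:/mnt/omega/')
omega_06_06_rows = omega_DF.filter("event_date = '2022-06-06'")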