
Not able to read Delta parquet files inside a container from a storage account in Azure Databricks

There is a Spark command in a Python notebook that writes the output DataFrame in Delta format into a container named omega.

When I try to read a Delta file from this omega container using Spark, it throws the error below:

omega_2022_06_06_path = 'dbfs:/mnt/omega/'  + 'part-00000-234567-c000.snappy.parquet'

omega_2022_06_07_path = 'dbfs:/mnt/omega/'  + 'part-00000-987898-c000.snappy.parquet'


omega_06_06_DF = spark.read.format("delta").load(omega_2022_06_06_path)
omega_06_07_DF = spark.read.format("delta").load(omega_2022_06_07_path)



 AnalysisException: A partition path fragment should be the form like `part1=foo/part2=bar`. The partition path:part-00000-234567-c000.snappy.parquet

I am not sure what a partition fragment is here. This omega container simply has some Delta files; basically there is no directory inside the omega container.

Can someone help me resolve this issue?

If you need to read only specific files, then you need to read them using the parquet format, not delta. The delta format represents a table as a whole (all data files plus metadata), not specific pieces. If you need to extract specific data from a Delta table, you usually do spark.read.load on the table's root directory and then use .filter to limit the scope to the necessary data.
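A minimal sketch of both approaches, reusing the mount point and file names from the question; the Delta table root path and the load_date column used in the filter are assumptions for illustration only:

# Option 1: read an individual data file with the parquet reader
# (file name taken from the question)
omega_2022_06_06_path = 'dbfs:/mnt/omega/part-00000-234567-c000.snappy.parquet'
omega_06_06_DF = spark.read.format("parquet").load(omega_2022_06_06_path)

# Option 2: read the whole Delta table (the directory containing _delta_log),
# then filter down to the rows you need.
# The column name load_date is hypothetical; replace it with a real column.
omega_DF = spark.read.format("delta").load("dbfs:/mnt/omega/")
omega_06_06_DF = omega_DF.filter("load_date = '2022-06-06'")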
