
Not able to read delta parquet files inside a container from storage account in azure databricks

There is a Spark command that writes an output DataFrame in Delta format to a container named omega, run from a Python notebook.

When I try to read a Delta file from this omega container using Spark, it throws the error below:

omega_2022_06_06_path = 'dbfs:/mnt/omega/' + 'part-00000-234567-c000.snappy.parquet'
omega_2022_06_07_path = 'dbfs:/mnt/omega/' + 'part-00000-987898-c000.snappy.parquet'

omega_06_06_DF = spark.read.format("delta").load(omega_2022_06_06_path)
omega_06_07_DF = spark.read.format("delta").load(omega_2022_06_07_path)



 AnalysisException: A partition path fragment should be the form like `part1=foo/part2=bar`. The partition path:part-00000-234567-c000.snappy.parquet

I am not sure what a partition path fragment is here. The omega container simply contains some Delta data files; there are no directories inside it.

Can someone help me resolve this issue?

If you need to read only specific files, then you need to read them using the parquet format, not delta. The delta format represents a table as a whole (all data files plus metadata), not individual files. If you need to extract specific data from a Delta table, you usually do spark.read.load on the table's root path and then use .filter to limit the scope to the necessary data.
