How to read a parquet file in Azure Databricks?

I have a few parquet files stored in my storage account, which I am trying to read using the code below. However, it fails with an incorrect syntax error. Can someone suggest the correct way to read parquet files using Azure Databricks?

val data = spark.read.parquet("abfss://containername@storagename.dfs.core.windows.net/TestFolder/XYZ/part-00000-1cf0cf7b-6c9f-41-a268-be-c000.snappy.parquet")
display(data)

abfss://containername@storagename.dfs.core.windows.net/TestFolder/XYZ/part-00000-1cf0cf7b-6c9f-41-a268-be-c000.snappy.parquet

As per the above abfss URL, you can use either delta or parquet format in the storage account.
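For reference, the abfss URL above is made up of a container name, a storage account name, and a path. A minimal sketch of how those pieces fit together is shown here; the helper name `build_abfss_uri` is hypothetical and only illustrates the layout, it is not part of Spark or Databricks.

```python
def build_abfss_uri(container: str, account: str, path: str = "") -> str:
    """Assemble an abfss:// URI from its parts (hypothetical helper)."""
    base = f"abfss://{container}@{account}.dfs.core.windows.net"
    # Strip stray slashes so exactly one slash separates host and path.
    clean = path.strip("/")
    return f"{base}/{clean}" if clean else base


# Container, account, and folder names are taken from the question's URL.
folder_uri = build_abfss_uri("containername", "storagename", "TestFolder/XYZ")
print(folder_uri)
```

The resulting string can be passed to spark.read as the path, either for the folder or with a part file name appended.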

Note: If you created a Delta table, part files are created automatically with names like part-00000-1cf0cf7b-6c9f-41-a268-be-c000.snappy.parquet. As per the above code, it is not possible to read a parquet file in delta format.
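One way to tell the two cases apart is to check whether the folder contains a `_delta_log` directory, which is where Delta tables keep their transaction log. The sketch below assumes you already have the entry names of the folder (for example from a directory listing in the notebook); the function name `choose_format` and the sample listings are hypothetical.

```python
def choose_format(entry_names):
    """Return "delta" if the listing contains a _delta_log directory, else "parquet"."""
    # Directory names may carry a trailing slash (e.g. "_delta_log/"), so strip it.
    names = {name.rstrip("/") for name in entry_names}
    return "delta" if "_delta_log" in names else "parquet"


# Hypothetical listings standing in for the real contents of a folder.
delta_listing = ["_delta_log/", "part-00000-1cf0cf7b-6c9f-41-a268-be-c000.snappy.parquet"]
plain_listing = ["_SUCCESS", "part-00000-tid-2397072542034942773-def47888-c000.snappy.parquet"]

print(choose_format(delta_listing))
print(choose_format(plain_listing))
```

In a notebook, the returned string could then be used as the argument to spark.read.format(...) when loading the folder path.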

I wrote the DataFrame df1 to the storage account in parquet format, using overwrite mode.

df1.coalesce(1).write.format('parquet').mode("overwrite").save("abfss://<container>@<storage_account>.dfs.core.windows.net/<folder>/<sub_folder>")

Scala

val df11 = spark.read.format("parquet").load("abfss://<container>@<storage_account>.dfs.core.windows.net/demo/d121/part-00000-tid-2397072542034942773-def47888-c000.snappy.parquet")
display(df11)

Python

df11 = spark.read.format("parquet").load("abfss://<container>@<storage_account>.dfs.core.windows.net/demo/d121/part-00000-tid-2397072542034942773-def47888-c000.snappy.parquet")
display(df11)

Output: [image]
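As a variation on the read shown above, Spark can also be pointed at the folder instead of a single part file, in which case it reads every parquet part file in that folder. This may be more robust, because part file names change each time the data is overwritten. The sketch below is only an illustration: the wrapper name `read_parquet` is hypothetical, and `spark` is assumed to be the SparkSession that a Databricks notebook provides.

```python
def read_parquet(spark, path):
    """Read a parquet file or folder into a DataFrame using the given SparkSession."""
    # spark.read.parquet(path) is shorthand for the format/load pair used here.
    return spark.read.format("parquet").load(path)


# Usage inside a notebook (assumes `spark` exists; placeholders left as-is):
# df11 = read_parquet(spark, "abfss://<container>@<storage_account>.dfs.core.windows.net/demo/d121")
# display(df11)
```

Note that the Scala `val` keyword is not valid Python, so in a Python notebook the Scala version would need to run in a cell that starts with `%scala`.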
