Unable to Save Apache Spark parquet file to csv with Databricks

I'm trying to save/convert a parquet file to csv on Apache Spark with Databricks, but not having much luck.

The following code successfully writes to a folder called tempDelta:

df.coalesce(1).write.format("parquet").mode("overwrite").option("header","true").save(saveloc+"/tempDelta")

I would then like to convert the parquet file to csv as follows:

df.coalesce(1).write.format("parquet").mode("overwrite").option("header","true").save(saveloc+"/tempDelta").csv(saveloc+"/tempDelta")


AttributeError                            Traceback (most recent call last)
<command-2887017733757862> in <module>
----> 1 df.coalesce(1).write.format("parquet").mode("overwrite").option("header","true").save(saveloc+"/tempDelta").csv(saveloc+"/tempDelta")

AttributeError: 'NoneType' object has no attribute 'csv'
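For reference, the chained call fails because DataFrameWriter.save() returns None, so there is nothing left to call .csv() on. A minimal sketch of issuing the two writes separately, assuming the df and saveloc from above and a hypothetical tempCsv output folder (the CSV write itself can still hit the struct-column issue described further down):

# write the parquet copy first
df.coalesce(1).write.format("parquet").mode("overwrite").option("header","true").save(saveloc+"/tempDelta")

# then issue a separate write for the csv copy (tempCsv is a hypothetical folder name)
df.coalesce(1).write.mode("overwrite").option("header","true").csv(saveloc+"/tempCsv")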

I have also tried the following after writing to the location:

df.write.option("header","true").csv(saveloc+"/tempDelta2")

But I get the error:

A transaction log for Databricks Delta was found at `/CURATED/F1Area/F1Domain/final/_delta_log`,
but you are trying to write to `/CURATED/F1Area/F1Domain/final/tempDelta2` using format("csv"). You must use
'format("delta")' when reading and writing to a delta table.

And when I try to save as a csv to a folder that isn't a delta folder, I get the following error:

df.write.option("header","true").csv("testfolder")


AnalysisException: CSV data source does not support struct data type.
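This error means at least one column of df is a struct, which the CSV data source cannot serialize. One hedged workaround, assuming it is acceptable to store struct columns as JSON strings, is to convert them with to_json before writing:

from pyspark.sql import functions as F
from pyspark.sql.types import StructType

# serialize struct columns to JSON strings so the csv data source accepts them
flat_cols = [F.to_json(F.col(f.name)).alias(f.name) if isinstance(f.dataType, StructType)
             else F.col(f.name)
             for f in df.schema.fields]

df.select(*flat_cols).write.mode("overwrite").option("header","true").csv("testfolder")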

Can someone let me know the best way of saving/converting from parquet to csv with Databricks?

You can use either of the two options below:

1. df.write.option("header","true").csv(path)

2. df.write.format("csv").save(path)

Note: You can't specify the format as parquet and call the .csv() function at the same time.
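A minimal sketch of both options in PySpark, assuming the parquet folder written earlier is read back first and that the target path (tempCsv2, a hypothetical name) sits outside any Delta table directory:

# read the parquet output back, then write it out as csv
parquet_df = spark.read.parquet(saveloc+"/tempDelta")

# option 1: call the csv() convenience method directly
parquet_df.coalesce(1).write.mode("overwrite").option("header","true").csv(saveloc+"/tempCsv2")

# option 2: name the format explicitly and call save()
parquet_df.coalesce(1).write.mode("overwrite").format("csv").option("header","true").save(saveloc+"/tempCsv2")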

