Unable to Save Apache Spark parquet file to csv with Databricks

I'm trying to save/convert a parquet file to csv on Apache Spark with Databricks, but not having much luck.

The following code successfully writes to a folder called tempDelta:

df.coalesce(1).write.format("parquet").mode("overwrite").option("header","true").save(saveloc+"/tempDelta")

I would then like to convert the parquet file to csv as follows:

df.coalesce(1).write.format("parquet").mode("overwrite").option("header","true").save(saveloc+"/tempDelta").csv(saveloc+"/tempDelta")


AttributeError                            Traceback (most recent call last)
<command-2887017733757862> in <module>
----> 1 df.coalesce(1).write.format("parquet").mode("overwrite").option("header","true").save(saveloc+"/tempDelta").csv(saveloc+"/tempDelta")

AttributeError: 'NoneType' object has no attribute 'csv'
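For reference, the chained call fails because DataFrameWriter.save() returns None, so there is nothing left to call .csv() on. A minimal sketch of issuing the two writes separately, assuming the df and saveloc from above and a hypothetical tempCsv output folder (the CSV write itself can still hit the struct-column issue described further down):

# write the parquet copy first
df.coalesce(1).write.format("parquet").mode("overwrite").option("header","true").save(saveloc+"/tempDelta")

# then issue a separate write for the csv copy (tempCsv is a hypothetical folder name)
df.coalesce(1).write.mode("overwrite").option("header","true").csv(saveloc+"/tempCsv")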

I have also tried the following after writing to the location:

df.write.option("header","true").csv(saveloc+"/tempDelta2")

But I get the error:

A transaction log for Databricks Delta was found at `/CURATED/F1Area/F1Domain/final/_delta_log`,
but you are trying to write to `/CURATED/F1Area/F1Domain/final/tempDelta2` using format("csv"). You must use
'format("delta")' when reading and writing to a delta table.

And when I try to save as a csv to a folder that isn't a delta folder, I get the following error:

df.write.option("header","true").csv("testfolder")


AnalysisException: CSV data source does not support struct data type.
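This error means at least one column of df is a struct, which the CSV data source cannot serialize. One hedged workaround, assuming it is acceptable to store struct columns as JSON strings, is to convert them with to_json before writing:

from pyspark.sql import functions as F
from pyspark.sql.types import StructType

# serialize struct columns to JSON strings so the csv data source accepts them
flat_cols = [F.to_json(F.col(f.name)).alias(f.name) if isinstance(f.dataType, StructType)
             else F.col(f.name)
             for f in df.schema.fields]

df.select(*flat_cols).write.mode("overwrite").option("header","true").csv("testfolder")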

Can someone let me know the best way of saving/converting from parquet to csv with Databricks?

You can use either of the two options below:

1. df.write.option("header","true").csv(path)

2. df.write.format("csv").save(path)

Note: You can't specify the format as parquet and call the .csv() function at the same time.
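A minimal sketch of both options in PySpark, assuming the parquet folder written earlier is read back first and that the target path (tempCsv2, a hypothetical name) sits outside any Delta table directory:

# read the parquet output back, then write it out as csv
parquet_df = spark.read.parquet(saveloc+"/tempDelta")

# option 1: call the csv() convenience method directly
parquet_df.coalesce(1).write.mode("overwrite").option("header","true").csv(saveloc+"/tempCsv2")

# option 2: name the format explicitly and call save()
parquet_df.coalesce(1).write.mode("overwrite").format("csv").option("header","true").save(saveloc+"/tempCsv2")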

