
Reading Csv file written by Dataframewriter Pyspark

I had a dataframe which I wrote to a CSV file using the code below:

df.write.format("csv").save(base_path+"avg.csv")

As I am running Spark in client mode, the snippet above created a folder named avg.csv on my worker node; the folder contains part-*.csv files (sometimes inside a nested folder).

Now when I try to read avg.csv, I get a "path doesn't exist" error:

spark.read.format("com.databricks.spark.csv").load(base_path+"avg.csv")

Can anybody tell me where I am going wrong?

Part-00** files are the output of distributed computation (e.g. MapReduce, Spark). So a save will always create a folder containing part files, because the output comes from a distributed job; this is something to keep in mind.
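This folder layout can be seen even without Spark. A minimal plain-Python sketch (the directory and file names are illustrative, mimicking what a Spark job writes) that builds such an output folder and gathers the part files:

```python
import csv
import glob
import os
import tempfile

# Mimic a Spark-style output: "avg.csv" is a FOLDER holding part files.
out_dir = os.path.join(tempfile.mkdtemp(), "avg.csv")
os.makedirs(out_dir)
for i, rows in enumerate([[("a", 1)], [("b", 2)]]):
    part = os.path.join(out_dir, f"part-0000{i}.csv")
    with open(part, "w", newline="") as f:
        csv.writer(f).writerows(rows)

# Opening "avg.csv" as a single file would fail -- it is a directory.
# Instead, glob the part files and concatenate their rows.
records = []
for path in sorted(glob.glob(os.path.join(out_dir, "part-*.csv"))):
    with open(path, newline="") as f:
        records.extend(csv.reader(f))

print(records)  # [['a', '1'], ['b', '2']]
```

This is exactly what Spark's CSV reader does for you when pointed at the part files.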

So, try using:

spark.read.format("com.databricks.spark.csv").load(base_path+"avg.csv/*")

