
Reading Csv file written by Dataframewriter Pyspark

I had a dataframe which I wrote to a CSV file using the code below:

df.write.format("csv").save(base_path+"avg.csv")

As I am running Spark in client mode, the snippet above created a folder named avg.csv on my worker node; the folder contains part-*.csv files (sometimes inside a nested folder).

Now when I try to read avg.csv, I get a "path doesn't exist" error:

spark.read.format("com.databricks.spark.csv").load(base_path+"avg.csv")

Can anybody tell me where I am going wrong?

Part-00** files are the output of distributed computation (e.g. MapReduce, Spark). So a save will always create a folder containing part files, because the output comes from a distributed job; this is something to keep in mind.
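This folder layout can be seen even without Spark. A minimal plain-Python sketch (the directory and file names are illustrative, mimicking what a Spark job writes) that builds such an output folder and gathers the part files:

```python
import csv
import glob
import os
import tempfile

# Mimic a Spark-style output: "avg.csv" is a FOLDER holding part files.
out_dir = os.path.join(tempfile.mkdtemp(), "avg.csv")
os.makedirs(out_dir)
for i, rows in enumerate([[("a", 1)], [("b", 2)]]):
    part = os.path.join(out_dir, f"part-0000{i}.csv")
    with open(part, "w", newline="") as f:
        csv.writer(f).writerows(rows)

# Opening "avg.csv" as a single file would fail -- it is a directory.
# Instead, glob the part files and concatenate their rows.
records = []
for path in sorted(glob.glob(os.path.join(out_dir, "part-*.csv"))):
    with open(path, newline="") as f:
        records.extend(csv.reader(f))

print(records)  # [['a', '1'], ['b', '2']]
```

This is exactly what Spark's CSV reader does for you when pointed at the part files.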

So, try using:

spark.read.format("com.databricks.spark.csv").load(base_path+"avg.csv/*")

