
Py4JJavaError: An error occurred while calling o8660.save when trying to save a CSV file locally

I want to save a CSV file locally rather than to the Hadoop file system. I get the error in the title when I use a path that starts with

> 'file://'

How can I fix this? Or, how can I save the file locally without any errors?


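I'm afraid it's not going to work like that, because saving the data locally implies it must all be present on the driver. The write is performed by the executors, each writing its own partitions, so on a cluster a file:// path refers to each executor's local disk rather than to the machine the driver runs on. Per the PySpark docs, the path parameter of pyspark.sql.DataFrameWriter.csv is a "path in any Hadoop supported file system".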

So as far as I can tell, there are several alternatives:

  1. Save the DataFrame to HDFS/Hadoop and then copy it to the local FS with hdfs dfs -get ... (or hdfs dfs -getmerge ..., which concatenates the part files into a single local file). This is the most straightforward and preferred way.
  2. Call df.collect() to bring the complete DataFrame to the driver, and then write it to the local FS. This might not be feasible for large DataFrames, since it can crash the driver with an out-of-memory (OOM) error.
  3. Use df.toLocalIterator() to bring the data to the driver one partition at a time, and then write it to the local FS. This avoids, or at least reduces, the OOM risk of the previous option, since the driver only needs to hold roughly the largest partition at once.
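Options 2 and 3 differ only in what you feed to the writing loop, so here is a minimal sketch of the "write it to local FS" step using Python's standard csv module. The helper name write_rows_to_local_csv is hypothetical (not part of PySpark), and it assumes "locally" means the machine running the driver. It takes the column names and any iterable of rows, so it works with either df.collect() or df.toLocalIterator().

```python
import csv
from typing import Iterable, Sequence


def write_rows_to_local_csv(columns: Sequence[str], rows: Iterable[Sequence], path: str) -> int:
    """Hypothetical helper: stream rows into one CSV file on the driver's local disk.

    `rows` may be df.collect() (option 2: every row in driver memory at once)
    or df.toLocalIterator() (option 3: partitions fetched as the loop consumes them).
    Returns the number of data rows written (header excluded).
    """
    count = 0
    # newline="" is what the csv module expects, so it controls line endings itself
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(columns)  # header row, e.g. from df.columns
        for row in rows:
            # pyspark.sql.Row is a tuple subclass, so it can be written as-is;
            # csv.writer writes None as an empty field
            writer.writerow(row)
            count += 1
    return count
```

With a Spark DataFrame df, option 3 would be write_rows_to_local_csv(df.columns, df.toLocalIterator(), "/tmp/out.csv"), and option 2 is the same call with df.collect() in place of the iterator. Keeping the writer independent of Spark makes the memory trade-off a single-argument change: the list returned by collect() holds every row at once, while the iterator only pulls partitions as the loop needs them.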


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM