
Writing a pandas DataFrame (.csv) to the local system or HDFS with Spark in cluster mode

I'm trying to write a pandas DataFrame to the local file system or to HDFS with Spark in cluster mode, but it throws an error like:

IOError: [Errno 2] No such file or directory: {hdfs_path/file_name.txt}

This is how I'm writing it:

df.to_csv("hdfs_path/file_name.txt", sep="|")  # pandas resolves this as a path on the driver's local filesystem

I am using Python, and the job runs through a shell script.

This works fine in local mode but fails in yarn-cluster mode.

Any support is welcome and thanks in advance.

I had the same issue. pandas' to_csv writes through the local filesystem API, so in yarn-cluster mode the path is resolved on whichever cluster node the driver happens to run on, not on HDFS; since that local directory doesn't exist there, you get the "No such file or directory" error. I always convert the pandas DataFrame into a Spark DataFrame before writing the file out through Spark:

df_sp = spark.createDataFrame(df_pd)  # convert the pandas DataFrame into a distributed Spark DataFrame
df_sp.coalesce(1).write.csv("my_file.csv", mode="overwrite", header=True)  # coalesce(1) => a single part file
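
Note that write.csv produces a directory named my_file.csv (containing one part-*.csv file thanks to coalesce(1), plus a _SUCCESS marker) on whatever filesystem Spark uses by default, typically HDFS in yarn-cluster mode; it does not produce a single flat file with that name.

If you specifically need to keep the data as a pandas DataFrame and land it on HDFS from the driver, a minimal sketch using the third-party HdfsCLI package (pip install hdfs) could look like the following. The WebHDFS URL, user name, and target path here are placeholder assumptions, not values from the question:

from hdfs import InsecureClient
import pandas as pd

# Placeholder WebHDFS endpoint (port 9870 on Hadoop 3, 50070 on Hadoop 2) and user.
client = InsecureClient("http://namenode-host:9870", user="hadoop_user")

df_pd = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})  # stand-in for the real DataFrame

# Stream the CSV straight into HDFS instead of the driver's local disk.
with client.write("/user/hadoop_user/file_name.txt", encoding="utf-8", overwrite=True) as writer:
    df_pd.to_csv(writer, sep="|", index=False)

This writes one file with exactly the name you choose, at the cost of funnelling all the data through the driver.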
