I'm trying to write a pandas DataFrame to the local file system or to HDFS with Spark in cluster mode, but it throws an error like:
IOError: [Errno 2] No such file or directory: {hdfs_path/file_name.txt}
This is how I'm writing it:
df.to_csv("hdfs_path/file_name.txt", sep="|")
I am using Python, and the job is launched through a shell script.
This works fine in local mode but fails in yarn-cluster mode.
Any support is welcome and thanks in advance.
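For context on the behavior described above: pandas' to_csv only sees the local filesystem of the machine the calling process runs on, so an HDFS path is treated as an ordinary relative path; in yarn-cluster mode the driver runs on an arbitrary worker node where that directory does not exist. A minimal sketch (using a hypothetical stand-in DataFrame) showing that the same call succeeds when the target is a real local path:

```python
import os
import tempfile

import pandas as pd

# Hypothetical example data standing in for the real DataFrame.
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

# to_csv resolves the path against the local filesystem of whatever
# machine runs this process -- it knows nothing about HDFS.
out_dir = tempfile.mkdtemp()
local_path = os.path.join(out_dir, "file_name.txt")
df.to_csv(local_path, sep="|", index=False)

print(os.path.exists(local_path))  # the local write succeeds
```

The same call with an HDFS path fails because no such directory exists on the driver node's local disk.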
I had the same issue. I always convert the pandas DataFrame into a Spark DataFrame before writing it out to a filesystem Spark can reach (such as HDFS):
df_sp = spark.createDataFrame(df_pd)
df_sp.coalesce(1).write.csv("my_file.csv", mode="overwrite", header=True)
Note that write.csv produces a directory named my_file.csv containing a part file (a single one here, because of coalesce(1)), not a plain file.