
Is there any way to assign a csv file from dbfs (databricks) path to a variable in pyspark?

I am executing the following code in Databricks to write a Spark DataFrame as a CSV file named dataframe.csv to a DBFS path.

df.coalesce(1)\
 .write\
 .format("com.databricks.spark.csv")\
 .option("header", "true")\
 .save("dataframe.csv")

This file is getting created at dbfs:/dataframe.csv. I need to load this file into a variable so that I can attach it to a mail. I am using:

filename = pandas.read_csv("dataframe.csv")

But this is throwing an error: IOError: File dataframe.csv does not exist

Can someone please help me?

You need to prefix the path with the /dbfs folder, like this:

filename = "/dbfs/somefile.csv"
frame = pd.read_csv(filename)

Here you're using the Databricks File System's local file API (the /dbfs mount), which is one of several ways you can interact with this distributed file system.
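One caveat worth noting: Spark's .save("dataframe.csv") creates a directory named dataframe.csv containing part files (a single one after coalesce(1)), not a plain CSV file, so pandas must be pointed at the part file inside it. Below is a minimal sketch of that lookup; the directory layout and part-file name are simulated locally here, since on Databricks the real files would sit under /dbfs/dataframe.csv:

```python
import glob
import os
import tempfile

import pandas as pd

# Simulate the directory that Spark's .save("dataframe.csv") produces.
# On Databricks this would be /dbfs/dataframe.csv instead of a temp dir.
out_dir = os.path.join(tempfile.mkdtemp(), "dataframe.csv")
os.makedirs(out_dir)
with open(os.path.join(out_dir, "part-00000-abc.csv"), "w") as f:
    f.write("id,name\n1,alice\n2,bob\n")

# Locate the single part file inside the output directory and read it.
part_file = glob.glob(os.path.join(out_dir, "part-*.csv"))[0]
frame = pd.read_csv(part_file)
print(frame.shape)
```

On a real cluster you would replace the temp-dir setup with glob.glob("/dbfs/dataframe.csv/part-*.csv") and read the first match.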
