
Is there any way to assign a csv file from dbfs (databricks) path to a variable in pyspark?

I am executing the following code in Databricks to write a Spark DataFrame as a CSV file named dataframe.csv to a DBFS path.

df.coalesce(1)\
 .write\
 .format("com.databricks.spark.csv")\
 .option("header", "true")\
 .save("dataframe.csv")

This file is getting created at dbfs:/dataframe.csv. I need to load this file into a variable so that I can attach it to a mail. I am using:

filename = pandas.read_csv("dataframe.csv")

But this is throwing an error: IOError: File dataframe.csv does not exist

Can someone please help me?

You need to prefix the path with the /dbfs folder, like this:

filename = "/dbfs/somefile.csv"
frame = pd.read_csv(filename)

Here you're using the Databricks File System's local file API (the /dbfs mount), which is one of several ways you can interact with this distributed file system.
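One caveat worth noting: Spark's .save("dataframe.csv") creates a directory named dataframe.csv containing part files (a single one after coalesce(1)), not a plain CSV file, so pandas must be pointed at the part file inside it. Below is a minimal sketch of that lookup; the directory layout and part-file name are simulated locally here, since on Databricks the real files would sit under /dbfs/dataframe.csv:

```python
import glob
import os
import tempfile

import pandas as pd

# Simulate the directory that Spark's .save("dataframe.csv") produces.
# On Databricks this would be /dbfs/dataframe.csv instead of a temp dir.
out_dir = os.path.join(tempfile.mkdtemp(), "dataframe.csv")
os.makedirs(out_dir)
with open(os.path.join(out_dir, "part-00000-abc.csv"), "w") as f:
    f.write("id,name\n1,alice\n2,bob\n")

# Locate the single part file inside the output directory and read it.
part_file = glob.glob(os.path.join(out_dir, "part-*.csv"))[0]
frame = pd.read_csv(part_file)
print(frame.shape)
```

On a real cluster you would replace the temp-dir setup with glob.glob("/dbfs/dataframe.csv/part-*.csv") and read the first match.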
