I want to iterate through the files available in a DBFS location in Databricks, but it's throwing an error: 'org.apache.spark.sql.AnalysisException: Path does not exist:'. Here's the code I tried:
import os
from pyspark.sql.types import *
fileDirectory = '/dbfs/FileStore/tables/'
for fname in os.listdir(fileDirectory):
    df_app = sqlContext.read.format("csv") \
        .option("header", "true") \
        .load(fileDirectory + fname)
And the error is:
org.apache.spark.sql.AnalysisException: Path does not exist: dbfs:/dbfs/FileStore/tables/Dept_data.csv;
Can you please help with this?
Thanks in advance.
When reading files in Databricks with the DataFrameReaders (i.e. spark.read...), paths are resolved directly against DBFS, where the FileStore tables directory is, in fact, dbfs:/FileStore/tables/. From the point of view of the Python os library, however, DBFS is mounted as an ordinary local folder (which is why you can access it via /dbfs/FileStore/tables). So you need the local path for os.listdir and the DBFS path for the reader, and something like this should work fine:
import os

# Local FUSE mount path, used only for os.listdir
fileDirectory = '/dbfs/FileStore/tables/'
# DBFS path, used by the Spark reader
sparkDirectory = '/FileStore/tables/'

for fname in os.listdir(fileDirectory):
    df_app = sqlContext.read.format("csv") \
        .option("header", "true") \
        .load(sparkDirectory + fname)
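The translation between the two path styles is mechanical, so it can be captured in a small helper. This is just a sketch, and `to_spark_path` is a hypothetical name (not a Databricks API): it strips the local `/dbfs` mount prefix so a path listed with `os` can be handed to the Spark reader.

```python
# Hypothetical helper: translate the local FUSE mount path (/dbfs/...)
# into the path Spark's DataFrameReader expects.
def to_spark_path(local_path):
    prefix = '/dbfs/'
    if local_path.startswith(prefix):
        # Drop the mount prefix; Spark resolves '/FileStore/...'
        # as 'dbfs:/FileStore/...'
        return '/' + local_path[len(prefix):]
    return local_path

# e.g. to_spark_path('/dbfs/FileStore/tables/Dept_data.csv')
#   -> '/FileStore/tables/Dept_data.csv'
```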
In addition, you can check the dbutils commands (https://docs.databricks.com/dev-tools/databricks-utils.html#dbutilsfsls-command), which let you list and manipulate DBFS directly, without dealing with the /dbfs mount at all. Hope this helps.