I'm sorry if this is basic and I missed something simple. I'm trying to run the code below to iterate through the files in a folder and merge all files whose names start with a specific string into a single dataframe. All files sit in a data lake.
file_list = []
path = "/dbfs/rawdata/2019/01/01/parent/"
files = dbutils.fs.ls(path)
for file in files:
    if file.name.startswith("CW"):
        file_list.append(file.name)
df = spark.read.load(path=file_list)
# checkpoint: print the row and column counts, then the schema
print("Shape: ", df.count(), ",", len(df.columns))
df.printSchema()
This looks fine to me, but apparently something is wrong here. I'm getting an error on this line:
files = dbutils.fs.ls(path)
Error message reads:
java.io.FileNotFoundException: File /6199764716474501/dbfs/rawdata/2019/01/01/parent does not exist.
The path, the files, and everything else definitely exist. I tried with and without the 'dbfs' part. Could it be a permission issue? Something else? I Googled for a solution. Still can't get traction with this.
Make sure you actually have a folder named "dbfs". If your parent folder starts at "rawdata", the path should be "/rawdata/2019/01/01/parent" or "rawdata/2019/01/01/parent".
This error is thrown when the path is incorrect.
This is an old thread, but in case someone is still looking for a solution: the path needs to be given as "dbfs:/rawdata/2019/01/01/parent/".
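For anyone who wants to see the path fix in code, here is a minimal sketch. It assumes the folder really lives at rawdata/2019/01/01/parent on DBFS, as the question says. The helper names to_dbfs_uri and select_paths, and the FakeFileInfo stand-in, are made up for illustration and are not part of any Databricks API. The sketch also collects each entry's full path rather than only its name, on the assumption that the reader needs full paths to find the files.

```python
from collections import namedtuple


def to_dbfs_uri(path: str) -> str:
    """Rewrite "/dbfs/..." or a bare path into the "dbfs:/..." form."""
    if path.startswith("dbfs:/"):
        return path
    if path.startswith("/dbfs/"):
        # Drop the local-mount prefix, keep the rest of the path.
        path = path[len("/dbfs"):]
    return "dbfs:/" + path.lstrip("/")


def select_paths(entries, prefix):
    """Return the full paths of entries whose file name starts with prefix."""
    return [e.path for e in entries if e.name.startswith(prefix)]


# Hypothetical stand-in for the entries returned by dbutils.fs.ls,
# used only so the helpers can be tried outside a notebook.
FakeFileInfo = namedtuple("FakeFileInfo", ["path", "name"])

folder = to_dbfs_uri("/dbfs/rawdata/2019/01/01/parent/")
listing = [
    FakeFileInfo(folder + "CW_a.parquet", "CW_a.parquet"),
    FakeFileInfo(folder + "other.parquet", "other.parquet"),
]
selected = select_paths(listing, "CW")

# Inside a Databricks notebook, where dbutils and spark are provided:
# files = dbutils.fs.ls(folder)
# df = spark.read.load(path=select_paths(files, "CW"))
# df.printSchema()
```

The "/dbfs/..." form is the local file mount, meant for ordinary local file APIs, while dbutils.fs expects a DBFS path, which is why the listing call in the question could not find the folder.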