
Databricks - FileNotFoundException


I'm sorry if this is basic and I've missed something simple. I'm trying to run the code below to iterate through the files in a folder and merge all files that start with a specific string into a dataframe. All of the files sit in a data lake.

file_list=[]
path = "/dbfs/rawdata/2019/01/01/parent/"
files  = dbutils.fs.ls(path)
for file in files:
    if(file.name.startswith("CW")):
       file_list.append(file.name)
df = spark.read.load(path=file_list)

# check point
print("Shape: ", df.count(),"," , len(df.columns))
df.printSchema()

This looks fine to me, but apparently something is wrong here. I'm getting an error on this line:
files = dbutils.fs.ls(path)

The error message reads:

java.io.FileNotFoundException: File/6199764716474501/dbfs/rawdata/2019/01/01/parent does not exist.

The path, the files, and everything else definitely exist. I tried with and without the 'dbfs' part. Could it be a permissions issue? Something else? I googled for a solution, but still can't get traction on this.

Make sure you actually have a folder named "dbfs". If your parent folder starts at "rawdata", the path should be "/rawdata/2019/01/01/parent" or "rawdata/2019/01/01/parent".

This error is thrown when the path is incorrect.
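As a side note, here is a minimal sketch of the path conventions involved (the directory is the one from the question): dbutils.fs works on DBFS paths, while the /dbfs/... form is the local FUSE mount meant for ordinary Python file APIs, which may explain why mixing the two fails.

# dbutils.fs expects DBFS paths; these two calls are equivalent:
files = dbutils.fs.ls("/rawdata/2019/01/01/parent/")
files = dbutils.fs.ls("dbfs:/rawdata/2019/01/01/parent/")

# The /dbfs/... prefix is the local FUSE mount and is intended for
# ordinary Python file APIs, not for dbutils.fs:
import os
names = os.listdir("/dbfs/rawdata/2019/01/01/parent/")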

This is an old thread, but if someone is still looking for a solution: it does require the path to be given as "dbfs:/rawdata/2019/01/01/parent/".
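Putting it together, a minimal sketch of the corrected loop, assuming the files are Parquet (the file.path field of the FileInfo objects returned by dbutils.fs.ls is already a full dbfs:/ URI, so it can be passed straight to spark.read; the original code appended file.name, which drops the directory):

file_list = []
path = "dbfs:/rawdata/2019/01/01/parent/"
files = dbutils.fs.ls(path)
for file in files:
    if file.name.startswith("CW"):
        file_list.append(file.path)  # full dbfs:/ URI, not just the name

# spark.read.load accepts a list of paths; format="parquet" is an
# assumption here -- use "csv" etc. if the files are something else.
df = spark.read.load(path=file_list, format="parquet")

print("Shape: ", df.count(), ",", len(df.columns))
df.printSchema()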
