Listing files on Microsoft Azure Databricks
I'm working in Microsoft Azure Databricks. Using the ls command, I found that there is a CSV file present (see first screenshot). But when I try to pick the CSV file up into a list using glob, it returns an empty list (see second screenshot).
How can I list the contents of a directory in Databricks?
%fs
ls /FileStore/tables/26AS_report/normalised_consol_file_record_level/part1/customer_pan=AAACD3312M/
import glob

path = "/FileStore/tables/26AS_report/normalised_consol_file_record_level/part1/customer_pan=AAACD3312M/"
result = glob.glob(path + '/**/*.csv', recursive=True)
print(result)
glob is a local file-level operation that doesn't know about DBFS. If you want to use it, you need to prepend /dbfs to your path:
path = "/dbfs/FileStore/tables/26AS_report/....."
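As a sketch of that idea, here is a small helper (hypothetical, not part of any Databricks API) that prepends the /dbfs FUSE-mount prefix so local-filesystem tools like glob can see DBFS files:

```python
import glob

def to_local_path(dbfs_path: str) -> str:
    """Map a DBFS path (e.g. /FileStore/...) to the /dbfs FUSE mount,
    where local file tools such as glob and os.path can reach it."""
    if dbfs_path.startswith("/dbfs/"):
        return dbfs_path  # already a local mount path
    return "/dbfs" + dbfs_path

# On a Databricks cluster this recursive search would now find the CSVs:
local = to_local_path(
    "/FileStore/tables/26AS_report/normalised_consol_file_record_level/"
    "part1/customer_pan=AAACD3312M/"
)
result = glob.glob(local + "/**/*.csv", recursive=True)
```

Note this only works on clusters where the /dbfs FUSE mount is available; the glob call itself is unchanged from the question.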
I don't think you can use the standard Python file-system functions from the os.path or glob modules.
Instead, you should use the Databricks file system utility (dbutils.fs). See the documentation.
Given your example code, you should do something like:
dbutils.fs.ls(path)
or或者
dbutils.fs.ls('dbfs:' + path)
This gives a list of files that you may have to filter yourself to get only the *.csv files.
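A sketch of that filtering step; the csv_files helper is hypothetical, and it assumes dbutils.fs.ls returns FileInfo entries with name and path attributes, as it does in a Databricks notebook:

```python
def csv_files(entries):
    """Keep only the *.csv entries from a dbutils.fs.ls() listing.
    Each entry is a FileInfo with .name and .path attributes."""
    return [e.path for e in entries if e.name.endswith(".csv")]

# In a Databricks notebook (dbutils is available there automatically):
# path = "/FileStore/tables/26AS_report/normalised_consol_file_record_level/part1/customer_pan=AAACD3312M/"
# print(csv_files(dbutils.fs.ls(path)))
```

Because the helper is a pure function over the listing, it works the same whether you call dbutils.fs.ls(path) or dbutils.fs.ls('dbfs:' + path).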