
to_csv "No Such File or Directory" But the directory does exist - Databricks on ADLS

I've seen many iterations of this question but cannot seem to understand/fix this behavior.

I am on Azure Databricks (DBR 10.4 LTS, Spark 3.2.1, Scala 2.12) trying to write a single csv file to blob storage so that it can be dropped onto an SFTP server. Unfortunately, I could not use spark-sftp because I am on Scala 2.12 and could not get the library to work.

Given this is a small dataframe, I am converting it to pandas and then attempting to_csv.

to_export = df.toPandas()

to_export.to_csv(pathToFile, index = False)
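Since `to_csv` on Databricks goes through the `/dbfs` FUSE mount, the parent directory can also be created from plain Python with `os.makedirs`, which uses the same File API format pandas expects. A minimal, self-contained sketch (using a temp directory in place of the hypothetical mount path):

```python
import os
import tempfile

import pandas as pd

# Stand-in for the FUSE path '/dbfs/mnt/adls/Sandbox/user/project_name/';
# a temp directory keeps the sketch runnable anywhere.
base = os.path.join(tempfile.gettempdir(), "project_name")
os.makedirs(base, exist_ok=True)  # File API format, same as to_csv uses

to_export = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})
path_to_file = os.path.join(base, "testfile.csv")
to_export.to_csv(path_to_file, index=False)  # no Errno 2 once the dir exists
```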

I get the error: [Errno 2] No such file or directory: '/dbfs/mnt/adls/Sandbox/user/project_name/testfile.csv'

Based on the information in other threads, I create the directory with dbutils.fs.mkdirs("/dbfs/mnt/adls/Sandbox/user/project_name/"), which returns Out[40]: True

The response is true and the directory exists, yet I still get the same error. I'm convinced it is something obvious and I've been staring at it for too long to notice. Does anyone see what my error may be?

  • Python's pandas library recognizes the path only when it is in File API format (since you are using a mount), whereas dbutils.fs.mkdirs uses Spark API format, which is different from File API format.

  • As you are creating the directory using dbutils.fs.mkdirs with the path /dbfs/mnt/adls/Sandbox/user/project_name/, this path is actually interpreted as dbfs:/dbfs/mnt/adls/Sandbox/user/project_name/. Hence, the directory is created within DBFS at that (wrong) location.

dbutils.fs.mkdirs('/dbfs/mnt/repro/Sandbox/user/project_name/')
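The two path formats map onto each other mechanically; a small illustrative helper (hypothetical, not part of dbutils) makes the mismatch explicit:

```python
def to_spark_api(path: str) -> str:
    """Convert a File API (FUSE) path like '/dbfs/mnt/...' to the
    Spark API form 'dbfs:/mnt/...' that dbutils.fs expects."""
    if path.startswith("/dbfs/"):
        return "dbfs:/" + path[len("/dbfs/"):]
    if path.startswith("dbfs:/"):
        return path
    # A bare '/mnt/...' path is already relative to the DBFS root.
    return "dbfs:" + path

# dbutils.fs.mkdirs does NOT apply this translation: a FUSE path like
# '/dbfs/mnt/...' is read literally as 'dbfs:/dbfs/mnt/...'.
print(to_spark_api("/dbfs/mnt/repro/Sandbox/user/project_name/"))
# dbfs:/mnt/repro/Sandbox/user/project_name/
```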


  • So, you have to create the directory by modifying the code to the following:
dbutils.fs.mkdirs('/mnt/repro/Sandbox/user/project_name/')
#OR
#dbutils.fs.mkdirs('dbfs:/mnt/repro/Sandbox/user/project_name/')
  • Writing to the folder now works without any issue.
pdf.to_csv('/dbfs/mnt/repro/Sandbox/user/project_name/testfile.csv', index=False)
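A quick read-back check confirms the write succeeded. A sketch using a local temp file in place of the mount path (the DataFrame contents here are made up for illustration):

```python
import os
import tempfile

import pandas as pd

# Stand-in for '/dbfs/mnt/repro/Sandbox/user/project_name/testfile.csv'
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "testfile.csv")

pdf = pd.DataFrame({"user": ["alice"], "score": [42]})
pdf.to_csv(csv_path, index=False)

# Round-trip the file to verify it landed where we expect.
round_trip = pd.read_csv(csv_path)
print(round_trip.equals(pdf))  # True
```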

