
Overwriting a file in Azure Datalake Gen 2 from Synapse Notebook throws Exception

As part of migrating from Azure Databricks to Azure Synapse Analytics Notebooks, I'm facing the issue explained below.

I read a CSV file from Azure Data Lake Storage Gen 2 into a PySpark DataFrame using the following command:

df = (spark.read.format('csv')
      .option("delimiter", ",")
      .option("multiline", "true")
      .option("quote", '"')
      .option("header", "true")
      .option("escape", "\\")
      .load(csvFilePath))

After processing this file, we need to overwrite it, and we use the following command:

df.coalesce(1).write.option("delimiter", ",").csv(csvFilePath, mode = 'overwrite', header = 'true')

What this does is delete the existing file at the path "csvFilePath", and then the write fails with the error "Py4JJavaError: An error occurred while calling o617.csv."

Things I've noticed:

  1. Once the CSV file at path "csvFilePath" is deleted by the overwrite command, the data in dataframe "df" also gets removed.
  2. It looks like Spark is still referring to the file at runtime, whereas in Databricks we did not have this issue and the overwrite ran successfully.

[Error returned by Synapse Notebook at write command.][1]

[1]: https://i.stack.imgur.com/Obj9q.png

It is advisable to mount the data storage. Kindly refer to the documentation below:

https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-use-databricks-spark
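The linked article covers Databricks mounts; in a Synapse notebook the rough equivalent is `mssparkutils.fs.mount`. A minimal sketch, where the storage account, container, and linked service names are placeholders, and which only runs inside a Synapse Spark pool:

```python
# Hedged sketch: mount an ADLS Gen2 container from a Synapse notebook.
# "mycontainer", "contoso", and "MyLinkedService" are placeholder names.
from notebookutils import mssparkutils

mssparkutils.fs.mount(
    "abfss://mycontainer@contoso.dfs.core.windows.net",
    "/data",
    {"linkedService": "MyLinkedService"},
)

# Mounted paths are addressed via the synfs scheme plus the current job id.
job_id = mssparkutils.env.getJobId()
csvFilePath = f"synfs:/{job_id}/data/input.csv"
df = spark.read.option("header", "true").csv(csvFilePath)
```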
