
Overwriting a file in Azure Datalake Gen 2 from Synapse Notebook throws Exception

As part of migrating from Azure Databricks to Azure Synapse Analytics Notebooks, I'm facing the issue explained below.

I read a CSV file from Azure Data Lake Storage Gen 2 into a PySpark DataFrame using the following command:

df = (spark.read.format('csv')
      .option("delimiter", ",")
      .option("multiline", "true")
      .option("quote", '"')
      .option("header", "true")
      .option("escape", "\\")
      .load(csvFilePath))

After processing this file, we need to overwrite it, and we use the following command:

df.coalesce(1).write.option("delimiter", ",").csv(csvFilePath, mode = 'overwrite', header = 'true')

What this does is delete the existing file at the path "csvFilePath", and then the write fails with the error "Py4JJavaError: An error occurred while calling o617.csv."

Things I've noticed:

  1. Once the CSV file at path "csvFilePath" is deleted by the overwrite command, the data in dataframe "df" also gets removed.
  2. It looks like Spark is still referring to the file at runtime, whereas in Databricks we did not have this issue and the overwrite ran successfully.

[Error returned by Synapse Notebook at write command.][1]

[1]: https://i.stack.imgur.com/Obj9q.png

It is advisable to mount the data storage. Kindly refer to the documentation below:

https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-use-databricks-spark
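The linked article covers Databricks mounts; in a Synapse notebook the rough equivalent is `mssparkutils.fs.mount`. A minimal sketch, where the storage account, container, and linked service names are placeholders, and which only runs inside a Synapse Spark pool:

```python
# Hedged sketch: mount an ADLS Gen2 container from a Synapse notebook.
# "mycontainer", "contoso", and "MyLinkedService" are placeholder names.
from notebookutils import mssparkutils

mssparkutils.fs.mount(
    "abfss://mycontainer@contoso.dfs.core.windows.net",
    "/data",
    {"linkedService": "MyLinkedService"},
)

# Mounted paths are addressed via the synfs scheme plus the current job id.
job_id = mssparkutils.env.getJobId()
csvFilePath = f"synfs:/{job_id}/data/input.csv"
df = spark.read.option("header", "true").csv(csvFilePath)
```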
