Overwriting a file in Azure Data Lake Gen 2 from a Synapse Notebook throws an exception
As part of migrating from Azure Databricks to Azure Synapse Analytics Notebooks, I'm facing the issue explained below.
I read a CSV file from Azure Data Lake Storage Gen 2 into a PySpark dataframe with the following command:
df = (spark.read.format('csv')
        .option("delimiter", ",")
        .option("multiline", "true")
        .option("quote", '"')
        .option("header", "true")
        .option("escape", "\\")
        .load(csvFilePath))
After processing this file, we need to overwrite it at the same path, and we use the following command:
df.coalesce(1).write.option("delimiter", ",").csv(csvFilePath, mode='overwrite', header='true')
What this does is delete the existing file at the path "csvFilePath", and then the write fails with the error "Py4JJavaError: An error occurred while calling o617.csv."
Things I've noticed:
[Error returned by Synapse Notebook at the write command.][1]

[1]: https://i.stack.imgur.com/Obj9q.png
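A likely cause, given the behaviour described: Spark evaluates lazily, so `mode='overwrite'` deletes the file at `csvFilePath` before the data has actually been read, and the subsequent write then has nothing to materialize. The general fix is to write the result somewhere else first and only afterwards replace the original. A minimal local-filesystem sketch of that pattern (plain Python, no Spark; the function name is illustrative):

```python
import csv
import os
import tempfile

def overwrite_csv(path, transform):
    """Safely overwrite a CSV in place: read everything first, write the
    result to a temporary file in the same directory, then atomically
    replace the original. This avoids deleting the source before the
    read has completed."""
    with open(path, newline="") as f:
        rows = [transform(row) for row in csv.reader(f)]
    # Temp file in the same directory so os.replace stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".", suffix=".csv")
    with os.fdopen(fd, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    os.replace(tmp, path)  # atomic replacement of the original file
```

In Spark terms, the analogous workaround is usually to write the dataframe to a scratch path and then move it over the original (in Synapse, for example, with the `mssparkutils.fs` file utilities), or to materialize the dataframe (e.g. `df.cache()` followed by an action) before overwriting its source path.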
It is advisable to mount the data storage first. Kindly refer to the documentation below:

https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-use-databricks-spark