
Overwriting a file in Azure datalake Gen 2 from Synapse Notebook throws Exception

As part of migrating from Azure Databricks to Azure Synapse Analytics Notebooks, I'm facing the issue explained below.

I read a CSV file from Azure Data Lake Storage Gen 2 into a PySpark dataframe with the following command:

df = spark.read.format('csv').option("delimiter", ",").option("multiline", "true").option("quote", '"').option("header", "true").option("escape", "\\").load(csvFilePath)

After processing the data, we need to overwrite the file at the same path, which we do with the following command:

df.coalesce(1).write.option("delimiter", ",").csv(csvFilePath, mode = 'overwrite', header = 'true')

This deletes the existing file at "csvFilePath" and then fails with the error "Py4JJavaError: An error occurred while calling o617.csv."

Things I've noticed:

  1. Once the overwrite command deletes the CSV file at "csvFilePath", the data in the dataframe "df" is also lost.
  2. Synapse appears to re-read the source file at write time, whereas in Databricks we did not have this issue and the overwrite ran successfully.
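Both observations are consistent with Spark's lazy evaluation: `df` is a plan that still points at the file on disk, so the overwrite deletes the input before the read actually executes. The usual fix is to fully materialize the data first (or write to a staging location and swap). Since a Spark job can't run outside a cluster, here is a plain-Python sketch of the same stage-then-swap idea; the function and paths are hypothetical, not from the original post:

```python
import csv
import os
import tempfile

def overwrite_csv_safely(path, transform):
    """Read a CSV, apply `transform` to each row, and replace the file
    only after the new contents have been fully written."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))          # materialize: input is fully read first
    rows = [transform(row) for row in rows]
    # Write to a temporary file in the same directory, then swap it in.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".", suffix=".csv")
    with os.fdopen(fd, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    os.replace(tmp, path)                   # original is only replaced at the end
```

In Spark terms the equivalent is either calling `df.cache()` followed by an action such as `df.count()` before the overwrite, or writing to a separate staging path and then moving the output over the source.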

[Error returned by Synapse Notebook at write command.][1] [1]: https://i.stack.imgur.com/Obj9q.png

One suggestion is to mount the storage account instead of referencing it directly by URL. Refer to the documentation below:

https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-use-databricks-spark
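The linked article covers Databricks mounts; in a Synapse notebook the analogous tool is `mssparkutils.fs.mount`. A minimal sketch, assuming a linked service named "MyADLSLinkedService" and placeholder account/container names (all of these are assumptions, not from the original post); it only runs inside a Synapse Spark session:

```python
# Mount an ADLS Gen2 container in a Synapse notebook.
# Account, container, and linked-service names below are placeholders.
from notebookutils import mssparkutils

mssparkutils.fs.mount(
    "abfss://mycontainer@myaccount.dfs.core.windows.net",
    "/data",
    {"linkedService": "MyADLSLinkedService"},
)

# Mounted paths are addressed through the synfs scheme, scoped to the job ID.
job_id = mssparkutils.env.getJobId()
df = spark.read.option("header", "true").csv(f"synfs:/{job_id}/data/input.csv")
```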
