As part of migrating from Azure Databricks to Azure Synapse Analytics Notebooks, I'm facing the issue explained below.
I read a CSV file from Azure Data Lake Storage Gen2 into a PySpark dataframe using the following command:
df = spark.read.format('csv').option("delimiter", ",").option("multiline", "true").option("quote", '"').option("header", "true").option("escape", "\\").load(csvFilePath)
After processing this file, we need to overwrite it, and we use the following command:
df.coalesce(1).write.option("delimiter", ",").csv(csvFilePath, mode = 'overwrite', header = 'true')
This deletes the existing file at the path "csvFilePath" and then fails with the error "Py4JJavaError: An error occurred while calling o617.csv."
Things I've noticed:
[Error returned by Synapse Notebook at write command.](https://i.stack.imgur.com/Obj9q.png)
It is advisable to mount the data storage. Please refer to the documentation below.
https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-use-databricks-spark
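Separately from mounting, a workaround that may be worth trying is to avoid reading from and overwriting the same path in a single job. One plausible explanation for the behaviour described in the question (an assumption, not something the error message confirms) is that Spark evaluates the read lazily, so the overwrite removes the source file before the rows have actually been read. The sketch below stages the result in a temporary location first. The function name `overwrite_via_temp` and the `tmp_path` parameter are hypothetical and chosen only for illustration; the read and write options are copied from the question.

```python
def overwrite_via_temp(spark, df, target_path, tmp_path):
    """Stage df at tmp_path, then re-read it and overwrite target_path.

    spark is an active SparkSession and df is the processed dataframe.
    tmp_path is a hypothetical scratch location and must differ from target_path.
    """
    # Step 1: write the processed data somewhere other than its own source.
    df.coalesce(1).write.option("delimiter", ",").csv(
        tmp_path, mode="overwrite", header=True
    )

    # Step 2: re-read the staged copy so the new dataframe no longer
    # depends on the files under target_path.
    staged = (
        spark.read.format("csv")
        .option("delimiter", ",")
        .option("multiline", "true")
        .option("quote", '"')
        .option("header", "true")
        .option("escape", "\\")
        .load(tmp_path)
    )

    # Step 3: overwrite the original location from the staged copy.
    staged.coalesce(1).write.option("delimiter", ",").csv(
        target_path, mode="overwrite", header=True
    )
    return staged
```

It would be called as `overwrite_via_temp(spark, df, csvFilePath, tmpPath)`, where `tmpPath` is any writable path of your choosing. The staged copy is left in place, so it may need to be cleaned up afterwards.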