
Azure Synapse: Upload a directory of .py files as Spark job reference files

I am trying to pass a whole directory of Python files, which are referenced from the main Python file, to an Azure Synapse Spark job definition, but the files do not appear in the expected location and I get a ModuleNotFoundError. I am trying to upload them like this:

abfss://[directory path in data lake]/*

You have to trick the Spark job definition by exporting it, editing the exported JSON, and importing it back.

After exporting, open the file in a text editor and add the following:

"conf": {
  "spark.submit.pyFiles": 
    "path-to-abfss/module1.zip, path-to-abfss/module2.zip"
},

Now, import the JSON back.
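
For reference, here is a minimal sketch of how such zip files can be produced locally before uploading them to the data lake; the directory names module1 and module2 are hypothetical:

# build_zips.py - minimal sketch; module1/ and module2/ are hypothetical
# local package directories (each containing an __init__.py).
import shutil

# Archive each package directory so the package sits at the root of the
# zip; Spark adds each zip listed in spark.submit.pyFiles to sys.path on
# the driver and executors.
shutil.make_archive("module1", "zip", root_dir=".", base_dir="module1")
shutil.make_archive("module2", "zip", root_dir=".", base_dir="module2")

After uploading the resulting module1.zip and module2.zip to the abfss:// paths listed in spark.submit.pyFiles, the main Python file can import them directly, e.g. import module1.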

The way to achieve this in Synapse is to package your Python files into a wheel and upload the wheel to a specific location in Azure Data Lake Storage, from which your Spark pool will load it every time it starts. This makes the custom Python packages available to all jobs and notebooks that use that Spark pool.
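
As a sketch, the packaging step could look like this, assuming your code lives in a package directory named mypackage; the package name and version here are hypothetical:

# setup.py - minimal sketch for building a wheel; all names are hypothetical.
from setuptools import setup, find_packages

setup(
    name="mypackage",
    version="0.1.0",
    packages=find_packages(),  # discovers mypackage/ and its subpackages
)

# Build with:  python setup.py bdist_wheel   (requires the wheel package)
# The wheel appears under dist/, e.g. dist/mypackage-0.1.0-py3-none-any.whl,
# and that .whl file is what you upload for the Spark pool to install.

Once the pool picks up the wheel at startup, any notebook or job on that pool can simply import mypackage.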

You can find more details in the official documentation: https://docs.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-manage-python-packages#install-wheel-files
