

How to add a whole Python application into Azure Databricks and run it?

We have a functional model written in Python. I would like to copy all of the code at once and run it from Azure Databricks. I saw there is a way to run Python code from Azure Data Factory, but it seems to work only for a single Python file; is that correct?

I know I could upload a wheel or an egg, but then I would probably have to import it into a notebook. Would I be able to access this wheel through the CLI or Azure Data Factory? Would I lose the option to set parameters?

We use GitLab, so that option is off the table for now.

Thanks a lot.

Edit: I want to summarize what I have found; some of the points below might be wrong.

  • I can upload a wheel and use the Python app as a library -> I can rewrite the CLI app's main for the notebook and just import the library (see the first sketch after this list).
  • I can rewrite all of the code into notebooks -> this might be the best way, but for an existing app of no small size it is painful.
  • I can create folders and upload the Python code into the file system to simulate the Python project layout, then call it from a notebook... (not tried yet).
  • I can use GitHub to import the code (not tried yet; I can't move the code from GitLab to GitHub because of an NDA).
  • I can run the code from my IDE connected to Databricks.
  • I can run a starting Python script in an Azure Data Factory pipeline, but I'm not sure about the wheel.
  • I can probably use another Azure module (which one? where would the code go?) instead of Databricks to run the Python code from the CLI -> but in the case of PySpark that does not make sense (not tried yet for this reason).
  • I can probably run, from a notebook through a %sh cell, a Python script saved somewhere in the Azure space (again, where should it be?) and pass parameters (not tried yet; see the second sketch after this list).
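For the wheel option, a minimal notebook sketch could look like the following. All names here (the mymodel package, its cli module, and the DBFS paths) are hypothetical placeholders, assuming the app exposes a main() that accepts an argv-style list:

```python
# First cell: install the wheel previously uploaded to DBFS
# (run as a notebook magic; shown here as a comment):
# %pip install /dbfs/FileStore/wheels/mymodel-0.1.0-py3-none-any.whl

# Second cell: import the packaged app and call its entry point,
# passing what used to be CLI arguments as an ordinary argv list.
from mymodel.cli import main  # hypothetical module layout

main(["--input", "/dbfs/FileStore/data/input.csv", "--mode", "full"])
```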
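And for the last option, instead of a %sh cell you can stay in Python and shell out from a notebook cell. The script path and arguments below are hypothetical, assuming the code was uploaded under /dbfs (the FUSE mount of DBFS on the cluster):

```python
# Sketch: run an uploaded script with parameters from a notebook cell.
import subprocess

result = subprocess.run(
    ["python", "/dbfs/FileStore/app/main.py",  # hypothetical upload location
     "--input", "/dbfs/FileStore/data/input.csv"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)
```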

You can copy your Python code, paste it into a cell in a Databricks notebook, and run it that way.

You could also use the Databricks CLI to import your files into a Databricks workspace.
See https://docs.databricks.com/dev-tools/cli/workspace-cli.html请参阅https://docs.databricks.com/dev-tools/cli/workspace-cli.html
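For example, assuming the Databricks CLI is installed and configured, a whole local directory can be pushed in one command, something like `databricks workspace import_dir ./my-project /Users/you@example.com/my-project` (both paths here are placeholders).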

Databricks Python notebooks are just .py files anyway, with some special comments.
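For illustration, the source of an exported Python notebook looks roughly like this:

```python
# Databricks notebook source
print("first cell")

# COMMAND ----------

print("second cell")

# COMMAND ----------

# MAGIC %sh
# MAGIC echo "cells can also hold magics such as %sh"
```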

