How to automate long-running code and save data in Azure Databricks?
I am using the %run feature in Azure Databricks to execute many notebooks in sequence from a command notebook. One notebook runs a long computation on a dataset (~5 hrs), and I want to save its output. I tried including the save step at the end of the long-running notebook, but the save times out (see error below). I only see this error when the long-running notebook takes 2+ hours to run. Is there any way I can automate this?
I'm able to pass the data I want back through the %run feature in the command notebook and save it there, but I have to run the save manually after the long-running notebook finishes; otherwise I get the same authentication timeout error. I'd like to have one notebook where I only need to click "run all".
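One workaround sometimes suggested for this pattern is to replace %run with dbutils.notebook.run, which executes each child notebook in its own execution context with an explicit timeout, so a single "run all" in a driver notebook covers the whole pipeline. A minimal sketch, assuming a hypothetical two-step pipeline (the notebook paths are made up, and the runner is injected so the driver logic can also be exercised outside Databricks):

```python
# Hypothetical sketch: chaining notebooks with dbutils.notebook.run instead of %run.
# Each child notebook runs in its own context with its own timeout; the paths and
# timeouts below are placeholders, not the asker's actual notebooks.

PIPELINE = [
    ("/pipeline/long_compute", 6 * 60 * 60),  # ~5 hr computation, 6 hr timeout
    ("/pipeline/save_output", 30 * 60),       # save step, 30 min timeout
]

def run_pipeline(run_notebook, pipeline=PIPELINE):
    """Run each (path, timeout_seconds) step in order.

    run_notebook mimics dbutils.notebook.run(path, timeout_seconds).
    """
    results = []
    for path, timeout_s in pipeline:
        results.append(run_notebook(path, timeout_s))
    return results

# On Databricks the runner would be:
# run_pipeline(lambda path, t: dbutils.notebook.run(path, t))
```

Whether this avoids the authentication timeout depends on how the save step acquires credentials, so treat it as something to test, not a guaranteed fix.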
I find it is better to break long notebooks up into smaller ones and use the multi-task job scheduler (Databricks Jobs) to run them in order.