
How can I make a setup command run only the first time a Jupyter notebook is run?

I'm doing a machine learning project in Google Colab. Each time an instance is started, I want to run these commands:

  ! mkdir ~/.kaggle # make directory ".kaggle"
  ! cp kaggle.json ~/.kaggle/ # copy the json file into the directory
  ! chmod 600 ~/.kaggle/kaggle.json # allocate required permission for the file
  ! kaggle datasets download -d alessiocorrado99/animals10 # download animal set
  ! unzip animals10.zip

These commands download and extract a dataset I need. However, they only need to run the first time through. When I click "Run All" after the initial download of the dataset, the cell asks for user input to decide whether to replace the existing files. I also don't want to keep downloading from Kaggle and using resources unnecessarily.

My current approach is to run the script once and then comment out the initialization script, but this takes time and effort.

How can I automate this process so that a certain cell only runs on the first run of the runtime?

  1. After the initial execution of your commands, create a dummy variable.

  2. If you happen to re-execute that code cell, an if statement checks whether that variable is still in memory:

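     import os

     # Shell commands that should run only once per runtime.
     INIT_SHELL_COMMAND_LIST = [
         'mkdir ~/.kaggle # make directory ".kaggle"',
         'next_cmd',
     ]

     # The flag stays in memory until the runtime is restarted.
     if 'INIT_HAPPENED' not in locals():
         for command in INIT_SHELL_COMMAND_LIST:
             os.system(command)
         INIT_HAPPENED = True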
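One limitation of an in-memory flag is that it can be set even when a setup command fails. The following is a minimal sketch of a variant that sets the flag only after every command succeeds. The names run_setup_once and _SETUP_DONE are hypothetical, and the use of subprocess.run is an assumption of this sketch, not part of the answer above.

```python
import subprocess

# Create the flag only if it is not already in memory, so re-running
# this cell does not reset it. `_SETUP_DONE` is a hypothetical name.
globals().setdefault("_SETUP_DONE", False)


def run_setup_once(commands):
    """Run shell commands once per session; return True if they ran."""
    global _SETUP_DONE
    if _SETUP_DONE:
        return False
    for command in commands:
        # check=True raises CalledProcessError on a non-zero exit status,
        # so the flag below is never set after a failed command.
        subprocess.run(command, shell=True, check=True)
    _SETUP_DONE = True
    return True
```

Calling run_setup_once with the list of setup commands a second time in the same session returns False without running anything. As with any in-memory flag, restarting the runtime clears it.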

This answer is specific to my problem, but I hope it helps others.

Essentially, the code that I ran had a measurable effect: in my case, it creates a directory called 'raw-img'. To detect whether the code has already run, I used a conditional that tests whether that directory exists.

import os

if not os.path.isdir('raw-img'):
  print("firstRun")
  ! pip install kaggle # install kaggle library
  ! mkdir ~/.kaggle # make directory ".kaggle"
  ! cp kaggle.json ~/.kaggle/ # copy the json file into the directory
  ! chmod 600 ~/.kaggle/kaggle.json # allocate required permission for the file
  ! kaggle datasets download -d alessiocorrado99/animals10 # download animal set
  ! unzip animals10.zip

More generally, whether you are importing libraries or doing anything else, there should be some code you can write to test whether that operation has already occurred. Simply use that test to determine whether the code should run again.
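The general pattern can be sketched as a small helper that takes a test and an action. The name ensure is hypothetical, and the temporary directory below only stands in for the real 'raw-img' directory so that the example is safe to run anywhere.

```python
import os
import tempfile


def ensure(is_done, do_it):
    """Call do_it() only when is_done() reports the work has not happened yet.

    Returns True if do_it() was called, False if it was skipped.
    """
    if is_done():
        return False
    do_it()
    return True


# Demonstration with a throwaway directory standing in for 'raw-img'.
workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "raw-img")

first = ensure(lambda: os.path.isdir(target), lambda: os.mkdir(target))
second = ensure(lambda: os.path.isdir(target), lambda: os.mkdir(target))
print(first, second)  # True False
```

The first call finds no directory and runs the action; the second call sees the directory and skips it, which is the behaviour wanted when clicking "Run All" repeatedly.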

For example, an answer here (https://stackoverflow.com/a/55194886/9053474) shows how to test whether a library is installed. If the string is empty, you know the library has not been installed, so you should run the script.
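As an alternative to the approach in the linked answer (this is a different technique, not a reproduction of it), a library check can also be sketched in pure Python with importlib.util.find_spec from the standard library. The helper name is_installed is hypothetical.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be imported."""
    return importlib.util.find_spec(module_name) is not None


print(is_installed("json"))  # True (part of the standard library)
print(is_installed("surely_not_a_real_module_xyz"))  # False
```

In a notebook, the install command would then go inside an if not is_installed("kaggle"): block. Note that this checks the import name, which can differ from the name used with pip install.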


 