How can I start Tensorboard dev from within Python as a subprocess parallel to the Python session?

I want to monitor the training progress of a CNN that is trained via a slurm job on a server (i.e., the Python script is executed through a bash script whenever the server has resources available; the session is not interactive). Hence, I cannot simply open a terminal and run Tensorboard dev.

So far, I have tried the following, but no new experiment shows up on my Tensorboard dev site:

mod = "SomeModelType"
logdir = "/some/directory/used/in/Tensorboard/callback"
PARAMETERS = "Some line of text describing the training settings"

subprocess.Popen(["tensorboard", "dev upload --logdir '" + logdir + \
                  "' --name Myname_" + mod + " --description '" + \
                      PARAMETERS + "'"])

If I type the string "tensorboard dev upload --logdir 'some/directory..." into a terminal, Tensorboard starts as expected. If I run the code shown above, no new Tensorboard experiment is started.
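A small sketch of why the terminal and the list form can behave differently: without a shell, each list element is handed to the program as exactly one argument, so everything after "tensorboard" in the first attempt arrives as a single long argument. The variable names as_written and as_shell are only illustrative, and shlex.split is used here as a stand-in for how a POSIX shell would tokenize the same text.

```python
import shlex

logdir = "/some/directory/used/in/Tensorboard/callback"
mod = "SomeModelType"
PARAMETERS = "Some line of text describing the training settings"

# What the first attempt passes: the executable plus ONE long argument.
as_written = ["tensorboard", "dev upload --logdir '" + logdir +
              "' --name Myname_" + mod + " --description '" + PARAMETERS + "'"]

# What a POSIX shell would make of the same text typed into a terminal.
as_shell = shlex.split(" ".join(as_written))

print(len(as_written))  # number of arguments without a shell
print(len(as_shell))    # number of arguments after shell-style splitting
print(as_shell)
```

In the shell-style result, "dev", "upload", "--logdir" and the path are separate arguments, and the single quotes are gone.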

I also tried this:

subprocess.run(["/pfs/data5/home/kit/ifgg/mp3890/.local/bin/tensorboard", \
                "dev", "upload", "--logdir", "'" + logdir + \
                "'", "--name", "LeleNet" + mod#, "--description" + "'" + \
                    #PARAMETERS + "'"
                    ], \
               capture_output = False, text = False)

which starts Tensorboard, but the Python script does not continue. Hence, Tensorboard will be listening for output that never comes, because the Python session is listening to its own output instead of training the CNN.
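The blocking behaviour described here matches how the two subprocess entry points differ: subprocess.run waits for the child to exit, while subprocess.Popen returns once the child has been started. A minimal sketch, using a short-lived Python child as a stand-in for the uploader (the stand-in command is an assumption made for illustration, not part of the original setup):

```python
import subprocess
import sys

# Stand-in for a long-running command (sleeps for one second, then exits).
child = [sys.executable, "-c", "import time; time.sleep(1)"]

# subprocess.run waits for the child to finish before returning.
finished = subprocess.run(child)
print(finished.returncode)

# subprocess.Popen returns as soon as the child has been started.
running = subprocess.Popen(child)
still_running = running.poll() is None  # None means the child has not exited yet
print(still_running)

running.wait()  # clean up the stand-in child
```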

Edit: This:

subprocess.Popen(["/pfs/data5/home/kit/ifgg/mp3890/.local/bin/tensorboard", \
                "dev", "upload", "--logdir", "'" + logdir + \
                "'", "--name", "LeleNet" + mod#, "--description" + "'" + \
                    #PARAMETERS + "'"
                    ])

led to the message "Listening for new data in the log dir..." popping up all the time in interactive mode and led to cancellation of the slurm job (the job disappeared). Moreover, Tensorboard does not work correctly this way: the experiment is created but never receives any data.
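One possible contributor to the missing data, offered as an assumption rather than a diagnosis: in list form no shell strips the single quotes, so the program receives them as part of the path. The sketch below uses a tiny Python child that echoes its argument (a stand-in, not Tensorboard itself) to show what arrives on the other side.

```python
import subprocess
import sys

logdir = "/some/directory"

# Stand-in child process that prints the single argument it received.
child = [sys.executable, "-c", "import sys; print(sys.argv[1])"]

# Argument built as in the attempts above: quotes added by hand.
quoted = subprocess.run(child + ["'" + logdir + "'"],
                        capture_output=True, text=True).stdout.strip()

# Argument passed as-is, without extra quotes.
plain = subprocess.run(child + [logdir],
                       capture_output=True, text=True).stdout.strip()

print(quoted)  # the quotes are still part of the value
print(plain)
```

If the same happens with --logdir, the uploader would be watching a directory whose name literally begins and ends with a quote character.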

I got it to work as follows:

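import subprocess

logdir = "/some/directory"
tbn = "some_name"
DESCRIPTION = "some description of the experiment"

# shell=True lets the shell parse the quotes; the trailing "&" puts the
# uploader in the background so the Python script continues immediately.
subprocess.call("tensorboard dev upload --logdir '" + logdir + \
                    "' --name " + tbn + " --description '" + \
                    DESCRIPTION + "' &", shell = True)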
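A shell-free variant of the same idea might look like the sketch below. It is untested against Tensorboard dev itself; the helper names build_upload_command and start_in_background and the log-file approach are assumptions made for illustration. The flags are the ones used in the question. Output goes to a file so the "Listening for new data..." messages do not end up in the job's own output, and start_new_session=True (POSIX only) detaches the uploader from the script's session.

```python
import subprocess


def build_upload_command(logdir, name, description, executable="tensorboard"):
    """Build the argument list; one list element per argument, no manual quoting."""
    return [executable, "dev", "upload",
            "--logdir", logdir,
            "--name", name,
            "--description", description]


def start_in_background(command, log_path):
    """Start command without waiting for it and write its output to log_path."""
    with open(log_path, "w") as log_file:
        # The child keeps its own copy of the file handle after this block ends.
        return subprocess.Popen(command,
                                stdin=subprocess.DEVNULL,
                                stdout=log_file,
                                stderr=subprocess.STDOUT,
                                start_new_session=True)
```

Usage would be along the lines of uploader = start_in_background(build_upload_command(logdir, tbn, DESCRIPTION), "tensorboard_upload.log") before training starts, followed by uploader.terminate() once training has finished. Unlike the "&" approach, this keeps a handle to the process so it can be stopped explicitly.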
