
Identify Python Subprocess from terminal

We have developed a Python function that starts a subprocess call to pdftoppm/pdftocairo to split PDFs and store each page as an individual image. For example, if a document has 10 pages, it creates 10 individual PNG files, one for each page of the document. Is there a way to intercept the process from the terminal using the htop or ps -ef commands?

If your Python program is still running when you want to reap the subprocesses, the simplest solution is probably to pass a timeout keyword parameter to Popen.wait() or Popen.communicate().

import subprocess

subprocs = []
for page in pdf.pages():
    sub = subprocess.Popen(['pdftoppm', 'etc', '--page', str(page), filename])
    subprocs.append(sub)
# Do some other Python processing here while the subprocesses run in the background.
# Then, once you are done and only want to reap them before you continue:
for sub in subprocs:
    sub.wait(timeout=60)

When you wait on a subprocess which has already finished, the call returns immediately. Note that the timeout is counted from the moment wait() is called, not from when the subprocess was started, and that wait() raises subprocess.TimeoutExpired (without killing the subprocess) if the timeout elapses. So the final for loop blocks for at most 60 seconds on each subprocess which hasn't finished yet, and reaps the ones which have already finished more or less instantly; as written, it stops at the first subprocess that times out unless you catch that exception.
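As a minimal sketch of the reaping loop with the timeout handled explicitly: the helper name reap_all and the single shared deadline are assumptions for illustration, not part of the original answer, and the children here are short Python sleeps standing in for the real pdftoppm calls.

```python
import subprocess
import sys
import time


def reap_all(procs, total_timeout):
    """Wait for every process against one shared deadline; kill stragglers.

    Returns the return codes in the same order as procs.
    (reap_all is a hypothetical helper name, not a subprocess API.)
    """
    deadline = time.monotonic() + total_timeout
    codes = []
    for proc in procs:
        # Time left until the shared deadline, never negative.
        remaining = max(0.0, deadline - time.monotonic())
        try:
            codes.append(proc.wait(timeout=remaining))
        except subprocess.TimeoutExpired:
            proc.kill()                # still running after the deadline
            codes.append(proc.wait())  # reap it so no zombie is left behind
    return codes


if __name__ == "__main__":
    # Stand-in children: Python sleeps instead of real pdftoppm invocations.
    children = [
        subprocess.Popen([sys.executable, "-c", f"import time; time.sleep({secs})"])
        for secs in (0.1, 0.2, 30)
    ]
    print(reap_all(children, total_timeout=3))
```

With a shared deadline, the whole loop is bounded by total_timeout rather than by the timeout multiplied by the number of subprocesses. Whether killing a straggler is appropriate depends on your requirements.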

If your Python program has already finished executing and you have a bunch of subprocesses left running, the subprocesses you started will be orphans which get reparented, typically as children of PID 1, so you can no longer inspect the parent process and see that they are yours. If they all run in a specific directory which no other processes are executing in, that could be a good way to isolate them. (In subprocess.Popen() you can pass in a directory with cwd=path_to_dir.) On Linux, the /proc filesystem lets you easily traverse the process tree and inspect individual processes. The cwd entry in each process's /proc directory is a symlink to the directory where the process is running.

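from pathlib import Path

target = Path('/path/to/dir')
for proc in Path('/proc').iterdir():
    if proc.name.isdigit():
        try:
            # readlink() returns a Path, so compare against a Path
            if (proc / 'cwd').readlink() == target:
                print(proc)
        except OSError:
            # the process exited, or we may not inspect it
            pass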

Unfortunately, Path.readlink() was only introduced in Python 3.9; if you need this on a machine with an older Python version, try the more traditional os.path spaghetti:

import os

for proc in os.listdir('/proc'):
    if proc.isdigit():
        try:
            if os.readlink(os.path.join('/proc', proc, 'cwd')) == '/path/to/dir':
                print(proc)
        except OSError:
            # the process exited, or we may not inspect it
            pass

Note that /proc is not portable, but since you specifically ask about Ubuntu, you should be able to use this approach.

If you don't want to run the subprocesses in a particular unique directory, there are probably other means to find your processes if they are reasonably unique, or to make them reasonably unique in order to facilitate this. Your question really doesn't reveal enough about your code or your requirements to know what exactly will work for you.
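One such means, sketched here under the assumption that the command line of your subprocesses contains a string no other process uses (the program name, an output prefix, or similar), is to scan /proc/<pid>/cmdline. The function name pids_matching and the marker argument in the usage part are made up for illustration, and like the examples above this is Linux-only.

```python
import os
import subprocess
import sys


def pids_matching(needle):
    """Return the PIDs whose command line contains needle (Linux /proc only).

    pids_matching is a hypothetical helper name used for this sketch.
    """
    wanted = needle.encode()
    matches = []
    for entry in os.listdir('/proc'):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join('/proc', entry, 'cmdline'), 'rb') as handle:
                # Arguments are separated by NUL bytes in /proc/<pid>/cmdline.
                args = handle.read().split(b'\0')
        except OSError:
            continue  # the process exited, or we may not inspect it
        if any(wanted in arg for arg in args):
            matches.append(int(entry))
    return matches


if __name__ == "__main__":
    # Stand-in child carrying a recognizable extra argument.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(2)", "my-unique-marker"]
    )
    print(child.pid in pids_matching("my-unique-marker"))
    child.wait()
```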

Perhaps you can just run the processes with an external timeout command and leave it at that. The GNU Coreutils timeout binary is part of the Ubuntu base install (but might not be available out of the box on some other U*x-like systems).

for page in pdf.pages():
    subprocess.Popen(['timeout', '60', 'pdftoppm', 'etc', '--page', str(page), filename])

(The above obviously guesses wildly about the actual command you are running and what parameters it takes.)
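If you later want to know whether the external timeout actually fired, a minimal sketch follows, assuming the GNU Coreutils timeout binary, which exits with status 124 when it had to stop the command. The helper name run_limited is hypothetical, and sleep stands in for the real conversion command.

```python
import subprocess


def run_limited(command, seconds):
    """Run command under GNU timeout; return (exit_status, timed_out).

    run_limited is a hypothetical helper name used for this sketch.
    """
    result = subprocess.run(['timeout', str(seconds)] + list(command))
    # GNU timeout reports status 124 when the time limit was hit
    # (unless --preserve-status is given, which this sketch does not use).
    return result.returncode, result.returncode == 124


if __name__ == "__main__":
    # 'sleep 5' limited to 1 second stands in for a stuck pdftoppm call.
    print(run_limited(['sleep', '5'], 1))
```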

If you actually ran the process via the subprocess module, then it should show up as a regular (child) process, yes.

>>> from subprocess import run
>>> run('/usr/bin/cat')

Will result in:

$ ps -u myuser
...
  36456 pts/2    00:00:00 python3
  36463 pts/2    00:00:00 cat
...
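To match the rows up in ps -ef or htop, it can help to print the PIDs from the Python side. This is only a sketch: the child below is a short Python sleep standing in for the pdftoppm/pdftocairo call, and you would lengthen the sleep to have time to look at it from another terminal.

```python
import os
import subprocess
import sys

# Stand-in child; in the question this would be the pdftoppm/pdftocairo call.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(1)"])

# While the child runs, ps -ef lists it with PID equal to child.pid and
# PPID equal to os.getpid(), which is how the two rows can be matched up.
print(f"parent PID {os.getpid()}, child PID {child.pid}")

child.wait()
```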
