
Starting and stopping processes in a cluster

I'm writing software that runs a bunch of different programs (via twisted's twistd); that is, N daemons of various kinds must be started across multiple machines. If I did this manually, I would be running commands like twistd foo_worker, twistd bar_worker and so on, on each of the machines involved.

Basically there will be a list of machines, and the daemon(s) I need them to run. Additionally, I need to shut them all down when the need arises.

If I were to program this from scratch, I would write a "spawner" daemon that would run permanently on each machine in the cluster, with the following features accessible through the network for an authenticated administrator client:

  • Start a process with a given command line. Return a handle to manage it.
  • Kill a process given a handle.
  • Optionally, query stuff like CPU time given a handle.
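
The three operations listed above can be sketched in a few lines of Python. This is only a minimal illustration of the local half of such a spawner, not an existing tool: the class name `Spawner`, its method names, and the use of random hex strings as handles are all assumptions made for the example, and the network layer and authentication are left out entirely. The CPU-time query reads /proc, so it is Linux-only.

```python
import os
import subprocess
import uuid


class Spawner:
    """Tracks child processes by opaque string handles (hypothetical sketch)."""

    def __init__(self):
        self._procs = {}  # handle -> subprocess.Popen

    def start(self, argv):
        """Start a process from an argument list and return a handle for it."""
        proc = subprocess.Popen(argv)
        handle = uuid.uuid4().hex
        self._procs[handle] = proc
        return handle

    def kill(self, handle, timeout=5.0):
        """Send SIGTERM, escalate to SIGKILL after `timeout`; return the exit code."""
        proc = self._procs.pop(handle)
        proc.terminate()
        try:
            return proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            return proc.wait()

    def cpu_seconds(self, handle):
        """User + system CPU time of the process, read from /proc (Linux only)."""
        pid = self._procs[handle].pid
        with open("/proc/%d/stat" % pid) as f:
            # The command name (field 2) is parenthesised and may contain
            # spaces, so split only the part after the closing parenthesis.
            fields = f.read().rsplit(")", 1)[1].split()
        # utime and stime are fields 14 and 15 of /proc/<pid>/stat, which is
        # index 11 and 12 once the first two fields have been cut off.
        utime, stime = int(fields[11]), int(fields[12])
        return (utime + stime) / os.sysconf("SC_CLK_TCK")
```

A caller would do something like `h = spawner.start(["twistd", "foo_worker"])` and later `spawner.kill(h)`. Note that twistd daemonizes by default, so for a real daemon the handle would have to track the pidfile rather than the short-lived child process.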

It would be fairly trivial to program the above, but I cannot imagine this is a new problem. Surely there are existing solutions for doing exactly this? I do, however, lack experience with server administration, and don't even know what the related terms are.

What existing ways are there to do this on a Linux cluster, and what are some of the important terms involved? Python-specific solutions are welcome, but not necessary.

Another way to put it: given a bunch of machines in a LAN, how do I programmatically work with them as a cluster?

The usual tool is a batch queueing system, such as SLURM, SGE, Torque/Moab, LSF, etc.

The most familiar and universal way is just to use ssh. To automate it, you could use fabric.

To start foo_worker on all hosts:

$ fab all_hosts start:foo_worker

To stop bar_worker on a particular list of hosts:

$ fab -H host1,host2 stop:bar_worker

Here's an example fabfile.py:

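from fabric.api import env, run, hide  # Fabric 1.x API: pip install "fabric<2"

def all_hosts():
    """Select every machine in the cluster as the target."""
    env.hosts = ['host1', 'host2', 'host3']

def start(daemon):
    """Start `daemon` under twistd, recording its pid in <daemon>.pid."""
    run("twistd --pidfile=%s.pid %s" % (daemon, daemon))

def stop(daemon):
    """Stop `daemon` by sending SIGTERM to the recorded pid."""
    run("kill %s" % getpid(daemon))

def getpid(daemon):
    """Return the pid stored in <daemon>.pid on the remote host."""
    with hide('stdout'):
        return run("cat %s.pid" % daemon)

def ps(daemon):
    """Get process info for the `daemon`."""
    run("ps --pid %s" % getpid(daemon))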

There are a number of ways to configure host lists in fabric, with scopes varying from global to per-task, and it's possible to mix and match as needed.
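
If Fabric is not available, roughly the same start/stop commands can be issued over plain ssh from the Python standard library. The following is a sketch under stated assumptions rather than a recommended tool: the `PLAN` mapping, the host names, and all function names are hypothetical, key-based ssh login is assumed to be already set up, and the daemons are assumed to write `<daemon>.pid` in the remote login directory.

```python
import shlex
import subprocess

# Hypothetical plan: which daemons each machine should run.
PLAN = {
    "host1": ["foo_worker", "bar_worker"],
    "host2": ["foo_worker"],
}


def ssh_argv(host, remote_command):
    """Build the local argv that runs `remote_command` on `host` via ssh."""
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", host, remote_command]


def start_command(daemon):
    """Remote shell command that starts one twistd daemon with its own pidfile."""
    return "twistd --pidfile=%s %s" % (
        shlex.quote(daemon + ".pid"), shlex.quote(daemon))


def stop_command(daemon):
    """Remote shell command that sends SIGTERM to the pid in the pidfile."""
    return "kill $(cat %s)" % shlex.quote(daemon + ".pid")


def run_plan(plan, make_command, runner=subprocess.run):
    """Run make_command(daemon) for every (host, daemon) pair in the plan.

    Returns a dict mapping (host, daemon) to the ssh exit code.
    """
    results = {}
    for host, daemons in plan.items():
        for daemon in daemons:
            completed = runner(ssh_argv(host, make_command(daemon)))
            results[(host, daemon)] = completed.returncode
    return results
```

Calling `run_plan(PLAN, start_command)` would start everything and `run_plan(PLAN, stop_command)` would shut it all down. The `runner` parameter exists only so the ssh call can be replaced in tests or dry runs.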

To streamline process management on a particular host, you could write init.d scripts for the daemons (and run service daemon_name start/stop/restart) or use supervisord (and run supervisorctl, e.g. supervisorctl stop all). To control what is installed where and to push configuration in a centralized manner, something like puppet could be used.
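
Besides supervisorctl, supervisord exposes an XML-RPC interface, so the equivalent of supervisorctl stop all can also be driven from Python across several hosts. The sketch below assumes that each supervisord has its [inet_http_server] section enabled on port 9001 without authentication; that setup, the host names, and the helper function names are assumptions for illustration, not part of the original answer.

```python
import xmlrpc.client


def supervisor_proxy(host, port=9001):
    """Return an XML-RPC proxy for the supervisord instance on `host`.

    Assumes [inet_http_server] is enabled on `port` with no authentication.
    """
    return xmlrpc.client.ServerProxy("http://%s:%d/RPC2" % (host, port))


def stop_all(hosts, make_proxy=supervisor_proxy):
    """Call supervisor.stopAllProcesses() on each host; return per-host results."""
    results = {}
    for host in hosts:
        proxy = make_proxy(host)
        results[host] = proxy.supervisor.stopAllProcesses()
    return results
```

For example, `stop_all(["host1", "host2"])` would ask both supervisord instances to stop every process they manage. The `make_proxy` parameter is there so the network call can be swapped out when testing.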

Circus:

Documentation: http://docs.circus.io/en/0.5/index.html

Code: http://pypi.python.org/pypi/circus/0.5

Summary from the documentation:

Circus is a process & socket manager. It can be used to monitor and control processes and sockets.

Circus can be driven via a command-line interface or programmatically through its Python API.

It shares some of the goals of Supervisord, BluePill and Daemontools. If you are curious about what Circus brings compared to other projects, read "Why should I use Circus instead of X?".

Circus is designed using ZeroMQ (http://www.zeromq.org/). See Design for more details.
