
How to distribute work to a pool of computers

I have some data that needs to be processed. The data is a tree. The processing goes like this: take a node N and check whether all of its children have already been processed. If not, process them first; if so, process N. So we go from the top (recursively) down to the leaves, then process the leaves, then the leaves' parent nodes and so on, upwards until we arrive at the root again.

I know how to write a program that runs on ONE computer, takes the data (i.e. the root node) and processes it as described above. Here is a sketch in C#:

using System.Collections.Generic;

// We assume data is already there, so I do not provide constructor/setters.
public class Data
{
    public object OwnData { get; }
    public IList<Data> Children { get; }
}

// The main class. We just need to call Process once and wait for it to finish.
public class DataManager
{
    internal ISet<Data> ProcessedData { get; init; }
    
    public DataManager()
    {
        ProcessedData = new HashSet<Data>();
    }
    
    public void Process(Data rootData)
    {
        new DataHandler(this).Process(rootData);
    }
}

// The handler class that processes data recursively by spawning new instances.
// It informs the manager about data processed.
internal class DataHandler
{
    private readonly DataManager Manager;
    
    internal DataHandler(DataManager manager)
    {
        Manager = manager;
    }
    
    internal void Process(Data data)
    {
        if (Manager.ProcessedData.Contains(data))
            return;
            
        foreach (var subData in data.Children)
            new DataHandler(Manager).Process(subData);
            
        // ... do some processing of data.OwnData here ...
        
        Manager.ProcessedData.Add(data);
    }
}

But how can I write the program so that it distributes the work to a pool of computers (all in the same network, either a local one or the internet)? What do I need to do for that?

Some thoughts/ideas:

  1. The DataManager should run on one computer (the main one / the server?); the DataHandlers should run on all the others (the clients?).
  2. The DataManager needs to know the computers by some ID (what kind of ID would that be?), set during construction of the DataManager.
  3. The DataManager must be able to create new instances of DataHandler on these computers (or kill them if something goes wrong). How?
  4. The DataManager must know which computers currently have a running instance of DataHandler and which do not, so that it can decide on which computer to spawn the next DataHandler (or, if none is free, wait).
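The bookkeeping behind points 1 to 4 can be tried in-process before any networking is involved. Below is a minimal C# sketch, assuming every job takes the same time and each machine runs exactly one job at a time (as with one SolidWorks instance per computer). `TreeNode`, `Scheduler.Plan` and the machine names are hypothetical; a real coordinator would send each job over the network and react to "done" messages instead of simulating rounds. Like the `ProcessedData` check in the question, it processes a shared subtree only once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's Data class (with a constructor for the demo).
public class TreeNode
{
    public string Name { get; }
    public IList<TreeNode> Children { get; }

    public TreeNode(string name, params TreeNode[] children)
    {
        Name = name;
        Children = children;
    }
}

public static class Scheduler
{
    // Plans which machine processes which node, in rounds of parallel jobs.
    // A node becomes "ready" only when all of its children are done.
    public static List<List<(string Machine, string Job)>> Plan(TreeNode root, IList<string> machines)
    {
        if (machines.Count == 0) throw new ArgumentException("Need at least one machine.", nameof(machines));

        // Walk the tree once: count unfinished children and remember each node's parents.
        var pending = new Dictionary<TreeNode, int>();
        var parents = new Dictionary<TreeNode, List<TreeNode>>();
        var stack = new Stack<TreeNode>();
        stack.Push(root);
        while (stack.Count > 0)
        {
            var current = stack.Pop();
            if (pending.ContainsKey(current)) continue; // shared subtree already visited
            var children = current.Children.Distinct().ToList();
            pending[current] = children.Count;
            foreach (var child in children)
            {
                if (!parents.TryGetValue(child, out var list))
                    parents[child] = list = new List<TreeNode>();
                list.Add(current);
                stack.Push(child);
            }
        }

        // Leaves are ready immediately.
        var ready = new Queue<TreeNode>(pending.Where(p => p.Value == 0).Select(p => p.Key));
        var rounds = new List<List<(string Machine, string Job)>>();

        while (ready.Count > 0)
        {
            // Hand one ready node to each free machine (one job per machine).
            var round = new List<(string Machine, string Job)>();
            var finished = new List<TreeNode>();
            foreach (var machine in machines)
            {
                if (ready.Count == 0) break;
                var next = ready.Dequeue();
                round.Add((machine, next.Name));
                finished.Add(next);
            }
            rounds.Add(round);

            // Simulated "done" reports: a parent becomes ready when its last child finishes.
            foreach (var done in finished)
                if (parents.TryGetValue(done, out var ps))
                    foreach (var parent in ps)
                        if (--pending[parent] == 0) ready.Enqueue(parent);
        }
        return rounds;
    }
}
```

The design choice here is that the coordinator never recurses: it only hands out nodes whose children are all finished, so any idle machine can always be given either a leaf or a parent that has just been unblocked.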

These are not requirements. I do not know whether these ideas are viable.

In the above thoughts I assumed that each computer can have only one instance of DataHandler. I know this is not necessarily so (because of CPU cores and threads), but in my use case it might actually be that way: the real DataManager and DataHandler are not standalone but run in a SolidWorks context. So in order to run any of that code, I need a running SolidWorks instance, and in my experience more than one SolidWorks instance on the same Windows machine does not work reliably.

From my half-knowledge, it looks like what I need is a kind of multi-computer OS: in a single-computer setting, points 2, 3 and 4 are usually taken care of by the OS, and point 1 more or less is the OS (the OS = DataManager spawns processes = DataHandlers; the OS keeps track of data = ProcessedData, and the processes report back).


What exactly do I want to know?

  • Hints to words, phrases or introductory articles that allow me to dive into the topic (so that I become able to implement this). Possibly language-agnostic.
  • Hints to C# libraries/frameworks that fit this situation.
  • Tips on what I should or shouldn't do (typical beginner issues). Possibly language-agnostic.
  • Links to example/demonstration C# projects, e.g. on GitHub. (If not C#, VB is also alright.)

You should read up on microservices and message queues, like RabbitMQ, and on the producer/consumer approach.

https://www.rabbitmq.com/getstarted.html
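The producer/consumer (work queue) idea can be tried locally before setting up RabbitMQ. In this minimal sketch, `BlockingCollection<T>` stands in for the message broker and one task per machine stands in for a remote worker; `WorkQueueDemo`, the job names and the machine names are hypothetical, and job names are assumed to be unique. Because each worker pulls only one job at a time, the coordinator does not need to track which computers are free (point 4 in the question): an idle worker simply takes the next job. With a real broker, the rough equivalent is one queue with competing consumers and a prefetch count of 1, as in RabbitMQ's "Work Queues" tutorial.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class WorkQueueDemo
{
    // Runs jobs through a shared queue consumed by one worker per "machine".
    // Returns which machine handled each job.
    public static Dictionary<string, string> Run(IEnumerable<string> jobs, IList<string> machines)
    {
        using var queue = new BlockingCollection<string>();
        var results = new ConcurrentDictionary<string, string>();

        // Consumers: each machine takes one job at a time (similar to prefetch = 1).
        var workers = machines.Select(machine => Task.Run(() =>
        {
            foreach (var job in queue.GetConsumingEnumerable())
            {
                // Placeholder for the real work (e.g. the SolidWorks processing step).
                results[job] = machine;
            }
        })).ToArray();

        // Producer: publish the jobs, then signal that no more will come.
        foreach (var job in jobs) queue.Add(job);
        queue.CompleteAdding();

        Task.WaitAll(workers);
        return results.ToDictionary(kv => kv.Key, kv => kv.Value);
    }
}
```

For the tree in the question, the producer would publish only nodes whose children are all done, and publish a parent once the results for all of its children have come back.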

If you integrate your microservices with Docker, you can do some pretty nifty stuff.
