
Multi-threaded architecture for a feeder application

This is my first post here, so apologies if this isn't structured well.

We have been tasked to design a tool that will:

  • Read a file of account IDs, in CSV format
  • Download the account data file from the web for each account, by ID (REST API)
  • Pass the file to a converter that will produce a report (financial predictions etc.) [~20ms]
  • If the prediction threshold is within limits, run a parser to analyse the data [400ms]
  • Generate a report for the analysis above [80ms]
  • Upload all files generated to the web (REST API)

Now all those individual points are relatively easy to do. I'm interested in finding out how best to architect something to handle this and to do it fast & efficiently on our hardware.

We have to process roughly 2 million accounts. The square brackets give an idea of how long each step takes on average. I'd like to use the maximum resources available on the machine - 24-core Xeon processors. It's not a memory-intensive process. (As a very rough back-of-envelope figure: ~500ms of CPU work per account × 2 million accounts is about 280 core-hours, so on the order of 12 hours if all 24 cores stay busy and every account goes through the full pipeline.)

Would using TPL and creating each of these as a task be a good idea? Each has to happen sequentially but many can be done at once. Unfortunately the parsers are not multi-threading aware and we don't have the source (it's essentially a black box for us).

My thoughts were something like this - assuming we're using TPL (a rough sketch in code follows the list):

  • Load account data (essentially a CSV import or SQL SELECT)
  • For each Account (Id):
    • Download the data file for each account
    • ContinueWith using the data file, send to the converter
    • ContinueWith check threshold, send to parser
    • ContinueWith Generate Report
    • ContinueWith Upload outputs
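
In code I imagine each account looking roughly like the sketch below. The helper method names are placeholders I've made up for our real steps, with dummy bodies just to show the shape of the chain:

using System;
using System.Linq;
using System.Threading.Tasks;

class FeederPipeline
{
    // Placeholder stand-ins for the real steps.
    static string DownloadDataFile(int accountId) { return "data-" + accountId; }     // REST download
    static string RunConverter(string dataFile)   { return dataFile + ".converted"; } // ~20ms
    static bool WithinThreshold(string converted) { return true; }                    // prediction check
    static string RunParser(string converted)     { return converted + ".parsed"; }   // ~400ms, single-threaded black box
    static string GenerateReport(string parsed)   { return parsed + ".report"; }      // ~80ms
    static void UploadOutputs(string report)      { }                                 // REST upload

    // The steps stay sequential per account; many accounts run at once.
    static Task ProcessAccount(int accountId)
    {
        return Task.Factory.StartNew(() => DownloadDataFile(accountId))
            .ContinueWith(t => RunConverter(t.Result))
            .ContinueWith(t => WithinThreshold(t.Result)
                ? GenerateReport(RunParser(t.Result))
                : null)
            .ContinueWith(t => { if (t.Result != null) UploadOutputs(t.Result); });
    }

    static void Main()
    {
        var accountIds = Enumerable.Range(1, 100); // stand-in for the CSV import
        Task.WaitAll(accountIds.Select(ProcessAccount).ToArray());
    }
}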

Does that sound feasible or am I not understanding it correctly? Would it be better to break down the steps a different way?

I'm a bit unsure how to handle issues when the parser throws exceptions (it's very picky) or when we get failures uploading.

All this is going to be in a scheduled job that will run after-hours as a console application.

I would think about using some kind of message bus. That way you can separate the steps, and if one doesn't work (for example because the REST service isn't accessible for some time) you can store the message and process it later on.

Depending on what you use as a message bus, you can introduce threading with it.

In my opinion you can design workflows, handle exceptional states and so on better if you have a higher-level abstraction like a service bus.

Also, because the parts can run independently, they don't block each other.

One easy way could be to use ServiceStack messaging with Redis as the service bus.

Some advantages quoted from there:

  • Message-based design allows for easier parallelization and introspection of computations

  • DLQ messages can be introspected, fixed and later replayed after server updates and rejoin normal message workflow
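
To make that concrete, here is a minimal sketch of what one pipeline step could look like with ServiceStack's Redis MQ. The message type, handler body, thread count and connection string are assumptions for illustration, and the exact namespaces depend on your ServiceStack version:

using System;
using ServiceStack.Messaging;        // IMessage<T>
using ServiceStack.Messaging.Redis;  // RedisMqServer (ServiceStack.Server package in v4)
using ServiceStack.Redis;            // PooledRedisClientManager

// Hypothetical message representing one step of the pipeline.
public class ConvertAccountFile
{
    public int AccountId { get; set; }
    public string DataFilePath { get; set; }
}

class Program
{
    static void Main()
    {
        var redisFactory = new PooledRedisClientManager("localhost:6379");
        var mqServer = new RedisMqServer(redisFactory, retryCount: 2);

        // One handler per step; a message that keeps failing ends up in the
        // DLQ, where it can be inspected, fixed and replayed later.
        mqServer.RegisterHandler<ConvertAccountFile>(m =>
        {
            // run the converter here, then publish the next step's message
            return null;
        }, noOfThreads: 4); // this is where the bus introduces threads for you

        mqServer.Start();

        using (var mqClient = mqServer.CreateMessageQueueClient())
        {
            mqClient.Publish(new ConvertAccountFile { AccountId = 42, DataFilePath = "acct-42.dat" });
        }

        Console.ReadLine(); // keep the host alive while handlers process messages
    }
}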

I think the easy way to start with multiple threads in your case would be to put the entire operation for each account ID on a thread (or better, on the ThreadPool). With the approach proposed below, I think you will not need to coordinate work between threads.

Something like this to put the data on the thread pool queue:

using System.Collections.Generic;
using System.Threading;

// accountIds would be populated from the CSV import / SQL SELECT
var accountIds = new List<int>();

foreach (var accountId in accountIds)
{
    ThreadPool.QueueUserWorkItem(ProcessAccount, accountId);
}

And this is the function where you will process each account:

public static void ProcessAccount(object state)
{
    var accountId = (int)state; // QueueUserWorkItem passes the ID boxed as object

    // Download the data file for this account
    // Send the data file to the converter
    // If the prediction threshold is within limits, run the parser
    // Generate the report
    // Upload the outputs
}
