
Best pattern for creating large number of C# threads

We are implementing a C# application that needs to make large numbers of socket connections to legacy systems. We will (likely) be using a 3rd party component to do the heavy lifting around terminal emulation and data scraping. We have the core functionality working today; now we need to scale it up.

During peak times this may mean thousands of concurrent connections, and therefore threads, that need to be opened (and even tens of thousands several times a year). These connections mainly sit idle (no traffic other than a periodic handshake) for minutes or hours until the legacy system 'fires an event' we care about. We then scrape some data from this event, perform some workflow, and wait for the next event. There is no value in pooling (as far as we can tell) since threads will rarely need to be reused.

We are looking for any good patterns or tools that will help use this many threads efficiently. Running on high-end server hardware is not an issue, but we do need to limit the application to just a few servers, if possible.

In our testing, creating a new thread and initializing the 3rd party control seems to use a lot of CPU initially, but usage then drops to near zero. Memory use seems to be about 800 MB per 1,000 threads.

Is there anything better / more efficient than just creating and starting the number of threads needed?

PS - Yes, we know it is bad to create this many threads, but since we have no control over the legacy applications, this seems to be our only alternative. There is no option for multiple events to come across a single socket / connection.

Thanks for any help or pointers! Vans

You say this:

There is no value in pooling (as far as we can tell) since threads will rarely need to be reused.

But then you say this:

Is there anything better / more efficient than just creating and starting the number of threads needed?

Why the discrepancy? Do you care about the number of threads you are creating or not? Thread pooling is the proper way to handle large numbers of mostly-idle connections. A few busy threads can handle many idle connections easily and with fewer resources required.

Use the socket's asynchronous BeginReceive and BeginSend. These dispatch the IO operation to the operating system and return immediately.

You pass those methods a delegate and some state; the delegate is called when the IO operation completes.

Generally, once you are done processing the IO, you immediately call BeginX again.

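Socket sock = GetSocket();
State state = new State { Socket = sock, Buffer = new byte[1024], ThirdPartyControl = GetControl() };

// Start the first asynchronous receive; the call returns immediately.
sock.BeginReceive(state.Buffer, 0, state.Buffer.Length, SocketFlags.None, ProcessAsyncReceive, state);

void ProcessAsyncReceive(IAsyncResult iar)
{
    State state = (State)iar.AsyncState;

    // EndReceive returns the number of bytes read; 0 means the remote side closed the connection.
    int bytesRead = state.Socket.EndReceive(iar);
    if (bytesRead == 0)
    {
        state.Socket.Close();
        return;
    }

    // Process the received data in state.Buffer here (only the first bytesRead bytes are valid)
    state.ThirdPartyControl.ScrapeScreen(state.Buffer);

    // Queue the next receive on the same socket, reusing the same state object.
    state.Socket.BeginReceive(state.Buffer, 0, state.Buffer.Length, SocketFlags.None, ProcessAsyncReceive, state);
}

public class State
{
    public Socket Socket { get; set; }
    public byte[] Buffer { get; set; }
    public ThirdPartyControl ThirdPartyControl { get; set; }
}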

BeginSend is used in a similar fashion, as is BeginAccept if you are accepting incoming connections.

With low-throughput operations, async communications can easily handle thousands of clients simultaneously.
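To make the point about idle connections concrete, here is a minimal, self-contained sketch using the same BeginReceive/EndReceive calls. Everything around those calls is an assumption made for illustration: the loopback listener stands in for the legacy systems, and the names IdleConnectionsDemo, ConnState and Run are hypothetical. Each connection parks one pending receive and no thread is dedicated to it; callbacks only run once data arrives.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Sockets;
using System.Threading;

public static class IdleConnectionsDemo
{
    // Per-connection state handed to the callback (hypothetical, mirrors State above).
    private sealed class ConnState
    {
        public Socket Socket;
        public byte[] Buffer = new byte[1024];
        public CountdownEvent Pending;
    }

    // Opens `count` loopback connections, parks an async receive on each,
    // sends one byte per connection, and returns how many callbacks saw data.
    public static int Run(int count)
    {
        var listener = new TcpListener(IPAddress.Loopback, 0); // port 0: the OS picks a free port
        listener.Start(count);
        int port = ((IPEndPoint)listener.LocalEndpoint).Port;

        var clients = new List<Socket>();
        var servers = new List<Socket>(); // stand-ins for the legacy systems
        var pending = new CountdownEvent(count);
        int received = 0;

        for (int i = 0; i < count; i++)
        {
            var client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
            client.Connect(IPAddress.Loopback, port);
            clients.Add(client);
            servers.Add(listener.AcceptSocket());

            var state = new ConnState { Socket = client, Pending = pending };
            // Returns immediately; the callback runs on a pool thread when data arrives.
            client.BeginReceive(state.Buffer, 0, state.Buffer.Length, SocketFlags.None,
                iar =>
                {
                    var s = (ConnState)iar.AsyncState;
                    if (s.Socket.EndReceive(iar) > 0)
                        Interlocked.Increment(ref received);
                    s.Pending.Signal();
                }, state);
        }

        // Every connection is now idle with one pending receive and no blocked thread.
        // Simulate each legacy system firing an event.
        foreach (Socket server in servers)
            server.Send(new byte[] { 42 });

        pending.Wait(TimeSpan.FromSeconds(10));

        foreach (Socket s in clients) s.Close();
        foreach (Socket s in servers) s.Close();
        listener.Stop();
        return received;
    }

    public static void Main()
    {
        Console.WriteLine(Run(200));
    }
}
```

A real client would call BeginReceive again at the end of the callback, as in the snippet above, instead of stopping after the first event.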

I would really look into MPI.NET (more info: MPI). MPI.NET also has some parallel reduction, so this will work well to aggregate results.

I would suggest utilizing the Socket.Select() method, and pooling the handling of multiple socket connections within a single thread.

You could, for example, create a thread for every 50 connections to the legacy system. These master threads would just keep calling Socket.Select(), waiting for data to arrive. Each of these master threads could then have a thread pool to which sockets that have data are passed for actual processing. Once the processing is complete, the thread could be passed back to the master thread.
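The master-thread idea can be sketched roughly as follows. This is an illustration under stated assumptions, not a production design: the loopback listener stands in for the legacy systems, the shared .NET ThreadPool is used instead of a pool per master thread, each socket is handled once and not handed back, and the names SelectMasterDemo and Run are hypothetical. Socket.Select trims the list it is given down to the sockets that are readable, so the loop passes it a copy each time.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Sockets;
using System.Threading;

public static class SelectMasterDemo
{
    // One "master" loop watches a batch of sockets with Socket.Select and hands
    // readable ones to pool threads. Returns the total number of bytes processed.
    public static int Run(int connections)
    {
        var listener = new TcpListener(IPAddress.Loopback, 0); // port 0: the OS picks a free port
        listener.Start(connections);
        int port = ((IPEndPoint)listener.LocalEndpoint).Port;

        var watched = new List<Socket>(); // sockets owned by the master loop
        var legacy = new List<Socket>();  // stand-ins for the legacy systems
        for (int i = 0; i < connections; i++)
        {
            var client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
            client.Connect(IPAddress.Loopback, port);
            watched.Add(client);
            legacy.Add(listener.AcceptSocket());
        }

        // Simulate each legacy system firing one event.
        foreach (Socket s in legacy)
            s.Send(new byte[] { 1 });

        int handled = 0;
        var done = new CountdownEvent(connections);
        var remaining = new List<Socket>(watched);

        // Bounded number of attempts so the sketch cannot spin forever.
        for (int attempt = 0; remaining.Count > 0 && attempt < 100; attempt++)
        {
            // Select removes every socket that is NOT readable, so work on a copy.
            var readable = new List<Socket>(remaining);
            Socket.Select(readable, null, null, 1000000); // timeout: 1 second, in microseconds

            foreach (Socket s in readable)
            {
                remaining.Remove(s); // stop watching it while a worker handles it
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    var buffer = new byte[1024];
                    int n = s.Receive(buffer); // data is already waiting, so this returns promptly
                    Interlocked.Add(ref handled, n);
                    done.Signal();
                });
            }
        }

        done.Wait(TimeSpan.FromSeconds(10));

        foreach (Socket s in watched) s.Close();
        foreach (Socket s in legacy) s.Close();
        listener.Stop();
        return handled;
    }

    public static void Main()
    {
        Console.WriteLine(Run(50));
    }
}
```

In a long-running version, the worker would hand the socket back to the master loop's list (under a lock) after processing, so that the next event on that connection is picked up by a later Select call.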

There are a number of patterns using Microsoft's Concurrency and Coordination Runtime (CCR) that make dealing with IO easy and light. It allows us to grab and process well over 6000 web pages a minute (it could go much higher, but there's no need) in a crawler we are developing. It is definitely worth the time investment required to shift your head into the CCR way of doing things. There's a great article here:

http://msdn.microsoft.com/en-us/magazine/cc163556.aspx
