C# multi-threaded foreach loop

I've recently started working on multi-threaded calls in C# and I'm unsure whether my approach is correct.

How can I make this go faster? I'm guessing the answer is parallelism, but I have not been successful in integrating that concept into this code.

Edits

Please note this runs on a remote VM and is a console program, so user experience is not an issue. I just want it to run fast, since the number of links may go up to 200k elements and we want results as soon as possible. I also removed all questions but one, since it's the one I would like help with.

Here is my code, which seems to work:

// Use of my results
public void Main() 
{
  var results = ValidateInternalLinks();
  // Writes results to txt file
  WriteResults(results.Result, "Internal Links");
}

// Validation of data
public async Task<List<InternalLinksModel>> ValidateInternalLinks() 
{
  var tasks = new List<Task>();
  var InternalLinks = new List<InternalLinksModel>();
  // Populate InternalLinks with the data

  foreach (var internalLink in InternalLinks)
  {
    tasks.Add(GetResults(internalLink));
  }

  await Task.WhenAll(tasks);

  return InternalLinks;
}

// Get Results for each piece of data
public async Task GetResults(InternalLinksModel internalLink)
{ 
  var response = await SearchValue(internalLink.SearchValue);

  // Analyse response and change result (possible values: SUCCESS, FAILED, [])
  internalLink.PossibleResults = ValidateSearchResult(response);
}

// Http Request
public async Task<ResponseModel> SearchValue(string value) 
{
  // RestSharp API creation and headers addition
  var response = await client.ExecuteTaskAsync(request);

  return JsonConvert.DeserializeObject<ResponseModel>(response.Content);
}

It seems that you have a series of I/O-bound and CPU-bound jobs that you need to run one after the other, with a varying degree of concurrency required for each step. A good tool for dealing with that kind of workload is the TPL Dataflow library. This library is designed in a way that allows forming pipelines (or even complex networks) of data that flows from one block to the next. I tried to come up with an example that demonstrates using this library, and then realized that your workflow includes a last step where a property (internalLink.PossibleResults) must be updated on the first type of item entering the pipeline. This complicates things quite a lot, because it implies that the first type must be carried along all the steps of the pipeline. The easiest way to do this would probably be to use ValueTuples as the input and output of the blocks; a sketch of that approach follows after the main example. It would make my example too messy though, so I preferred to keep it in its simplest form, since its purpose is mainly to demonstrate the capabilities of the TPL Dataflow library:

var cts = new CancellationTokenSource();
var restClient = new RestClient();

var block1 = new TransformBlock<InternalLinksModel, IRestResponse>(async item =>
{
    return await restClient.ExecuteTaskAsync(item);
}, new ExecutionDataflowBlockOptions()
{
    MaxDegreeOfParallelism = 10, // 10 concurrent REST requests max
    CancellationToken = cts.Token, // Cancel at any time
});

var block2 = new TransformBlock<IRestResponse, ResponseModel>(item =>
{
    return JsonConvert.DeserializeObject<ResponseModel>(item.Content);
}, new ExecutionDataflowBlockOptions()
{
    MaxDegreeOfParallelism = 2, // 2 threads max for this CPU bound job
    CancellationToken = cts.Token, // Cancel at any time
});

var block3 = new TransformBlock<ResponseModel, string>(async item =>
{
    return await SearchValue(item);
}, new ExecutionDataflowBlockOptions()
{
    MaxDegreeOfParallelism = 10, // Concurrency 10 for this I/O bound job
    CancellationToken = cts.Token, // Cancel at any time
});

var block4 = new ActionBlock<string>(item =>
{
    ValidateSearchResult(item);
}, new ExecutionDataflowBlockOptions()
{
    MaxDegreeOfParallelism = 1, // 1 thread max for this CPU bound job
    CancellationToken = cts.Token, // Cancel at any time
});

block1.LinkTo(block2, new DataflowLinkOptions() { PropagateCompletion = true });
block2.LinkTo(block3, new DataflowLinkOptions() { PropagateCompletion = true });
block3.LinkTo(block4, new DataflowLinkOptions() { PropagateCompletion = true });

var internalLinks = new List<InternalLinksModel>();
// Populate internalLinks with the data
foreach (var internalLink in internalLinks)
{
    await block1.SendAsync(internalLink);
}
block1.Complete();

await block4.Completion;

Two types of blocks are used in this example, TransformBlock and ActionBlock. An ActionBlock is usually the last block of a pipeline, since it doesn't produce any output. If your workload is too granular, and the overhead of passing the objects around is comparable to the workload itself, you could start the pipeline with a BatchBlock and then process the next steps in batches of, say, 10 elements each. That doesn't seem to be required in your case though, since making web requests and parsing JSON responses are fairly bulky jobs.
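For reference, here is the ValueTuple approach mentioned above, as a minimal, untested sketch rather than a drop-in implementation. It assumes your own types and methods (InternalLinksModel, ResponseModel, SearchValue, ValidateSearchResult) and the System.Threading.Tasks.Dataflow package; the model is carried alongside the response so the last block can assign PossibleResults:

using System.Threading.Tasks.Dataflow;

// I/O bound: fetch and deserialize, keeping the model next to its response
var fetchBlock = new TransformBlock<InternalLinksModel, (InternalLinksModel Link, ResponseModel Response)>(
    async link => (link, await SearchValue(link.SearchValue)),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 10 });

// CPU bound: analyse the response and store the result on the original model
var assignBlock = new ActionBlock<(InternalLinksModel Link, ResponseModel Response)>(
    item => item.Link.PossibleResults = ValidateSearchResult(item.Response),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 2 });

fetchBlock.LinkTo(assignBlock, new DataflowLinkOptions { PropagateCompletion = true });

foreach (var internalLink in internalLinks)
{
    await fetchBlock.SendAsync(internalLink);
}
fetchBlock.Complete();
await assignBlock.Completion;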

async/await/WhenAll is the correct way to go; your performance bottleneck is likely I/O-bound (the HTTP requests), not compute-bound, and asynchrony is the appropriate tool for that. How many HTTP requests are you making, and are they all to the same server? If so, you may be hitting a connection limit. I'm not very familiar with RestSharp, but you might try increasing the connection limit via ServicePointManager. The more outstanding requests you have, assuming the server can handle them, the faster the WhenAll will complete.

https://docs.microsoft.com/en-us/dotnet/api/system.net.servicepointmanager?view=netframework-4.8
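On .NET Framework, raising the limit is a one-liner; run it once at startup, before the first request is made. The value 50 below is an arbitrary example, not a recommendation:

using System.Net;

// The default is small (historically 2 concurrent connections per endpoint
// for console apps), which can serialize your requests behind it.
ServicePointManager.DefaultConnectionLimit = 50;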

All of that said, I would reorganize your code: use Task/WhenAll for the HTTP requests, and process the responses after the WhenAll completes. If you do this, you can determine with certainty whether the HTTP requests are the bottleneck, by setting a breakpoint after the WhenAll and observing the execution times. If you can't debug, you can log the execution time instead. This should tell you whether the bottleneck is primarily network I/O. I'm pretty confident it is.
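For example, a minimal timing sketch, assuming the tasks list from your ValidateInternalLinks method:

using System.Diagnostics;

var stopwatch = Stopwatch.StartNew();
await Task.WhenAll(tasks); // all HTTP requests in flight
stopwatch.Stop();
Console.WriteLine($"HTTP requests completed in {stopwatch.Elapsed}");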

If it turns out that there is a compute bottleneck, you can use a Parallel.ForEach loop to deserialize, validate, and assign:

var internalLinks = new List<InternalLinksModel>();
// Populate internalLinks with the data
// (I'm assuming this means internalLinks already contains data at this point.)

// Map each request task back to its model. A plain Dictionary is enough;
// you shouldn't need a concurrent one, since the parallel loop only reads it.
var dictionary = new Dictionary<Task<IRestResponse>, InternalLinksModel>();

// Make the API calls - I/O bound
foreach (var l in internalLinks)
{
    var request = new RestRequest(l.SearchValue); // headers etc. as in your SearchValue method
    dictionary[client.ExecuteTaskAsync(request)] = l;
}

await Task.WhenAll(dictionary.Keys);
// I/O is done.

// Compute bound - deserialize, validate, assign.
Parallel.ForEach(dictionary.Keys, task =>
{
    var responseModel = JsonConvert.DeserializeObject<ResponseModel>(task.Result.Content);
    dictionary[task].PossibleResults = ValidateSearchResult(responseModel);
});

// Writes results to txt file
WriteResults(dictionary.Values, "Internal Links");
