How to properly run heavy calculations in parallel using C#?
The goal is to compute all possible polyform shapes made of a given number of squares. Since this is a very heavy computation for larger numbers, I wanted to make use of the multiple cores my computer has.
I made the problem easier to explain and test by creating the following scenario:
1) for each value of 2, 3, 5, and 7:
2) find all multiples (up to a certain value) and add them to the same List
3) remove all duplicates from said list
In my final program, step 2 is much larger and far more computationally heavy, so I would like to split step 2 into one task per value from step 1, however many values I want to check.
I made a WinForms app in C# (.NET Core) with 5 buttons, each trying a different variation of parallelism that I found here on Stack Overflow and elsewhere on the internet.
Here is the code (it looks like a lot, but it is just 5 variations of the same idea). Each variation reports a count, to check that they all produce the same result, plus the time it took:
    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.ComponentModel;
    using System.Data;
    using System.Drawing;
    using System.Linq;
    using System.Security.Permissions;
    using System.Text;
    using System.Threading.Tasks;
    using System.Windows.Forms;

    namespace Parallelism
    {
        public partial class Form1 : Form
        {
            private readonly int Repeat = 10000000;

            public Form1()
            {
                InitializeComponent();
            }

            // Variation 1: plain serial loop into a List, then Distinct().
            private void button1_Click(object sender, EventArgs e)
            {
                var watch = System.Diagnostics.Stopwatch.StartNew();
                List<int> output = new List<int>();
                foreach (int x in new int[] { 2, 3, 5, 7 })
                {
                    for (int i = 0; i < Repeat; i++)
                    {
                        output.Add(x * i);
                    }
                }
                output = output.Distinct().ToList();
                watch.Stop();
                (sender as Button).Text += $", c:{output.Count} - {watch.ElapsedMilliseconds}ms";
            }

            // Variation 2: four Task.Run calls writing into one shared ConcurrentBag.
            private void button2_Click(object sender, EventArgs e)
            {
                var watch = System.Diagnostics.Stopwatch.StartNew();
                ConcurrentBag<int> output = new ConcurrentBag<int>();
                Task task = Task.WhenAll(
                    Task.Run(() => button2_Calculation(2, output)),
                    Task.Run(() => button2_Calculation(3, output)),
                    Task.Run(() => button2_Calculation(5, output)),
                    Task.Run(() => button2_Calculation(7, output))
                );
                task.Wait();
                HashSet<int> output2 = new HashSet<int>(output);
                watch.Stop();
                (sender as Button).Text += $", c:{output2.Count} - {watch.ElapsedMilliseconds}ms";
            }

            private void button2_Calculation(int x, ConcurrentBag<int> output)
            {
                for (int i = 0; i < Repeat; i++)
                {
                    output.Add(x * i);
                }
            }

            // Variation 3: AsParallel() on the source array.
            // Note: a foreach over AsParallel() still runs this loop body
            // sequentially on the calling thread.
            private void button3_Click(object sender, EventArgs e)
            {
                var watch = System.Diagnostics.Stopwatch.StartNew();
                List<int> output = new List<int>();
                foreach (int x in (new int[] { 2, 3, 5, 7 }).AsParallel())
                {
                    for (int i = 0; i < Repeat; i++)
                    {
                        output.Add(x * i);
                    }
                }
                output = output.Distinct().ToList();
                watch.Stop();
                (sender as Button).Text += $", c:{output.Count} - {watch.ElapsedMilliseconds}ms";
            }

            // Variation 4: Task.Factory.StartNew per value, shared ConcurrentBag.
            private void button4_Click(object sender, EventArgs e)
            {
                var watch = System.Diagnostics.Stopwatch.StartNew();
                ConcurrentBag<int> output = new ConcurrentBag<int>();
                Dictionary<int, Task> runningTasks = new Dictionary<int, Task>();
                foreach (int x in new int[] { 2, 3, 5, 7 })
                {
                    int value = x;
                    runningTasks.Add(x, Task.Factory.StartNew(() => button2_Calculation(value, output)));
                }
                foreach (Task t in runningTasks.Select(c => c.Value))
                    t.Wait();
                HashSet<int> output2 = new HashSet<int>(output);
                watch.Stop();
                (sender as Button).Text += $", c:{output2.Count} - {watch.ElapsedMilliseconds}ms";
            }

            // Variation 5: Parallel.ForEach over the values, shared ConcurrentBag.
            private void button5_Click(object sender, EventArgs e)
            {
                var watch = System.Diagnostics.Stopwatch.StartNew();
                ConcurrentBag<int> output = new ConcurrentBag<int>();
                Parallel.ForEach(new int[] { 2, 3, 5, 7 }, x => button5_Calculation(x, output));
                HashSet<int> output2 = new HashSet<int>(output);
                watch.Stop();
                (sender as Button).Text += $", c:{output2.Count} - {watch.ElapsedMilliseconds}ms";
            }

            private void button5_Calculation(int x, ConcurrentBag<int> output)
            {
                for (int i = 0; i < Repeat; i++)
                    output.Add(x * i);
            }
        }
    }
So far, all of the above methods take a similar amount of time, between 1 s and 1.5 s. In fact, the plain serial execution sometimes seems to be a lot faster. How is this possible? With 8 cores (16 virtual cores) I would expect splitting the work into tasks to give a faster overall result.
Any help is very much appreciated!
Once I have learned how to implement the parallelism properly, I also expect to run the whole calculation on another thread / asynchronously so that the GUI remains responsive.
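For the GUI part, one common pattern is an `async` click handler that awaits `Task.Run`. The following is only a minimal sketch under assumptions: `ResponsiveForm`, `runButton`, and `CountDistinctMultiples` are hypothetical names that are not part of the test bench above, and the calculation is the simple serial version.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;
using System.Windows.Forms;

// Hypothetical form, built in code so the sketch is self-contained.
public class ResponsiveForm : Form
{
    private const int Repeat = 10_000_000;
    private readonly Button runButton = new Button { Text = "Run", Dock = DockStyle.Fill };

    public ResponsiveForm()
    {
        Controls.Add(runButton);
        runButton.Click += RunButton_Click;
    }

    // async void is acceptable here because this is a UI event handler.
    private async void RunButton_Click(object sender, EventArgs e)
    {
        runButton.Enabled = false;            // avoid overlapping runs
        var watch = Stopwatch.StartNew();

        // The heavy work runs on a thread-pool thread; the UI thread is
        // free to process messages until the await completes.
        int count = await Task.Run(() => CountDistinctMultiples(Repeat));

        watch.Stop();
        // Execution resumes on the UI thread, so touching controls is safe.
        runButton.Text = $"c:{count} - {watch.ElapsedMilliseconds}ms";
        runButton.Enabled = true;
    }

    // Hypothetical stand-in for the real calculation (steps 1 to 3).
    private static int CountDistinctMultiples(int repeat)
    {
        var seen = new HashSet<int>();
        foreach (int x in new[] { 2, 3, 5, 7 })
            for (int i = 0; i < repeat; i++)
                seen.Add(x * i);
        return seen.Count;
    }

    [STAThread]
    private static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new ResponsiveForm());
    }
}
```

This only keeps the window responsive; it does not by itself make the calculation any faster.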
Response to @Pac0: here is my implementation of your suggestions. It doesn't seem to make much difference:
    private void button6_Click(object sender, EventArgs e)
    {
        var watch = System.Diagnostics.Stopwatch.StartNew();
        ConcurrentBag<HashSet<int>> bag = new ConcurrentBag<HashSet<int>>();
        var output = Parallel.ForEach(new int[] { 2, 3, 5, 7 }, x =>
        {
            HashSet<int> temp = new HashSet<int>();
            for (int i = 0; i < Repeat; i++)
                temp.Add(x * i);
            bag.Add(temp);
        });
        HashSet<int> output2 = new HashSet<int>();
        foreach (var hash in bag)
            output2.UnionWith(hash);
        watch.Stop();
        (sender as Button).Text += $", c:{output2.Count} - {watch.ElapsedMilliseconds}ms";
    }
As a comment mentioned, your use of a single shared collection is causing significant locking. Computationally, a task-based solution is about 50% faster (see below, where we don't manage a combined output). It is managing the shared collection that slows things down. Depending on how it is handled, it can be upwards of 3 times slower than serial execution.
The struggle with concurrency is always balancing the load against the bottleneck.
    using System;
    using System.Collections.Generic;
    using System.Threading.Tasks;

    namespace ConsoleApp5
    {
        class Program
        {
            static int Repeat = 100000000;
            static int[] worklist = new int[] { 2, 3, 5, 7 };

            static void Main(string[] args)
            {
                var watch = System.Diagnostics.Stopwatch.StartNew();
                Console.WriteLine("Hello World! Launching Threads");
                // LaunchThreads returns a Task, so Wait() blocks until every worker has finished.
                Task launcher = Task.Run(() => LaunchThreads());
                launcher.Wait();
                Console.WriteLine("Hello World! Threads Complete");
                watch.Stop();
                Console.WriteLine($"Threads took: {watch.ElapsedMilliseconds}");

                watch = System.Diagnostics.Stopwatch.StartNew();
                Console.WriteLine("Serial Execution Starting");
                foreach (int i in worklist)
                {
                    DoWork(i);
                }
                watch.Stop();
                Console.WriteLine($"Serial Execution took: {watch.ElapsedMilliseconds}");
            }

            // Declared async Task (not async void) so the caller can wait for completion;
            // with async void, launcher.Wait() would return at the first await.
            static async Task LaunchThreads()
            {
                //Dictionary<int, List<int>> mywork = new Dictionary<int, List<int>>();
                HashSet<int> output = new HashSet<int>();
                var worktasks = new List<Task<List<int>>>();
                foreach (int i in worklist)
                {
                    worktasks.Add(Task.Run(() => DoWork(i)));
                }
                await Task.WhenAll(worktasks);
            }

            // Each worker fills its own private list, so no locking is needed.
            static List<int> DoWork(int x)
            {
                Console.WriteLine($"Thread Worker: {x}");
                List<int> output = new List<int>();
                for (int i = 0; i < Repeat; i++)
                {
                    output.Add(x * i);
                }
                Console.WriteLine($"Thread Worker: {x} - Exiting");
                return output;
            }
        }
    }
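If a combined, de-duplicated output is needed, one way to keep locking rare is to give each worker its own private set and merge once per worker. The sketch below uses the `Parallel.For` overload with thread-local state and splits the work over the index range rather than over the four multipliers. `LocalSetSketch` and `DistinctMultiples` are hypothetical names, and whether this beats the serial version for a given `repeat` would have to be measured.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class LocalSetSketch
{
    private static readonly int[] Multipliers = { 2, 3, 5, 7 };

    // Hypothetical helper: returns the distinct values of x * i
    // for x in Multipliers and 0 <= i < repeat.
    public static HashSet<int> DistinctMultiples(int repeat)
    {
        var merged = new HashSet<int>();
        var gate = new object();

        Parallel.For(0, repeat,
            // localInit: one private set per worker, no sharing while computing.
            () => new HashSet<int>(),
            // body: runs for each index i and only touches the private set.
            (i, loopState, local) =>
            {
                foreach (int x in Multipliers)
                    local.Add(x * i);
                return local;
            },
            // localFinally: runs once per worker; the only place a lock is taken.
            local =>
            {
                lock (gate)
                    merged.UnionWith(local);
            });

        return merged;
    }

    public static void Main()
    {
        Console.WriteLine(DistinctMultiples(1000).Count);
    }
}
```

The merge step is still serialized by the lock, so the more duplicates each worker has already removed locally, the less work remains there.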
I want to post this as an answer because someone named Yugami posted something different from what I had tried. It was a useful and good response, but it was deleted.
So I am posting my attempt at recreating their code in my test bench:
    private async void button9_Click(object sender, EventArgs e)
    {
        var watch = System.Diagnostics.Stopwatch.StartNew();
        HashSet<int> output = new HashSet<int>();
        var worktasks = new List<Task<List<int>>>();
        foreach (int i in new int[] { 2, 3, 5, 7 })
            worktasks.Add(Task.Run(() => button9_Calculation(i)));
        await Task.WhenAll(worktasks);
        foreach (Task<List<int>> tsk in worktasks)
            foreach (int i in tsk.Result)
                output.Add(i);
        watch.Stop();
        (sender as Button).Text += $", c:{output.Count} - {watch.ElapsedMilliseconds}ms";
    }

    private List<int> button9_Calculation(int x)
    {
        List<int> output = new List<int>();
        for (int i = 0; i < Repeat; i++)
            output.Add(x * i);
        return output;
    }
Here are the results of the serial version and the two best solutions with 100,000,000 iterations. Here I finally see some improvement from doing step 2 in parallel, but now the biggest bottleneck is removing the duplicates, that is, filtering everything down to a single HashSet.
So I think this solves my initial question about improving step 2. Now I will continue my search to improve step 3: removing the duplicates.
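As a possible starting point for step 3, PLINQ has a parallel `Distinct()` operator. This is a sketch only, not a measured improvement: `PlinqDistinctSketch` and `CountDistinctMultiples` are hypothetical names, and PLINQ's per-element overhead may well outweigh the gain for work as cheap as a single multiplication.

```csharp
using System;
using System.Linq;

public static class PlinqDistinctSketch
{
    // Hypothetical helper: counts the distinct values of x * i
    // for x in multipliers and 0 <= i < repeat.
    public static int CountDistinctMultiples(int[] multipliers, int repeat)
    {
        return ParallelEnumerable.Range(0, repeat)                      // parallel source over the index range
            .SelectMany(i => multipliers.Select(x => x * i))            // step 2: generate the multiples
            .Distinct()                                                 // step 3: de-duplicate in parallel
            .Count();
    }

    public static void Main()
    {
        Console.WriteLine(CountDistinctMultiples(new[] { 2, 3, 5, 7 }, 1000));
    }
}
```

Timing this against the HashSet-based merge in the same test bench would show whether it helps for the real workload.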