
Improve parallelism C# LINQ

I've developed a simple data migration console tool using C#, LINQ and EF. The tool gets all the data I want to move from place A to place B. The code is something like this:

var data = dataAccess.GetData();
Parallel.ForEach(data, currentdata =>
{
    // Do some business, and insert data
});

As far as I know, Parallel.ForEach handles everything needed to take advantage of parallelism, using as many processor cores and threads as possible in the most efficient way.

So I tried this tool with a huge amount of data, and the migration process took about 5 hours.

Then I decided to try another idea.

I built 4 console executables of this project with one modification: each one now takes a quarter of the data.

E.g.: the total data is about 40 million records to migrate. Console 1 migrates from 0 to 10M, console 2 from 10M to 20M, console 3 from 20M to 30M and console 4 from 30M to 40M. Then I ran these consoles, one on each core of my processor, and guess what: it took less than half the time to migrate everything.

How is this possible if Parallel.ForEach is supposedly the best approach?

Any idea how to replicate this improvement with just one console?

Thank you.

EDIT: Now I'm trying this, having chunked the data beforehand:

Process process = Process.GetCurrentProcess();
int cpuCount = Environment.ProcessorCount;
int offset = process.Threads.Count;
Thread[] threads = new Thread[cpuCount];
for (int i = 0; i < cpuCount; ++i)
{
    Thread t = new Thread(new ThreadStart(migrateChunk))
    { IsBackground = true };
    t.Start();
}

process.Refresh();
for (int i = 0; i < cpuCount; ++i)
{
    process.Threads[i + offset].ProcessorAffinity = (IntPtr)(i + 1);
}
Do you think this is a good approach? I don't see any improvement over Parallel.ForEach. I even tried to attach all the processes to the same core, but I don't see any change. Thanks.

The issue is in

var data = dataAccess.GetData();

Let's say retrieving the 40 million rows takes 4 minutes and retrieving 10 million rows takes 1 minute. Then each 10-million console app has already started moving data while the 40-million run is still retrieving its data from the database.
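One way to get the same effect inside a single console is to let each worker load only its own slice of the key range, as the four executables do. The following is a minimal sketch under stated assumptions: `LoadRange` is a hypothetical stand-in for a range-limited version of `dataAccess.GetData()`, and the class and method names are invented for illustration. With EF, each slice would also need its own `DbContext`, because a `DbContext` instance is not thread-safe.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class RangeMigration
{
    // Hypothetical stand-in for a range-limited dataAccess.GetData(from, to).
    // A real version would query only rows whose key is in [from, to).
    static IEnumerable<long> LoadRange(long from, long to)
    {
        for (long id = from; id < to; id++)
            yield return id;
    }

    // Splits [0, total) into at most `slices` contiguous ranges,
    // mirroring the 0-10M, 10M-20M, ... split used by the four consoles.
    public static List<(long From, long To)> Split(long total, int slices)
    {
        var ranges = new List<(long From, long To)>();
        long size = (total + slices - 1) / slices; // ceiling division
        for (long start = 0; start < total; start += size)
            ranges.Add((start, Math.Min(start + size, total)));
        return ranges;
    }

    // Processes every slice in parallel and returns the number of rows handled.
    public static long Migrate(long total, int slices)
    {
        long migrated = 0;
        Parallel.ForEach(Split(total, slices), range =>
        {
            // Each worker loads its own slice, so processing can begin
            // without first waiting for the full data set to be loaded.
            foreach (var row in LoadRange(range.From, range.To))
            {
                // Do some business, and insert data
                Interlocked.Increment(ref migrated);
            }
        });
        return migrated;
    }

    public static void Main()
    {
        Console.WriteLine(Migrate(40_000, 4));
    }
}
```

Whether this matches the four-console timing depends on the database and on how the inserts are batched, so it is worth measuring rather than assuming.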

Parallel.ForEach(data, currentdata =>
{
    // Do some business, and insert data
});

For this part, you might want to check the Parallel documentation. Basically, Parallel takes the data, divides it into small chunks and spreads those chunks across the processors for processing.
1. Parallel
2. Parallel Partition of Work
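The chunking can also be controlled explicitly. Below is a small sketch, not the answerer's code, that uses `Partitioner.Create` from `System.Collections.Concurrent` to hand each worker a contiguous index range and `ParallelOptions.MaxDegreeOfParallelism` to cap the number of concurrent workers. The `SumInChunks` name and the summing body are assumptions that stand in for the real "do some business, and insert data" step.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class ChunkedLoop
{
    // Sums the array by giving each worker a contiguous chunk of indexes.
    // The summing is only a placeholder for the real per-row work.
    public static long SumInChunks(int[] data, int chunkSize)
    {
        long total = 0;
        var options = new ParallelOptions
        {
            // Cap concurrency at the number of logical processors.
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };

        // Produces Tuple<int, int> ranges: [Item1, Item2).
        var chunks = Partitioner.Create(0, data.Length, chunkSize);

        Parallel.ForEach(chunks, options, chunk =>
        {
            long local = 0;
            for (int i = chunk.Item1; i < chunk.Item2; i++)
                local += data[i];

            // Publish one result per chunk instead of one per element.
            Interlocked.Add(ref total, local);
        });
        return total;
    }

    public static void Main()
    {
        var data = new int[1000];
        for (int i = 0; i < data.Length; i++)
            data[i] = i + 1;

        Console.WriteLine(SumInChunks(data, 100));
    }
}
```

Larger chunks reduce per-item scheduling overhead, while smaller chunks balance the load better when rows take uneven time, so the chunk size is something to tune by measurement.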


 