
C# Multi-Threading on tables' columns performance issue

I am currently having trouble getting any performance benefit out of C# threads. What I am doing is encrypting the entire contents of selected columns in a tabular (.csv) file. The program is generally used for large files, up to terabytes in size, with millions of rows and numerous columns.

In order to achieve optimal performance, I planned to create and run the processing algorithm in a separate thread for each column. I believe the sheer amount of computation needed per column warrants a thread each; at least that was true in similar projects I did using C++ threads.

Now, for some reason, the code snippet below does not produce any speed-up whatsoever. In fact, compared to single-threaded sequential processing, it is a few seconds SLOWER. The result is roughly the same whether I process 1 column, 4 columns, or 128 columns simultaneously.

// NOTE:
// m_TableData is of type Dictionary<int, List<string>>
// Key   == column number
// Value == column contents

List<Thread> Threads = new List<Thread>();

// encrypt data in selected columns
foreach (var KeyPair in m_TableData)
{
  Threads.Add(new Thread(new ThreadStart(() =>
  {
    // Process each row element
    // NOTE: ColSize is usually huge (>10,000)
    int ColSize = KeyPair.Value.Count();
    for (int i = 0; i < ColSize; ++i)
    {
      m_TableData[KeyPair.Key][i] = ProcessingAlgorithm(m_TableData[KeyPair.Key][i]);
    }
  })));
  Threads.Last().Start();
}

foreach (Thread th in Threads)
  th.Join();

Hoping to avoid false sharing and/or improve cache performance, I even tried splitting the rows into batches of 500, 1,000, 10,000, etc., but to no avail. I have also tried the System.Threading.Tasks.Parallel.ForEach function, and it seems to give the same results. The lack of any performance improvement has me really scratching my head.
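For reference, the Parallel.ForEach attempt looked roughly like this (a minimal, self-contained sketch; the tiny sample data and the ProcessingAlgorithm body here are placeholders, since the real algorithm is not shown):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class ColumnEncryptor
{
    // Placeholder table: Key == column number, Value == column contents.
    static readonly Dictionary<int, List<string>> m_TableData = new Dictionary<int, List<string>>
    {
        { 0, new List<string> { "a", "b" } },
        { 1, new List<string> { "c", "d" } }
    };

    // Stand-in for the real (unknown) encryption routine.
    static string ProcessingAlgorithm(string input) => input.ToUpperInvariant();

    static void Main()
    {
        // One parallel work item per column; each thread touches only its own list.
        Parallel.ForEach(m_TableData, keyPair =>
        {
            List<string> column = keyPair.Value;  // local reference, no dictionary lookups in the loop
            for (int i = 0; i < column.Count; ++i)
                column[i] = ProcessingAlgorithm(column[i]);
        });

        Console.WriteLine(string.Join(",", m_TableData[0]));  // prints A,B
    }
}
```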

Are there any professional/experienced parallel-processing programmers here? I greatly appreciate any and all feedback and criticism on my code and issue. Thanks!

Pretty sure this isn't your biggest performance issue - something else is going on here (we really need to see what your ProcessingAlgorithm is doing) - but you can eliminate two dictionary lookups per loop iteration by replacing:

m_TableData[KeyPair.Key]

with

KeyPair.Value

So you end up with:

for (int i = 0; i < ColSize; ++i)
{
    KeyPair.Value[i] = ProcessingAlgorithm(KeyPair.Value[i]);
}

(I'd probably actually assign the list to a local variable instead of using the KeyPair.)
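Applied to the original thread-per-column code, that suggestion might look like the following self-contained sketch (the sample data and ProcessingAlgorithm stand-in are hypothetical; names are taken from the question):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class Demo
{
    // Placeholder table: Key == column number, Value == column contents.
    static readonly Dictionary<int, List<string>> m_TableData = new Dictionary<int, List<string>>
    {
        { 0, new List<string> { "x", "y" } }
    };

    // Stand-in for the real cipher.
    static string ProcessingAlgorithm(string s) => s + "!";

    static void Main()
    {
        var threads = new List<Thread>();
        foreach (var keyPair in m_TableData)
        {
            // Capture the column list in a local before the loop,
            // so no dictionary lookup happens per row.
            List<string> column = keyPair.Value;
            var t = new Thread(() =>
            {
                for (int i = 0; i < column.Count; ++i)
                    column[i] = ProcessingAlgorithm(column[i]);
            });
            threads.Add(t);
            t.Start();
        }
        threads.ForEach(t => t.Join());

        Console.WriteLine(string.Join(",", m_TableData[0]));  // prints x!,y!
    }
}
```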
