Unexpected result when accessing main thread variable in loop created child threads

I am writing a program that reads a large amount of dynamic data, and I would like to use multi-threading to speed this process up. I understand that it is up to the operating system how created threads are handled (thread priority, etc.), which I believe is the reason for my unexpected results. However, I still do not know a solution.

Code:

int multiplier = rowCount / 20;

Debug.WriteLine("Row count is " + rowCount + "...");
Debug.WriteLine("Using 20 threads to complete job...");
Debug.WriteLine("Using " + multiplier + " as multiplier...");

for (int i = 1; i <= 20; i++) {
    new Thread(() => {
        int startRow = ((multiplier * i) - multiplier) + 1;
        int endRow = multiplier * i;
        
        if (i == 20) {
            endRow = rowCount;
        }
    
        Debug.WriteLine("    [THREAD " + i + "] start row: " + startRow + ", end row : " + endRow);

        for (int row = startRow; row <= endRow; row++) {
            for (int column = 1; column <= columnCount; column++) {
                //...data is read here
            }
        }
    }).Start();
}

Actual result (it appears that my issue is the child thread not "reading" the 'i' variable "correctly", which makes sense considering how threads work; I just do not know how to fix it):

Row count is 2209...
Using 20 threads to complete job...
Using 110 as multiplier...
    [THREAD 4] start row: 331, end row : 440
    [THREAD 4] start row: 331, end row : 440
    [THREAD 5] start row: 441, end row : 550
    [THREAD 5] start row: 441, end row : 550
    [THREAD 6] start row: 551, end row : 660
    [THREAD 7] start row: 661, end row : 770
    [THREAD 9] start row: 881, end row : 990
    [THREAD 9] start row: 881, end row : 990
    [THREAD 11] start row: 1101, end row : 1210
    [THREAD 11] start row: 1101, end row : 1210
    [THREAD 12] start row: 1211, end row : 1320
    [THREAD 14] start row: 1431, end row : 1540
    [THREAD 15] start row: 1541, end row : 1650
    [THREAD 15] start row: 1541, end row : 1650
    [THREAD 16] start row: 1651, end row : 1760
    [THREAD 17] start row: 1761, end row : 1870
    [THREAD 19] start row: 1981, end row : 2090
    [THREAD 20] start row: 2091, end row : 2209
    [THREAD 20] start row: 2091, end row : 2209
    [THREAD 21] start row: 2201, end row : 2310

Expected result (this is the result of simply not using threads, i.e. commenting out the lambda expression):

Row count is 2209...
Using 20 threads to complete job...
Using 110 as multiplier...
    [THREAD 1] start row: 1, end row : 110
    [THREAD 2] start row: 111, end row : 220
    [THREAD 3] start row: 221, end row : 330
    [THREAD 4] start row: 331, end row : 440
    [THREAD 5] start row: 441, end row : 550
    [THREAD 6] start row: 551, end row : 660
    [THREAD 7] start row: 661, end row : 770
    [THREAD 8] start row: 771, end row : 880
    [THREAD 9] start row: 881, end row : 990
    [THREAD 10] start row: 991, end row : 1100
    [THREAD 11] start row: 1101, end row : 1210
    [THREAD 12] start row: 1211, end row : 1320
    [THREAD 13] start row: 1321, end row : 1430
    [THREAD 14] start row: 1431, end row : 1540
    [THREAD 15] start row: 1541, end row : 1650
    [THREAD 16] start row: 1651, end row : 1760
    [THREAD 17] start row: 1761, end row : 1870
    [THREAD 18] start row: 1871, end row : 1980
    [THREAD 19] start row: 1981, end row : 2090
    [THREAD 20] start row: 2091, end row : 2209

Try moving the values that can be calculated outside of the thread into the loop body, so that the thread does not read the shared variable i. The threads start without regard to the surrounding loop, which keeps incrementing i while they are still starting up.

int multiplier = rowCount / 20;

Debug.WriteLine("Row count is " + rowCount + "...");
Debug.WriteLine("Using 20 threads to complete job...");
Debug.WriteLine("Using " + multiplier + " as multiplier...");

for (int i = 1; i <= 20; i++) {
    int startRow = ((multiplier * i) - multiplier) + 1;
    int endRow = multiplier * i;
    if (i == 20) {
        endRow = rowCount;
    }
    int nThread = i;
    new Thread(() => {
        Debug.WriteLine("    [THREAD " + nThread + "] start row: " + startRow + ", end row : " + endRow);

        for (int row = startRow; row <= endRow; row++) {
            for (int column = 1; column <= columnCount; column++) {
                //...data is read here
            }
        }
    }).Start();
}

This is apparent in the first log you shared: for example, by the time threads 1 and 2 start, i is already at 4.
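The same idea as a minimal, self-contained sketch, assuming the 2209 rows and 20 threads from the question. The names RowPartitionDemo, Partition and threadNumber are made up for illustration, and the actual data reading is replaced by a placeholder. Each iteration copies what the thread needs into variables declared inside the loop body, and the main thread joins all workers before reporting.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;

public static class RowPartitionDemo
{
    // Splits rows 1..rowCount into threadCount contiguous ranges.
    // The last range absorbs the remainder, as in the question's code.
    public static (int Start, int End)[] Partition(int rowCount, int threadCount)
    {
        int multiplier = rowCount / threadCount;
        var ranges = new (int Start, int End)[threadCount];
        for (int i = 1; i <= threadCount; i++)
        {
            int start = (multiplier * (i - 1)) + 1;
            int end = (i == threadCount) ? rowCount : multiplier * i;
            ranges[i - 1] = (start, end);
        }
        return ranges;
    }

    public static void Main()
    {
        var ranges = Partition(2209, 20);        // values assumed from the question
        var seen = new ConcurrentBag<int>();     // thread numbers observed by the workers
        var threads = new Thread[ranges.Length];

        for (int i = 0; i < ranges.Length; i++)
        {
            // Declared inside the loop body, so every lambda captures its own copy.
            int threadNumber = i + 1;
            var range = ranges[i];

            threads[i] = new Thread(() =>
            {
                seen.Add(threadNumber);
                Console.WriteLine("    [THREAD " + threadNumber + "] start row: "
                    + range.Start + ", end row : " + range.End);
                // ...data would be read here (placeholder)
            });
            threads[i].Start();
        }

        // Wait for every worker before using the results.
        foreach (var t in threads)
        {
            t.Join();
        }

        Console.WriteLine("Distinct thread numbers: " + seen.Distinct().Count());
    }
}
```

The per-thread lines can still appear in any order, because the scheduler decides when each thread runs, but every thread number from 1 to 20 shows up exactly once with its own row range.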

When you reference a variable defined outside the scope of your lambda function, the compiler creates a so-called closure, i.e. your lambda function effectively uses a reference to the variable, not the value the variable had when the lambda was created.
In your example, each thread has a reference to the variable int i and not the value of this variable at the moment each thread was created. In fact, this peculiar behaviour can be demonstrated even without using threads, using a slightly modified example from Eric Lippert's blog:

var funcs = new List<Func<int>>();
for(int i = 0; i < 10; i++)
{
    funcs.Add(() => i);    
}

foreach(var f in funcs)
{
    Console.WriteLine(f());
}

This code, perhaps surprisingly for some, will not output the numbers from 0 to 9 but will output 10 ten times.
In fact, before C# 5 the foreach loop had exactly the same behaviour (as demonstrated by the original example in the above blog post), but apparently this behaviour was so surprising for many people that it was changed in C# 5: the foreach variable is now logically inside the loop, and therefore each closure is closed over a fresh copy of the variable. This is not the case for a for loop's variable, or in fact for any variable defined outside the lambda's scope: the lambda closes over the variable itself.
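For comparison, here is a small sketch of the usual workaround applied to the example above: copy the loop variable into a local declared inside the loop body and capture that instead. ClosureCopyDemo, Invoke and copyPerIteration are hypothetical names chosen for this illustration only.

```csharp
using System;
using System.Collections.Generic;

public static class ClosureCopyDemo
{
    // Builds ten lambdas in a for loop, invokes them after the loop has finished,
    // and returns what they produced.
    public static List<int> Invoke(bool copyPerIteration)
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < 10; i++)
        {
            if (copyPerIteration)
            {
                int copy = i;          // new variable on every iteration
                funcs.Add(() => copy); // captures this iteration's copy
            }
            else
            {
                funcs.Add(() => i);    // captures the single shared loop variable
            }
        }

        var results = new List<int>();
        foreach (var f in funcs)
        {
            results.Add(f());
        }
        return results;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", Invoke(false))); // 10 10 10 10 10 10 10 10 10 10
        Console.WriteLine(string.Join(" ", Invoke(true)));  // 0 1 2 3 4 5 6 7 8 9
    }
}
```

This is the same trick as the nThread variable in the threaded code: the lambda still captures a variable rather than a value, but that variable is never modified after the lambda is created.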
