Parallel.For having unusual behaviour

I'm trying to convert this prime sieving function to use Parallel.For so it can utilize multiple cores.

However, when I run it, the value of the b variable seems to jump randomly or not change at all, especially for higher values of To.

static List<long> Sieve(long To)
{
    long f = 0;
    To /= 2;

    List<long> Primes = new List<long>();
    bool[] Trues = new bool[To];

    if (To > 0)
        Primes.Add(2);

    long b = 0;

    Parallel.For(1L, To, a =>
    {
        b++;

        if (Trues[b])
            return;

        f = 2 * b + 1;
        Primes.Add(f);

        for (long j = f + b; j < To; j += f)
            Trues[j] = true;
    });

    return Primes;
}

What's going on, and how can I stop that from happening?

The problem you are facing here is called a race condition. It's what happens when multiple CPU cores load the same variable into their respective caches, work on it, then write the value back to RAM. b++ is a read-modify-write sequence, not a single atomic step, so the value that's written back may already be stale by then (for example, when a core loads the variable right before another core overwrites it with a new value).

First of all: I wouldn't use b++ but long i = Interlocked.Increment(ref b); instead (b is a long, so the result is a long as well). Interlocked.Increment ensures that no two threads increment the same value at the same time. The result is the incremented value, which is saved into the local variable i. This is very important: you need that value to stay constant for the whole of each iteration, which would be impossible if you kept reading b, since other threads will be incrementing it.
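To make the point about Interlocked.Increment concrete, here is a minimal sketch. The class name CounterDemo, the method CountWithInterlocked, and the seen array are hypothetical names made up for this illustration; they are not part of the question's code. Each iteration keeps the value returned by Interlocked.Increment in a local variable and uses it as an index, so every slot should be claimed by exactly one iteration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CounterDemo
{
    // Returns the final counter value after 'iterations' parallel increments.
    // Throws if any value from 1..iterations was never handed out.
    public static long CountWithInterlocked(int iterations)
    {
        long counter = 0;                  // shared between all iterations
        var seen = new bool[iterations];   // slot k is set by whoever receives k + 1

        Parallel.For(0, iterations, idx =>
        {
            // Atomic increment; 'mine' is a private copy that no other
            // thread can change for the rest of this iteration.
            long mine = Interlocked.Increment(ref counter);
            seen[mine - 1] = true;
        });

        foreach (bool s in seen)
            if (!s) throw new InvalidOperationException("a value was skipped");

        return counter;
    }

    public static void Main()
    {
        Console.WriteLine(CountWithInterlocked(100000));
    }
}
```

Replacing the Interlocked.Increment call with a plain counter++ would typically lose updates under load, although how often that shows up depends on the machine and the run.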

Next are your variables f and a (a is the For iterator). f is declared outside the lambda, so every thread shares it. Forget f, use a instead.

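f = 2 * b + 1; // wrong: f and b are shared by all threads
a = 2 * i + 1; // correct: a is local to this iteration, i is the value returned by Interlocked.Increment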

Lastly: System.Collections.Generic.List is NOT, I repeat (because it's important), NOT thread safe. See http://msdn.microsoft.com/en-us/library/6sh2ey19.aspx for more details.

Primes.Add(f); // will likely break something
lock (Primes)  // LOCK the list
{
    Primes.Add(a); // don't forget, we're using 'a' instead of 'f' now
}

The lock keyword accepts only reference types as its argument. That is because locking an object does NOT prevent another thread from accessing it. Instead, you can imagine it as setting a flag on top of the reference to signal other threads: "I'm working here, please do not disturb!"

Of course, if another thread accesses Primes without taking the lock first, it will still be able to access the list.
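As a rough sketch of the locking pattern described above: the class LockDemo, the method CollectOdds, and the gate object are hypothetical names used only for this example. It locks a dedicated private object rather than the list itself, which is a design choice for the sketch and not something the question's code requires; lock (Primes) follows the same principle. The protection only holds because every access to the list inside the loop goes through the same lock.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class LockDemo
{
    // Collects the first 'count' odd numbers from parallel iterations.
    public static List<long> CollectOdds(long count)
    {
        var result = new List<long>();
        var gate = new object();   // every writer must lock this same object

        Parallel.For(0L, count, a =>
        {
            long f = 2 * a + 1;    // local to this iteration

            lock (gate)            // only one thread at a time may call Add
            {
                result.Add(f);
            }
        });

        result.Sort();             // iterations complete in no particular order
        return result;
    }

    public static void Main()
    {
        List<long> odds = CollectOdds(10);
        Console.WriteLine(string.Join(", ", odds));
    }
}
```

Because the lock serializes every Add, a loop that does little else per iteration gains nothing from running in parallel; the sketch only shows how to keep the list intact.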

You should've learned all of this though, since the Parallel Prime Sieve is one of the most common beginner exercises when first learning about multithreading.

EDIT:

After all the above steps are done, the program shouldn't run amok; however, this does not mean that the result will be correct, or that you'll have gained a speedup, since many of your threads will be doing duplicate work.

Assume Thread a is given the responsibility to mark every multiple of 3, while Thread n is given the responsibility to mark the multiples of 9. When run sequentially, by the time Thread n begins processing the multiples of 9, it will see that 9 is already marked as a multiple of another (prime) number. However, since your code is now parallel, there is no guarantee that 9 will be marked by the time Thread n begins its work. Not to mention that 9, since it may not be marked, might be added to the list of prime numbers.

Because of this, you have to sequentially find all prime numbers between 1 and the square root of To. Why the square root of To? That's something you'll have to find out yourself.

Once you have found all prime numbers from 1 to the square root of To, you can start your parallel prime sieve in order to find the rest of the primes, using all primes found previously.

One last noteworthy point is that Primes should be built only after Trues has been filled. That's because:

1. Your threads only have to process the multiples of the numbers from 2 up to the square root of To, so in the current implementation they will not add any elements beyond the root to Primes.

2. If you choose to have your threads go beyond the root, you'll face the problem that one of your threads may add a non-prime number to Primes shortly before another thread marks that number as non-prime, which is not what you want.

3. Even in the event that you were lucky and the elements of Primes are indeed all the prime numbers between 1 and To, they may not necessarily be in order, requiring Primes to be sorted first.
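The steps above can be sketched roughly as follows. This is one possible arrangement under stated assumptions, not the only way to do it: it indexes plain integers instead of the odd-only layout used in the question, takes an int limit assumed to be well below int.MaxValue, and splits the range above the root into fixed-size segments so that no two threads ever write to the same array element. The names ParallelSieve, composite, basePrimes, and SegmentSize are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelSieve
{
    // Returns all primes <= limit, in ascending order.
    public static List<long> Sieve(int limit)
    {
        var primes = new List<long>();
        if (limit < 2) return primes;

        var composite = new bool[limit + 1];   // plays the role of Trues

        // Integer square root, corrected for floating-point rounding.
        int root = (int)Math.Sqrt(limit);
        while ((long)(root + 1) * (root + 1) <= limit) root++;
        while ((long)root * root > limit) root--;

        // Step 1: ordinary sequential sieve up to the root.
        var basePrimes = new List<int>();
        for (int p = 2; p <= root; p++)
        {
            if (composite[p]) continue;
            basePrimes.Add(p);
            for (int m = p * p; m <= root; m += p) composite[m] = true;
        }

        // Step 2: mark the rest in parallel. Each segment is owned by
        // exactly one iteration, and basePrimes is only read from here on.
        const int SegmentSize = 1 << 16;
        int first = root + 1;
        int segments = (limit - first) / SegmentSize + 1;

        Parallel.For(0, segments, s =>
        {
            long lo = first + (long)s * SegmentSize;
            long hi = Math.Min(lo + SegmentSize - 1, limit);
            foreach (int p in basePrimes)
            {
                long start = (lo + p - 1) / p * p;   // first multiple of p >= lo
                for (long m = start; m <= hi; m += p) composite[m] = true;
            }
        });

        // Step 3: build the list only after all marking is finished,
        // so it is complete and already sorted.
        for (int n = 2; n <= limit; n++)
            if (!composite[n]) primes.Add(n);

        return primes;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Sieve(30)));
    }
}
```

Since the list is filled by a single thread after Parallel.For has returned, neither a lock nor a final Sort is needed in this variant.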

b is shared across threads. What do you expect to happen if multiple threads bang on that poor variable at once?

It seems like b and a are always equal in your code (or differing by one). Use a. And synchronize access to all other shared state (like the list).

Welcome to the wonderful world of multithreading.

Right off the bat, I can see that every iteration of your loop does a b++ and then uses b throughout its course. This means that every iteration of your loop will be modifying the value of b in the midst of all other iterations.

What you probably want to do is use the a variable made available in your inline function, which does exactly what you seem to be trying to do with b. On the off chance that this is not the case, you should look into locking b and copying its value to a variable local to each iteration before doing anything with it.

Try this instead and let me know if it's what you wanted to do:

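static List<long> Sieve(long To)
{
    To /= 2;

    List<long> Primes = new List<long>();

    if (To > 0)
        Primes.Add(2);

    // Note: no sieving happens here yet; every odd number from 3 up is added
    Parallel.For(1L, To, a =>
    {
        // f is local to this iteration, so no other thread can change it
        long f = 2 * a + 1;

        // List<long> is not thread safe, so serialize the Add calls
        lock (Primes)
            Primes.Add(f);
    });

    // Iterations finish in no particular order
    Primes.Sort();

    return Primes;
}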
