
How do streams / fork-join access arrays thread-safely?

Streams and fork-join both provide functionality to parallelize code that accesses arrays. For example, Arrays.parallelSetAll is implemented largely by the following line:

IntStream.range(0, array.length).parallel()
    .forEach(i -> { array[i] = generator.applyAsLong(i); });
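The program below runs that idiom so you can try it. It is a small, self-contained sketch. The names setAllLikeJdk, viaHelper and viaLibrary are hypothetical. It pushes the quoted one-liner through a helper and compares the result with the real Arrays.parallelSetAll. Each index is produced by exactly one stream element, so no two threads ever write the same array slot.

```java
import java.util.Arrays;
import java.util.function.IntToLongFunction;
import java.util.stream.IntStream;

public class ParallelSetAllDemo {
    // Hypothetical helper mirroring the quoted idiom: index i is written by
    // exactly one stream element, so each array slot has a single writer.
    static void setAllLikeJdk(long[] array, IntToLongFunction generator) {
        IntStream.range(0, array.length).parallel()
                 .forEach(i -> { array[i] = generator.applyAsLong(i); });
    }

    static long[] viaHelper(int n) {
        long[] a = new long[n];
        setAllLikeJdk(a, i -> (long) i * i);
        return a;
    }

    static long[] viaLibrary(int n) {
        long[] a = new long[n];
        Arrays.parallelSetAll(a, i -> (long) i * i);
        return a;
    }

    public static void main(String[] args) {
        // The terminal forEach does not return until every element has been
        // processed; the library's internal synchronization is what makes
        // those writes visible to this (calling) thread.
        System.out.println(Arrays.equals(viaHelper(10_000), viaLibrary(10_000)));
        System.out.println(Arrays.toString(viaHelper(5)));
    }
}
```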

Also, the documentation of RecursiveAction, part of the fork-join framework, contains the following example:

static class SortTask extends RecursiveAction {
    final long[] array; final int lo, hi;
    ...
    void merge(int lo, int mid, int hi) {
        long[] buf = Arrays.copyOfRange(array, lo, mid);
        for (int i = 0, j = lo, k = mid; i < buf.length; j++)
            array[j] = (k == hi || buf[i] < array[k]) ?
                buf[i++] : array[k++];
    }
}
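Here is a self-contained version for experimentation, modeled on that example. Only the merge method is the one quoted above. The constructor, compute(), the cutoff value and the sortCopy helper are assumptions filled in for illustration; they are not the elided Javadoc lines. Notice two things in compute():

- The two subtasks sort disjoint ranges.
- invokeAll returns only after both subtasks are done, and only then does merge read their output.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class SortTaskSketch {
    static class SortTask extends RecursiveAction {
        static final int THRESHOLD = 16; // hypothetical cutoff for sequential sorting
        final long[] array; final int lo, hi;

        SortTask(long[] array, int lo, int hi) {
            this.array = array; this.lo = lo; this.hi = hi;
        }

        @Override
        protected void compute() {
            if (hi - lo < THRESHOLD) {
                Arrays.sort(array, lo, hi);   // touches only [lo, hi)
            } else {
                int mid = (lo + hi) >>> 1;
                // Subtasks own disjoint ranges [lo, mid) and [mid, hi);
                // invokeAll returns only after both have completed.
                invokeAll(new SortTask(array, lo, mid), new SortTask(array, mid, hi));
                merge(lo, mid, hi);           // reads what the subtasks wrote
            }
        }

        // Merge step as quoted: merges sorted [lo, mid) and [mid, hi) in place.
        void merge(int lo, int mid, int hi) {
            long[] buf = Arrays.copyOfRange(array, lo, mid);
            for (int i = 0, j = lo, k = mid; i < buf.length; j++)
                array[j] = (k == hi || buf[i] < array[k]) ? buf[i++] : array[k++];
        }
    }

    // Hypothetical helper: sorts a copy of the input on the common pool.
    static long[] sortCopy(long[] input) {
        long[] a = input.clone();
        ForkJoinPool.commonPool().invoke(new SortTask(a, 0, a.length));
        return a;
    }

    public static void main(String[] args) {
        long[] data = new long[1000];
        java.util.Random rnd = new java.util.Random(42);
        for (int i = 0; i < data.length; i++) data[i] = rnd.nextInt(10_000);
        long[] expected = data.clone();
        Arrays.sort(expected);
        System.out.println(Arrays.equals(sortCopy(data), expected));
    }
}
```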

Finally, parallel streams created from arrays access the arrays from multiple threads (the code is too complex to summarize here).
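Arrays.stream builds its stream from an array spliterator, and a parallel pipeline divides the work by calling trySplit on that spliterator. The sketch below splits an array spliterator once by hand; the class and method names are hypothetical. The prefix and the remainder cover disjoint index ranges. Where exactly the split falls is an implementation detail; current JDK array spliterators split near the midpoint.

```java
import java.util.Arrays;
import java.util.Spliterator;
import java.util.stream.LongStream;

public class ArraySplitDemo {
    // Copies whatever elements a spliterator still covers into a new array.
    static long[] drain(Spliterator.OfLong s) {
        LongStream.Builder b = LongStream.builder();
        s.forEachRemaining((long v) -> b.add(v));
        return b.build().toArray();
    }

    // Splits the array's spliterator once; returns {prefix elements, remaining elements}.
    static long[][] splitOnce(long[] array) {
        Spliterator.OfLong rest = Arrays.spliterator(array);
        Spliterator.OfLong prefix = rest.trySplit(); // null if the source declines to split
        long[] first = (prefix == null) ? new long[0] : drain(prefix);
        return new long[][] { first, drain(rest) };
    }

    public static void main(String[] args) {
        long[][] parts = splitOnce(new long[] {10, 11, 12, 13, 14, 15, 16, 17});
        System.out.println("prefix: " + Arrays.toString(parts[0]));
        System.out.println("rest:   " + Arrays.toString(parts[1]));
    }
}
```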

All of these examples appear to read from or write to arrays without any synchronization or other memory barriers (as far as I can tell). As we know, completely ad hoc multithreaded array accesses are unsafe: there is no guarantee that a read reflects a write made in another thread unless there is a happens-before relationship between the write and the read. In fact, the Atomic...Array classes were created specifically to address this issue. However, given that each example above is in the standard library or its documentation, I presume they're correct.
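To illustrate what the Atomic...Array classes add, here is a minimal sketch. It assumes Java 9+ for Thread.onSpinWait, and the class and method names are hypothetical. One thread publishes elements with AtomicLongArray.set while another spins on AtomicLongArray.get. Each set is a volatile write and each get a volatile read, so a get that observes a value also sees everything the writer did before that set.

With a plain long[] the two threads would have no happens-before edge between them. A JIT could even hoist the read out of the spin loop.

```java
import java.util.concurrent.atomic.AtomicLongArray;

public class AtomicArrayHandoff {
    static long handoffSum() {
        AtomicLongArray slots = new AtomicLongArray(4); // all elements start at 0
        Thread writer = new Thread(() -> {
            for (int i = 0; i < slots.length(); i++) {
                slots.set(i, i * 10L + 1);  // volatile write; value is never 0
            }
        });
        writer.start();
        long sum = 0;
        for (int i = 0; i < slots.length(); i++) {
            long v;
            // Volatile read on every iteration; spin until the writer's value shows up.
            while ((v = slots.get(i)) == 0) {
                Thread.onSpinWait();
            }
            sum += v;
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum; // 1 + 11 + 21 + 31
    }

    public static void main(String[] args) {
        System.out.println(handoffSum());
    }
}
```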

Can someone please explain what mechanism guarantees the safety of the array accesses in these examples?

Short answer: partitioning.

The JMM (Java Memory Model) is defined in terms of access to variables. Variables include static fields, instance fields, and array elements. Suppose you arrange your program so that thread T0 is the only thread to access element 0 of an array, and similarly T1 is the only thread to access element 1. Then each of these elements is effectively thread-confined and you have no problem: the JMM program order rule takes care of you.
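The partitioning argument can be made concrete with plain threads. This is a minimal sketch, and the class and method names are hypothetical. Each worker writes only its own contiguous slice of the array, with plain, unsynchronized writes.

The hand-off in and out of the workers relies on two happens-before edges from JLS §17.4.5:

- A call to start() on a thread happens-before any action in that thread.
- All actions in a thread happen-before another thread successfully returns from join() on it.

Those edges play the role that the library-internal synchronization plays for parallel streams and fork-join.

```java
import java.util.Arrays;

public class PartitionedWrites {
    // Each worker writes only to its own slice [from, to), so during the
    // parallel phase every element is confined to a single thread.
    static long[] fillSquares(int n, int workers) {
        long[] array = new long[n];            // allocated by the caller before start()
        Thread[] threads = new Thread[workers];
        int chunk = (n + workers - 1) / workers;
        for (int w = 0; w < workers; w++) {
            int from = Math.min(n, w * chunk);
            int to = Math.min(n, from + chunk);
            threads[w] = new Thread(() -> {
                for (int i = from; i < to; i++) {
                    array[i] = (long) i * i;   // plain, unsynchronized write
                }
            });
            threads[w].start();  // caller's prior actions happen-before the worker's
        }
        for (Thread t : threads) {
            try {
                t.join();        // worker's writes happen-before join() returns
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return array;            // every element is now safely readable here
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fillSquares(10, 3)));
    }
}
```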

Parallel streams build on this principle. Each task works on a segment of the array that no other task is working on. Then all we have to do is ensure two things:

- the thread running a task can see the initial state of the array;
- the consumer of the final result can see the as-modified-by-the-task view of the appropriate section of the array.

Both are easily arranged through synchronization actions embedded in the implementation of the parallel stream and FJ (fork-join) libraries.
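The same three ingredients show up in a hand-written fork-join task. This is a sketch only; ScaleSegments, ScaleTask and the leaf size are hypothetical:

1. The caller writes the initial state before submitting the task.
2. Each leaf task modifies only its own segment.
3. The caller reads the result after invoke returns.

The sketch uses explicit fork() and join() so the points where the framework's synchronization happens are visible. As the answer says, it relies on that internal synchronization rather than adding any of its own.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class ScaleSegments {
    // Hypothetical task: multiplies array[lo, hi) by factor, in place.
    static final class ScaleTask extends RecursiveAction {
        static final int THRESHOLD = 1_000; // hypothetical leaf size
        final long[] array; final int lo, hi; final long factor;

        ScaleTask(long[] array, int lo, int hi, long factor) {
            this.array = array; this.lo = lo; this.hi = hi; this.factor = factor;
        }

        @Override
        protected void compute() {
            if (hi - lo <= THRESHOLD) {
                for (int i = lo; i < hi; i++) array[i] *= factor; // this task's segment only
                return;
            }
            int mid = (lo + hi) >>> 1;
            ScaleTask left = new ScaleTask(array, lo, mid, factor);
            left.fork();                                     // left half may run on another worker
            new ScaleTask(array, mid, hi, factor).compute(); // right half on this thread
            left.join();                                     // wait for the left half
        }
    }

    static long[] scaled(int n, long factor) {
        long[] array = new long[n];
        for (int i = 0; i < n; i++) array[i] = i;            // initial state, written by the caller
        ForkJoinPool.commonPool().invoke(new ScaleTask(array, 0, n, factor));
        return array;                                        // invoke returned: all segments done
    }

    public static void main(String[] args) {
        long[] r = scaled(10_000, 3);
        System.out.println(r[0] + " " + r[9_999]);
    }
}
```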

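Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.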

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM