
Concurrent updates on an array in Java

In Java, I have a large array of strings.

I have one thread doing something like this:

for (int i = 0; i < 10000; i++) array[i] = getSomeValue();

I have another thread doing something like this:

for (int i = 10000; i < 20000; i++) array[i] = getSomeValue();

And another thread doing:

for (int i = 20000; i < 30000; i++) array[i] = getSomeValue();

And so on.

Do I have to do anything special for this operation?

Will it work?

I am trying to populate this large array faster by splitting the task across multiple threads, but I wonder if this is the correct thing to do.

I am working on a 64-bit machine with 16 CPUs and all the fancy stuff.

Your code will work fine.

Different portions of an array are independent of each other.

The spec says:

One implementation consideration for Java virtual machines is that every field and array element is considered distinct

This should work fine. However, if you want to be sure it's safe, you can populate a separate array in each thread and then System.arraycopy() them into one big array.
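A minimal sketch of the separate-arrays approach, assuming a fixed thread pool and a hypothetical getSomeValue(int) that stands in for the question's getSomeValue() (it takes the index only so the result can be checked). The class name CopyMerge and the fill method are illustrative, not from the question. Each task fills its own local array, and the calling thread copies the pieces into the final array after Future.get() returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CopyMerge {
    // Hypothetical stand-in for the question's getSomeValue().
    static String getSomeValue(int i) {
        return "value-" + i;
    }

    // Fills `parts` private arrays in parallel, then merges them in order.
    static String[] fill(int size, int parts) {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            int chunk = (size + parts - 1) / parts; // ceiling division
            List<Future<String[]>> futures = new ArrayList<>();
            for (int p = 0; p < parts; p++) {
                final int from = Math.min(size, p * chunk);
                final int to = Math.min(size, from + chunk);
                futures.add(pool.submit(() -> {
                    // Each task writes only to its own array.
                    String[] local = new String[to - from];
                    for (int i = 0; i < local.length; i++) {
                        local[i] = getSomeValue(from + i);
                    }
                    return local;
                }));
            }
            String[] result = new String[size];
            int offset = 0;
            for (Future<String[]> future : futures) {
                String[] local = future.get(); // waits for the task to finish
                System.arraycopy(local, 0, result, offset, local.length);
                offset += local.length;
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The extra copy costs time and memory, so this is a trade of speed for simplicity of reasoning rather than a requirement.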

You can safely initialize the array with this code. However, any code that uses the array afterwards needs to be correctly synchronized with the threads doing the initial updates. This can be as simple as join()ing all the init threads before using the array.
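The join-based approach might look like the sketch below. It assumes a hypothetical getSomeValue(int) in place of the question's getSomeValue(), and the class name SegmentedFill is made up for illustration. Each thread writes only to its own index range, and the caller reads the array only after join() has returned for every worker.

```java
import java.util.ArrayList;
import java.util.List;

class SegmentedFill {
    // Hypothetical stand-in for the question's getSomeValue().
    static String getSomeValue(int i) {
        return "value-" + i;
    }

    // Splits [0, size) into contiguous segments, one per thread.
    static String[] fill(int size, int threadCount) {
        String[] array = new String[size];
        List<Thread> workers = new ArrayList<>();
        int chunk = (size + threadCount - 1) / threadCount; // ceiling division
        for (int t = 0; t < threadCount; t++) {
            final int from = Math.min(size, t * chunk);
            final int to = Math.min(size, from + chunk);
            Thread worker = new Thread(() -> {
                for (int i = from; i < to; i++) {
                    array[i] = getSomeValue(i); // no other thread touches index i
                }
            });
            workers.add(worker);
            worker.start();
        }
        try {
            // After join() returns, the worker's writes are visible to this thread.
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return array;
    }

    public static void main(String[] args) {
        String[] array = fill(30000, 3);
        System.out.println(array[0] + " " + array[29999]); // prints "value-0 value-29999"
    }
}
```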

With Java 8, this has become much easier:

Arrays.parallelSetAll(array, i -> getSomeValue());

This should also solve the visibility problems mentioned in other answers and comments.
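For completeness, here is a self-contained sketch of the Arrays.parallelSetAll call. The generator receives the element index, so this version assumes a hypothetical getSomeValue(int) that uses it; the class name ParallelSetAllDemo is illustrative. The generator may be invoked from several threads, so it should not depend on unsynchronized shared state.

```java
import java.util.Arrays;

class ParallelSetAllDemo {
    // Hypothetical stand-in for the question's getSomeValue().
    static String getSomeValue(int i) {
        return "value-" + i;
    }

    static String[] fill(int size) {
        String[] array = new String[size];
        // Computes array[i] = getSomeValue(i) for every index, possibly in parallel.
        Arrays.parallelSetAll(array, ParallelSetAllDemo::getSomeValue);
        return array;
    }

    public static void main(String[] args) {
        String[] array = fill(30000);
        System.out.println(array[12345]); // prints "value-12345"
    }
}
```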

It should be fine unless getSomeValue() has side effects that change mutable state accessed by multiple threads. If it makes no such state changes, you are set: no two loops access the same bit of memory.

Whether or not it will actually be faster depends on your hardware setup and threading implementation.

As long as each thread works on its own segment of the array, the updates should be fine. Additionally, you can see a performance boost from dividing the work. You'll probably want to measure the boost by testing with different numbers of threads. It probably makes sense to start at 16, since you have 16 CPUs, and see how increasing or decreasing the count affects performance.

One issue you may have is with visibility. I don't believe writes to the array's elements are guaranteed to be seen by all threads, because the elements aren't volatile. So if sections of the array need to be accessed by multiple threads, you could have an issue. One way to deal with this is to use an AtomicIntegerArray ... AtomicReferenceArray.
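As a rough illustration for an array of strings, the sketch below uses AtomicReferenceArray, whose get and set have volatile read and write semantics. A reader thread polls the last slot while a writer is still running, without any join(). The class name AtomicArrayDemo and the method awaitLast are hypothetical, as are the "value-" strings.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

class AtomicArrayDemo {
    // Starts a writer thread, then waits in the calling thread until the
    // last element becomes visible, and returns it.
    static String awaitLast(int size) {
        AtomicReferenceArray<String> array = new AtomicReferenceArray<>(size);
        Thread writer = new Thread(() -> {
            for (int i = 0; i < size; i++) {
                array.set(i, "value-" + i); // volatile write
            }
        });
        writer.start();

        String last;
        // Volatile read: the loop is guaranteed to observe the write eventually.
        while ((last = array.get(size - 1)) == null) {
            Thread.yield();
        }
        return last;
    }

    public static void main(String[] args) {
        System.out.println(awaitLast(30000)); // prints "value-29999"
    }
}
```

If readers only touch the array after all writers have been joined, a plain String[] is enough and the atomic wrapper adds overhead without benefit.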


 