How do I sort a list of records using multithreading in java?

I'm teaching myself multithreading in java. My dummy example is that I have a large list of records (a 2D array) that I want sorted. The single-threaded approach is to loop through the list of records and sort each one. I want to multithread my program to sort my list with a fixed number of threads, in this case 2. One thread will sort the first half of the list and the second thread will sort the remaining half. Then I want to output the now-sorted list of records.

How can I create a thread pool of workers and sort the list of records? Do I need to worry about data being a shared resource? How do I return the results from each thread back to the original list of records? Below is my code.

import java.util.*;

class RunnableProcess implements Runnable {
  private int[][] data;

  public RunnableProcess(int[][] data) {
      this.data = data;
  }

  public void run() {
    try {

      // sort the records this thread has access to
      for (int i = 0; i < data.length; i++) {
        Arrays.sort(data[i]);
      }

    } catch(Exception ex) {
        ex.printStackTrace();
    }
  }
}

class BigData {

  static int[][] data = new int[1000][1000];

  public static void main(String [] args) {


    // Create records 
    for (int i = 0; i < data.length; i++) {
      for (int j = 0; j < data[0].length; j++) {
        data[i][j] = new Random().nextInt(999);
      }
    }

    // Call on two threads to sort the data variable
    // ExecutorService executor = Executors.newFixedThreadPool(2);


   // Python type of idea: Pass half the records to each thread and start
   // java doesn't support this so what is the java way of doing this?

   // Thread thread = new Thread(new RunnableProcess(data[:499]));
   // thread.start();

   // Thread thread = new Thread(new RunnableProcess(data[499:]));
   // thread.start();

  }
}

I am open to suggestions on the best way to tackle this problem.

Java does not support slicing native arrays in the same fashion as python. We can get close by using a List.

First, an aside. Your random data generation is very inefficient. You are creating a new Random number generator object for each random number you generate. You only need one generator, like this:

Random rnd = new Random();                     // Only created once
for (int i = 0; i < data.length; i++) {
    for (int j = 0; j < data[0].length; j++) {
        data[i][j] = rnd.nextInt(999);
    }
}

Once you have created the data, we can turn this native int[][] 2d-array into a List of records, where each record is an int[] 1d-array:

List<int[]> records = Arrays.asList(data);

Note that this does not copy the values in the array. It creates a List view of the array. Any change to the values stored in data will be reflected in records and vice versa.

We do this so we can use the List#subList() method to split the list into two views.

List<int[]> first_half = records.subList(0, 500);
List<int[]> second_half = records.subList(500, 1000);

Again, these are views, backed by the original list (backed by the original array). Changes made through the view will be reflected in the original.
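To see the write-through behaviour of these views in isolation, here is a small sketch. The class name ViewDemo, the method name sortRowsThroughView, and the 3x3 sample data are made up for illustration; only Arrays.asList(), List#subList() and Arrays.sort() come from the steps above.

```java
import java.util.Arrays;
import java.util.List;

class ViewDemo {
    // Sorts rows [from, to) through a subList view and returns the original array.
    static int[][] sortRowsThroughView(int[][] data, int from, int to) {
        List<int[]> records = Arrays.asList(data);      // view of data, no copy
        List<int[]> slice = records.subList(from, to);  // view of records, no copy
        for (int[] record : slice) {
            Arrays.sort(record);                        // sorts the row in place
        }
        return data;
    }

    public static void main(String[] args) {
        int[][] data = { {3, 1, 2}, {9, 7, 8}, {6, 5, 4} };
        sortRowsThroughView(data, 0, 2);                // sort only the first two rows
        // prints [[1, 2, 3], [7, 8, 9], [6, 5, 4]]
        System.out.println(Arrays.deepToString(data));
    }
}
```

The third row is untouched because it lies outside the view, while the first two rows are sorted in the original array even though the code only ever touched the sublist.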

Since we now have the records stored in a List, instead of an array, we need to update the RunnableProcess to use this new format:

class RunnableProcess implements Runnable {
    private List<int[]> records;

    public RunnableProcess(List<int[]> records) {
        this.records = records;
    }

    @Override
    public void run() {
        // sort the records this thread has access to
        for (int[] record : records) {
            Arrays.sort(record);
        }
    }
}

We now have the data partitioned into two independent sets, and a RunnableProcess that can operate on each set. Now, we can start the multithreading.

ExecutorService executor = Executors.newFixedThreadPool(2);

This executor service creates a pool of two threads, and will reuse these threads over and over again for subsequent tasks that are submitted to this executor. Because of this, you do NOT need to create and start your own threads. The executor will take care of this.

executor.submit(new RunnableProcess(first_half));
executor.submit(new RunnableProcess(second_half));

Since we want to know when these tasks are both finished, we need to save the Future returned from executor.submit():

Future<?> task1 = executor.submit(new RunnableProcess(first_half));
Future<?> task2 = executor.submit(new RunnableProcess(second_half));

Calling Future#get() waits for the task to complete, and retrieves the result of the task. (Note: Since our Runnable does not return a value, the null value will be returned.)
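The note about null can be checked with a short sketch that contrasts a Runnable with a Callable, which does return a value. The class FutureGetDemo and its two helper methods are hypothetical names used only for this illustration; a single-thread executor is assumed because one thread is enough here.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FutureGetDemo {
    // Submits a Runnable and returns whatever Future#get() yields for it.
    static Object resultOfRunnable() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Runnable task = () -> { /* does some work, returns nothing */ };
            Future<?> future = executor.submit(task);
            return future.get();                 // blocks until the task is done
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    // Submits a Callable and returns the value it computed.
    static int resultOfCallable() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Callable<Integer> task = () -> 6 * 7;
            Future<Integer> future = executor.submit(task);
            return future.get();                 // blocks, then returns the result
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(resultOfRunnable());  // prints null
        System.out.println(resultOfCallable());  // prints 42
    }
}
```

For the sorting problem the Runnable form is sufficient, since the records are sorted in place and there is nothing to return.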

task1.get();  // Wait for first task to finish ...
task2.get();  // ... as well as the second task to finish.

Finally, you need to call shutdown() on the executor, or your program may not terminate properly.

executor.shutdown();

Complete example:

List<int[]> records = Arrays.asList(data);
List<int[]> first_half = records.subList(0, 500);
List<int[]> second_half = records.subList(500, 1000);

ExecutorService executor = Executors.newFixedThreadPool(2);

try {
    Future<?> task1 = executor.submit(new RunnableProcess(first_half));
    Future<?> task2 = executor.submit(new RunnableProcess(second_half));

    task1.get();  // Wait for first task to finish ...
    task2.get();  // ... as well as the second task to finish.
} catch (InterruptedException | ExecutionException e) {
    e.printStackTrace();
}

executor.shutdown();
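For readers who want something that compiles as a single file, the pieces above can be assembled roughly as follows. This is a sketch under a few assumptions: the class names BigDataSorted and RowSorter and the helper allRowsSorted are invented here (RowSorter repeats the RunnableProcess logic so the file stands alone), the midpoint is computed from data.length instead of the hard-coded 500, and shutdown() is moved into a finally block so it also runs if a task fails.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BigDataSorted {
    // Same job as RunnableProcess: sort every record this task was given.
    static class RowSorter implements Runnable {
        private final List<int[]> records;

        RowSorter(List<int[]> records) {
            this.records = records;
        }

        @Override
        public void run() {
            for (int[] record : records) {
                Arrays.sort(record);
            }
        }
    }

    // Sorts each row of data in place, using two pool threads.
    static void sortRowsWithTwoThreads(int[][] data) {
        List<int[]> records = Arrays.asList(data);
        int mid = data.length / 2;
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            Future<?> task1 = executor.submit(new RowSorter(records.subList(0, mid)));
            Future<?> task2 = executor.submit(new RowSorter(records.subList(mid, data.length)));
            task1.get();  // wait for the first half
            task2.get();  // wait for the second half
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a sorting task failed", e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    // Returns true if every row is in non-decreasing order.
    static boolean allRowsSorted(int[][] data) {
        for (int[] row : data) {
            for (int j = 1; j < row.length; j++) {
                if (row[j - 1] > row[j]) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[][] data = new int[1000][1000];
        Random rnd = new Random();               // one generator for all values
        for (int[] row : data) {
            for (int j = 0; j < row.length; j++) {
                row[j] = rnd.nextInt(999);
            }
        }
        sortRowsWithTwoThreads(data);
        System.out.println("all rows sorted: " + allRowsSorted(data));  // prints "all rows sorted: true"
    }
}
```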

Do I need to worry about data being a shared resource?

In this case, no. Your data is an array of arrays. Each thread only references the data array (as a List) to get references to the int[] records. The data array itself is not modified; only the records are, and each record is modified by only one of the threads.

How do I return the results from each thread back to the original list of records?

Since the records are being sorted "in place", your data variable already contains your array of sorted records. The calls to Future#get() ensure that each thread has finished its processing, so that the data can once again be safely accessed from the main thread.
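The question mentions a fixed number of threads, "in this case 2". If a different count is wanted, the same subList() idea can be extended to any number of chunks. The following is only one possible sketch, not part of the original answer: the class ChunkedSort and the method sortRows are hypothetical names, and it swaps the individual submit() calls for ExecutorService#invokeAll(), which blocks until every task has completed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ChunkedSort {
    // Sorts each row of data in place, splitting the rows across `threads` tasks.
    // Assumes threads >= 1.
    static void sortRows(int[][] data, int threads) {
        List<int[]> records = Arrays.asList(data);
        int chunk = (data.length + threads - 1) / threads;   // ceiling division
        List<Callable<Object>> tasks = new ArrayList<>();
        for (int start = 0; start < data.length; start += chunk) {
            int end = Math.min(start + chunk, data.length);
            List<int[]> slice = records.subList(start, end); // disjoint view per task
            Runnable sortSlice = () -> {
                for (int[] record : slice) {
                    Arrays.sort(record);
                }
            };
            tasks.add(Executors.callable(sortSlice));        // adapt Runnable to Callable
        }

        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            // invokeAll returns only after all tasks have completed.
            for (Future<Object> done : executor.invokeAll(tasks)) {
                done.get();                                  // rethrows a task failure, if any
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a sorting task failed", e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        int[][] data = { {3, 2, 1}, {6, 5, 4}, {9, 8, 7}, {2, 1, 0}, {5, 4, 3} };
        sortRows(data, 3);
        // prints [[1, 2, 3], [4, 5, 6], [7, 8, 9], [0, 1, 2], [3, 4, 5]]
        System.out.println(Arrays.deepToString(data));
    }
}
```

Because the sublists never overlap, the reasoning about shared data is unchanged: each record is still modified by exactly one task.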
