
How to partition a 2D array into multiple smaller 2D arrays

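Suppose we have a large 2D array, for example 65,000 rows by 14 columns, and we want to partition it into multiple smaller 2D arrays without repeating rows, using a 1D array as the position array (row indices). How can I solve this problem?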

double[][] ch = new double[1000][14];


while(k<=dataset.length%100){
    int i=0;

    best = swarm.getBestPosition();
    ch = DatasetChunks(best, dataset, i++);
    ChunksPrint(ch, k);
    best=null;

    k++;
}
    
private static double[][] DatasetChunks(double[] best, double[][] dataset) {
    for (int i = 0; i < row; i++) {
        for (int j = 0; j < col; j++) {
            ch1[i][j] = dataset[best[i]][j];
        }
    }
    return ch1;
}

Since a List<double[][]> makes the partitioning easier, have a look at the following. I have tried to add comments on some of the lines.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Main {
  public static void main(String[] args) throws Exception {
    double[][] dataset = new double[10][2];
    dataset[0][0] = 5;
    dataset[6][0] = 6;

    double[] partitions = { 0, 5, 10 }; // better if int than double, the first chunk holds elems from 0-5 of the original array, then the second 5-10....

    List<double[][]> chunks = DatasetChunks(partitions, dataset, 2);
    for (int i = 0; i < chunks.size(); i++) {
      System.out.println("chunk " + i);
      for (double[] d : chunks.get(i)) {
        System.out.println(Arrays.toString(d));
      }
    }
  }

  private static List<double[][]> DatasetChunks(double[] best, double[][] dataset, int cols) {
    List<double[][]> chunks = new ArrayList<>();
    double[][] chunk = {}; //to be initialized later
    for (int i = 0; i < best.length - 1; i++) {
      int startIndex = (int) best[i]; //needs explicit cast since the partitions array is double in the main method
      int endIndex = (int) best[i + 1];//needs explicit cast since the partitions array is double in the main method
      chunk = new double[endIndex - startIndex][cols]; // a new chunk
      for (int j = startIndex, f = 0; j < endIndex; j++, f++) {
        for (int p = 0; p < cols; p++) {
          chunk[f][p] = dataset[j][p];
        }
      }
      chunks.add(chunk);
    }
    return chunks;
  }
}

Output

chunk 0
[5.0, 0.0]
[0.0, 0.0]
[0.0, 0.0]
[0.0, 0.0]
[0.0, 0.0]
chunk 1
[0.0, 0.0]
[6.0, 0.0]
[0.0, 0.0]
[0.0, 0.0]
[0.0, 0.0]
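
The inner copy loops above can also be written with Arrays.copyOfRange. The following is only a sketch, assuming the boundaries are supplied as an int[] and that sharing row arrays with the original dataset is acceptable (copyOfRange on a double[][] copies row references, not the row contents). The class name RangeChunks and the method chunksByBoundaries are hypothetical and not part of the answer's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RangeChunks {
    // Hypothetical helper: splits the rows at the given boundaries,
    // e.g. {0, 5, 10} yields rows [0, 5) and rows [5, 10).
    static List<double[][]> chunksByBoundaries(int[] boundaries, double[][] dataset) {
        List<double[][]> chunks = new ArrayList<>();
        for (int i = 0; i < boundaries.length - 1; i++) {
            // Shallow copy: each chunk row is the same double[] object as in dataset.
            chunks.add(Arrays.copyOfRange(dataset, boundaries[i], boundaries[i + 1]));
        }
        return chunks;
    }

    public static void main(String[] args) {
        double[][] dataset = new double[10][2];
        dataset[0][0] = 5;
        dataset[6][0] = 6;
        List<double[][]> chunks = chunksByBoundaries(new int[] {0, 5, 10}, dataset);
        for (int i = 0; i < chunks.size(); i++) {
            System.out.println("chunk " + i + ": " + Arrays.deepToString(chunks.get(i)));
        }
    }
}
```

If the chunks must be independent of the original array, each row would need to be cloned as well, which is what the explicit loops in the answer do.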

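Well, 65,000 rows of 14 columns each really is not that much data. It can be partitioned in one pass into 130 partitions of 500 rows each, even on a 10-year-old computer, provided the dataset[][] array fits in memory.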
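
As a rough check of that size claim, the raw payload can be estimated as rows × columns × 8 bytes per double. This is only a back-of-the-envelope sketch: it ignores JVM object headers and the per-row array overhead, and the class name MemoryEstimate is hypothetical.

```java
class MemoryEstimate {
    // Raw payload only: object headers and per-row array overhead are ignored.
    static long payloadBytes(int rows, int cols) {
        return (long) rows * cols * Double.BYTES;
    }

    public static void main(String[] args) {
        long bytes = payloadBytes(65_000, 14);
        System.out.println(bytes + " bytes"); // 7280000 bytes
        System.out.printf("%.2f MiB%n", bytes / (1024.0 * 1024.0));
    }
}
```

That is roughly 7 MB of values, and about twice that while both the dataset and its partitioned copy are held in memory.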

A simple method can partition your 2D double array into a List of double[][] (as I mentioned in the comments):

List<double[][]> partitions = new ArrayList<>(); 

The partition2D_DoubleType_Array() method:

/**
 * This method will partition the supplied double type 2D Array into several
 * double type 2D Arrays with each 2d Array consisting of the number Rows
 * determined by the supplied desired partition size.
 *
 * @param array                   (2D double[][] Type Array) The 2D double
 *                                type array to partition.<br>
 *
 * @param desiredSizeOfPartitions (Integer - int) The desired size for each
 *                                Array partition.<br>
 *
 * @return A List Interface of double[][] ({@code List<double[][]>})
 *         containing all the Partitioned double[][] type arrays.
 */
public static List<double[][]> partition2D_DoubleType_Array(final double[][] array, final int desiredSizeOfPartitions) {
    int desiredPartitionSize = desiredSizeOfPartitions;
    List<double[][]> partitions = new ArrayList<>();
    int numberOfArrays = (int) Math.ceil((double) array.length / desiredPartitionSize);
    int rowsNeeded = (int) Math.ceil((double) array.length / numberOfArrays);
    double[][] dataChunk = new double[rowsNeeded][array[0].length + 1];

    int k = 0;
    for (int i = 0; i < array.length; i++) {
        dataChunk[k][0] = i;
        for (int j = 0; j < array[i].length; j++) {
            dataChunk[k][j + 1] = array[i][j];
        }
        k++;
        if (k == rowsNeeded) {
            partitions.add(dataChunk);
            k = 0;
            dataChunk = new double[rowsNeeded][array[0].length + 1];
        }
    }
    if (k > 0) {
        // Trim the final, partially filled chunk so it carries no empty rows.
        partitions.add(java.util.Arrays.copyOf(dataChunk, k));
    }
    return partitions;
}

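To use the method above you of course need a populated 2D double array. I don't know how you fill your dataset[][] array, but for testing purposes we will use this:

Filling the dataset[][] array for testing: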

double[][] dataset = new double[65000][14];
// Fill the dataset 2D Array with fictitious floating point values.
String title = "Creating a 2D double Type Array: dataset[65000][14] and "
               + "filling with fictitious data values.";
String underline = String.join("", java.util.Collections.nCopies(title.length(), "-"));
System.out.println(title);
System.out.println(underline);
for (int i = 0; i < dataset.length; i++) {
    for (int j = 0; j < 14; j++) {
        dataset[i][j] = (double) (j + i) + (0.5d);
    }
    /* Un-comment the below line if you want to view the created
       dataset[][] array within the Console Window.   */
    // System.out.println((i + 1) + ") " + Arrays.toString(dataset[i]));
}
System.out.println("Array Creation COMPLETE!");
System.out.println();

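Now that the dataset[][] array has been created and filled, we can partition it into 130 double[][] 2D arrays, each consisting of 500 rows of 15 columns. Wait... 15 columns? The partitioned arrays are supposed to have 14 columns!

The partitioning method deliberately changes the row layout: it adds one extra element to each row so that the original dataset[][] row index can be stored in the first element (index 0) of every partition row. From now on, each partition row therefore carries the index of the dataset[][] row its data was taken from. Keep this in mind when retrieving column data from any partition row.

Why do this? The reason is simple: you need to retrieve a row's column data from the partitions based on a random dataset[][] row index (0 to 64,999). From that random row index we can determine which partition contains the row, and because the original row index is stored at index 0 of every partition row, we can find the exact row within that partition and read the desired column data (index 1 to index 14). Remember, index 0 is reserved for the original row index from the dataset[][] array.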
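
When every partition holds the same number of rows and the rows keep their original order, the partition and the row offset within it can also be computed directly from the original index, without scanning for the stored index. A minimal sketch under those assumptions follows. The class PartitionLookup and the helper locate are hypothetical names, not part of the code in this answer.

```java
class PartitionLookup {
    // Returns {partitionIndex, rowWithinPartition} for an original row index,
    // assuming fixed-size partitions filled in original row order.
    static int[] locate(int originalRowIndex, int partitionSize) {
        return new int[] {
            originalRowIndex / partitionSize, // which partition
            originalRowIndex % partitionSize  // which row inside that partition
        };
    }

    public static void main(String[] args) {
        int[] pos = locate(1127, 500);
        System.out.println("partition " + pos[0] + ", row " + pos[1]); // partition 2, row 127
    }
}
```

The stored index column is still useful when partitions are not uniform in size or when rows are reordered, since the arithmetic shortcut no longer applies in those cases.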

Partitioning the dataset[][] array:

// Partition into 130 2D Arrays [500][15] (14 data columns plus the row index):
title = "Partitioning The dataset[][] 2D Array into 130 individual 2D "
        + "Arrays consisting of 500 Rows:";
underline = String.join("", java.util.Collections.nCopies(title.length(), "-"));
System.out.println(title);
System.out.println(underline);

int desiredPartitionSize = 500;
List<double[][]> partitions = partition2D_DoubleType_Array(dataset, desiredPartitionSize);
System.out.println("2D Array Partitioning COMPLETE!");

Retrieving partition row data based on random dataset[][] row index values:

int datasetRowIndexTotal = dataset.length;
// Drop the reference to the dataset[][] array so it can be garbage
// collected, since we don't need it anymore.
dataset = null;

/* Retrieve a random row index value from the original dataset[][] 
   array and locate all the columnar values for that row from within 
   the Partitioned arrays. We will do this five times therefore we'll
   be pulling out five random Row Index values determined from the 
   original dataset[][] array (total rows now in datasetRowIndexTotal).  */
for (int n = 0; n < 5; n++) {
    // Get a random row index value
    int randomIndex = (int) (Math.random() * datasetRowIndexTotal);
    System.out.println("Find data from random Row Index #: --> " + randomIndex);
    
    /* Determine which Partition Array the generated random row index
       falls in. Integer division maps rows 0-499 to partition 0,
       rows 500-999 to partition 1, and so on.  */
    int partitionIndex = randomIndex / desiredPartitionSize;
    System.out.println("Determined 'Partition' Index is: --> " + partitionIndex);
    /* For readability, place the Partition Array
       into a tmp[][] double type 2D Array.   */
    double[][] tmp = partitions.get(partitionIndex);
    /* For demo simplicity we're going to place
       the acquired columnar values into a comma
       (", ") delimited String using StringBuilder.  */
    StringBuilder sb = new StringBuilder("");
    // Iterate through the determined Partition Array.
    for (int i = 0; i < tmp.length; i++) {
        /* Is the desired Row Index value in this 
           particular Partition Array Row?     */
        if ((int)tmp[i][0] == randomIndex) {
            /* Yes...get the columnar values for this data Row. 
               Notice how we start j from 1 (not 0)? This is 
               because index 0 is reserved for original Row
               index values (remember).         */
            for (int j = 1; j < tmp[i].length; j++) {
                if (sb.length() > 0) {
                    sb.append(", ");
                }
                sb.append(tmp[i][j]);
            }
        }
    }
    // Display the find!
    System.out.println("Columnar Data for the random Row Index of: --> " + randomIndex);
    System.out.println(sb.toString());
    System.out.println();
}

Once all the code has been entered correctly and run, you should see something like the following in the console window:

Creating a 2D double Type Array: dataset[65000][14] and filling with fictitious data values.
--------------------------------------------------------------------------------------------
Array Creation COMPLETE!

Partitioning The dataset[][] 2D Array into 130 individual 2D Arrays consisting of 500 Rows:
-------------------------------------------------------------------------------------------
2D Array Partitioning COMPLETE!


Find data from random Row Index #: --> 1127
Determined 'Partition' Index is: --> 2
Columnar Data for the random Row Index of: --> 1127
1127.5, 1128.5, 1129.5, 1130.5, 1131.5, 1132.5, 1133.5, 1134.5, 1135.5, 1136.5, 1137.5, 1138.5, 1139.5, 1140.5

Find data from random Row Index #: --> 1406
Determined 'Partition' Index is: --> 2
Columnar Data for the random Row Index of: --> 1406
1406.5, 1407.5, 1408.5, 1409.5, 1410.5, 1411.5, 1412.5, 1413.5, 1414.5, 1415.5, 1416.5, 1417.5, 1418.5, 1419.5

Find data from random Row Index #: --> 36801
Determined 'Partition' Index is: --> 73
Columnar Data for the random Row Index of: --> 36801
36801.5, 36802.5, 36803.5, 36804.5, 36805.5, 36806.5, 36807.5, 36808.5, 36809.5, 36810.5, 36811.5, 36812.5, 36813.5, 36814.5

Find data from random Row Index #: --> 28021
Determined 'Partition' Index is: --> 56
Columnar Data for the random Row Index of: --> 28021
28021.5, 28022.5, 28023.5, 28024.5, 28025.5, 28026.5, 28027.5, 28028.5, 28029.5, 28030.5, 28031.5, 28032.5, 28033.5, 28034.5

Find data from random Row Index #: --> 18916
Determined 'Partition' Index is: --> 37
Columnar Data for the random Row Index of: --> 18916
18916.5, 18917.5, 18918.5, 18919.5, 18920.5, 18921.5, 18922.5, 18923.5, 18924.5, 18925.5, 18926.5, 18927.5, 18928.5, 18929.5
