
How to partition a 2D array into multiple smaller 2D arrays

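Suppose we have a large 2D array (for example 65,000 rows by 14 columns) and we want to partition it into multiple smaller 2D arrays without duplicated rows, relying on a 1D array as a position array (row index numbers). How can I solve this?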

double[][] ch = new double[1000][14];


while(k<=dataset.length%100){
    int i=0;

    best = swarm.getBestPosition();
    ch = DatasetChunks(best, dataset, i++);
    ChunksPrint(ch, k);
    best=null;

    k++;
}
    
private static double[][] DatasetChunks(double[] best, double[][] dataset) {
    for (int i = 0; i < row; i++) {
        for (int j = 0; j < col; j++) {
            ch1[i][j] = dataset[best[i]][j];
        }
    }
    return ch1;
}
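
One way to read the attempt above, assuming `best` is meant to hold row positions into `dataset`, is that each chunk is built by copying the rows the position array names. The sketch below uses an `int[]` rather than a `double[]` for the positions, because Java array indices must be integers. The names `RowGather` and `gatherRows` are illustrative and not from the original post. As long as the positions contain no repeated value, no row appears twice in a chunk.

```java
import java.util.Arrays;

public class RowGather {
    /** Copies the rows of dataset named by positions into a new 2D array. */
    static double[][] gatherRows(double[][] dataset, int[] positions) {
        double[][] chunk = new double[positions.length][];
        for (int i = 0; i < positions.length; i++) {
            // clone() so the chunk does not share row storage with dataset
            chunk[i] = dataset[positions[i]].clone();
        }
        return chunk;
    }

    public static void main(String[] args) {
        double[][] dataset = { {0, 1}, {10, 11}, {20, 21}, {30, 31} };
        int[] positions = { 3, 1 }; // hypothetical position array
        System.out.println(Arrays.deepToString(gatherRows(dataset, positions)));
    }
}
```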

Since partitioning is easier with a List<double[][]>, take a look at the following. I have added comments on some of the lines.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Main {
  public static void main(String[] args) throws Exception {
    double[][] dataset = new double[10][2];
    dataset[0][0] = 5;
    dataset[6][0] = 6;

    double[] partitions = { 0, 5, 10 }; // better if int than double, the first chunk holds elems from 0-5 of the original array, then the second 5-10....

    List<double[][]> chunks = DatasetChunks(partitions, dataset, 2);
    for (int i = 0; i < chunks.size(); i++) {
      System.out.println("chunk " + i);
      for (double[] d : chunks.get(i)) {
        System.out.println(Arrays.toString(d));
      }
    }
  }

  private static List<double[][]> DatasetChunks(double[] best, double[][] dataset, int cols) {
    List<double[][]> chunks = new ArrayList<>();
    double[][] chunk = {}; //to be initialized later
    for (int i = 0; i < best.length - 1; i++) {
      int startIndex = (int) best[i]; //needs explicit cast since the partitions array is double in the main method
      int endIndex = (int) best[i + 1];//needs explicit cast since the partitions array is double in the main method
      chunk = new double[endIndex - startIndex][cols]; // a new chunk
      for (int j = startIndex, f = 0; j < endIndex; j++, f++) {
        for (int p = 0; p < cols; p++) {
          chunk[f][p] = dataset[j][p];
        }
      }
      chunks.add(chunk);
    }
    return chunks;
  }
}

Output

chunk 0
[5.0, 0.0]
[0.0, 0.0]
[0.0, 0.0]
[0.0, 0.0]
[0.0, 0.0]
chunk 1
[0.0, 0.0]
[6.0, 0.0]
[0.0, 0.0]
[0.0, 0.0]
[0.0, 0.0]
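
If the chunks are allowed to share row storage with the original array, the inner copy loops above can be replaced by `Arrays.copyOfRange`, which copies only the row references for a given range. This is a minimal sketch, assuming `int` boundaries as the comment in `main` suggests. `RangeChunks` and `chunkByBoundaries` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RangeChunks {
    /** Splits dataset at the given ascending row boundaries, e.g. {0, 5, 10}. */
    static List<double[][]> chunkByBoundaries(double[][] dataset, int[] boundaries) {
        List<double[][]> chunks = new ArrayList<>();
        for (int i = 0; i < boundaries.length - 1; i++) {
            // Shallow copy: rows [boundaries[i], boundaries[i + 1]) stay shared with dataset.
            chunks.add(Arrays.copyOfRange(dataset, boundaries[i], boundaries[i + 1]));
        }
        return chunks;
    }

    public static void main(String[] args) {
        double[][] dataset = new double[10][2];
        dataset[0][0] = 5;
        dataset[6][0] = 6;
        List<double[][]> chunks = chunkByBoundaries(dataset, new int[] { 0, 5, 10 });
        for (int i = 0; i < chunks.size(); i++) {
            System.out.println("chunk " + i + ": " + Arrays.deepToString(chunks.get(i)));
        }
    }
}
```

Because the rows are shared, a later write to `dataset[6][0]` would also be visible through the second chunk. Copy the rows explicitly, as the answer above does, if the chunks must be independent.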

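Well, 65,000 rows of 14 columns each is really not that much data. It can be partitioned right away into 130 partitions of 500 rows each, even on a 10-year-old computer, provided the dataset[][] array is held in memory.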
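
To put rough numbers on that claim, the arithmetic can be sketched as below. It counts only the raw `double` payload and ignores the per-array object overhead the JVM adds for each row, so the figure is a lower bound. `PartitionMath` and its methods are hypothetical helper names.

```java
public class PartitionMath {
    /** Raw payload in bytes of a rows x cols double array (headers not counted). */
    static long rawBytes(int rows, int cols) {
        return (long) rows * cols * Double.BYTES;
    }

    /** Number of partitions needed when each holds at most partitionSize rows. */
    static int partitionCount(int rows, int partitionSize) {
        return (rows + partitionSize - 1) / partitionSize; // integer ceiling
    }

    public static void main(String[] args) {
        System.out.println(rawBytes(65000, 14) + " bytes of double data");
        System.out.println(partitionCount(65000, 500) + " partitions of 500 rows");
    }
}
```

For 65,000 rows of 14 columns this comes to 7,280,000 bytes (about 7.3 MB) and 130 partitions.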

A simple method can partition your 2D double array into a List of double[][] arrays (as I mentioned in the comments):

List<double[][]> partitions = new ArrayList<>(); 

The partition2D_DoubleType_Array() method:

/**
 * This method will partition the supplied double type 2D Array into several
 * double type 2D Arrays with each 2d Array consisting of the number Rows
 * determined by the supplied desired partition size.
 *
 * @param array                   (2D double[][] Type Array) The 2D double
 *                                type array to partition.<br>
 *
 * @param desiredSizeOfPartitions (Integer - int) The desired size for each
 *                                Array partition.<br>
 *
 * @return A List Interface of double[][] ({@code List<double[][]>})
 *         containing all the Partitioned double[][] type arrays.
 */
public static List<double[][]> partition2D_DoubleType_Array(final double[][] array, final int desiredSizeOfPartitions) {
    int desiredPartitionSize = desiredSizeOfPartitions;
    List<double[][]> partitions = new ArrayList<>();
    int numberOfArrays = (int) Math.ceil((double) array.length / desiredPartitionSize);
    int rowsNeeded = (int) Math.ceil((double) array.length / numberOfArrays);
    double[][] dataChunk = new double[rowsNeeded][array[0].length + 1];

    int k = 0;
    for (int i = 0; i < array.length; i++) {
        dataChunk[k][0] = i;
        for (int j = 0; j < array[i].length; j++) {
            dataChunk[k][j + 1] = array[i][j];
        }
        k++;
        if (k == rowsNeeded) {
            partitions.add(dataChunk);
            k = 0;
            dataChunk = new double[rowsNeeded][array[0].length + 1];
        }
    }
    if (k > 0) {
        partitions.add(dataChunk);
    }
    return partitions;
}
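
Note that when the row count is not an exact multiple of the partition size, the method above leaves the unused rows of the final partition filled with zeros. It can also pick a row count smaller than the requested size in order to balance the partitions. A variant that keeps the requested size and makes the last partition exactly as long as the rows that remain could be sketched as follows. `TrimmedPartitioner` and `partitionWithIndex` are hypothetical names, and this is an alternative, not the answer's own method.

```java
import java.util.ArrayList;
import java.util.List;

public class TrimmedPartitioner {
    /** Splits array into chunks of partitionSize rows; column 0 holds the original row index. */
    static List<double[][]> partitionWithIndex(double[][] array, int partitionSize) {
        List<double[][]> partitions = new ArrayList<>();
        for (int start = 0; start < array.length; start += partitionSize) {
            // The final chunk may be shorter than partitionSize.
            int rows = Math.min(partitionSize, array.length - start);
            double[][] chunk = new double[rows][];
            for (int k = 0; k < rows; k++) {
                double[] src = array[start + k];
                double[] row = new double[src.length + 1];
                row[0] = start + k; // original row index
                System.arraycopy(src, 0, row, 1, src.length);
                chunk[k] = row;
            }
            partitions.add(chunk);
        }
        return partitions;
    }

    public static void main(String[] args) {
        double[][] data = new double[10][2];
        List<double[][]> parts = partitionWithIndex(data, 4);
        for (double[][] p : parts) {
            System.out.println("partition with " + p.length + " rows");
        }
    }
}
```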

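To use the method above you will of course need a populated 2D double array. I don't know how you fill your dataset[][] array, but for testing purposes we will use this:

Filling the dataset[][] array for testing: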

double[][] dataset = new double[65000][14];
// Fill the dataset 2D Array with fictitious floating point values.
String title = "Creating a 2D double Type Array: dataset[65000][14] and "
               + "filling with fictitious data values.";
String underline = String.join("", java.util.Collections.nCopies(title.length(), "-"));
System.out.println(title);
System.out.println(underline);
for (int i = 0; i < dataset.length; i++) {
    for (int j = 0; j < 14; j++) {
        dataset[i][j] = (double) (j + i) + (0.5d);
    }
    /* Un-comment the below line if you want to view the created
       dataset[][] array within the Console Window.   */
    // System.out.println((i + 1) + ") " + Arrays.toString(dataset[i]));
}
System.out.println("Array Creation COMPLETE!");
System.out.println();

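Now that the dataset[][] array has been created and filled, we can partition it into 130 double[][] 2D arrays, each consisting of 500 rows of 15 columns. Wait... 15 columns? The partitioned arrays are supposed to have 14 columns!

We actually modify the partitioned arrays by adding one extra element to each row, so that the original dataset[][] row index can be stored in the first element (index 0) of each partition row. From here on, every partition row therefore carries the index of the dataset[][] row its data came from. Keep this in mind when retrieving column data from any partition row.

Why do this? The reason is simple: you need to retrieve a partition row's column data based on a random dataset[][] row index (0 to 64,999). From that row index we can determine which partition holds the row, and because the original row index is stored at index 0 of every partition row, we can pick out the exact row within that partition and read the desired column data (index 1 to index 14). Remember, index 0 is reserved for the original row index from the dataset[][] array.

Partitioning the dataset[][] array: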

// Partition into 130 2D Arrays [500][15] (14 data columns plus the row index column):
title = "Partitioning The dataset[][] 2D Array into 130 individual 2D "
        + "Arrays consisting of 500 Rows:";
underline = String.join("", java.util.Collections.nCopies(title.length(), "-"));
System.out.println(title);
System.out.println(underline);

int desiredPartitionSize = 500;
List<double[][]> partitions = partition2D_DoubleType_Array(dataset, desiredPartitionSize);
System.out.println("2D Array Partitioning COMPLETE!");

Retrieving partition row data based on a random dataset[][] row index:

int datasetRowIndexTotal = dataset.length;
// Drop our reference to the dataset[][] array so that it can be
// garbage collected, since we don't need it anymore.
dataset = null;

/* Retrieve a random row index value from the original dataset[][] 
   array and locate all the columnar values for that row from within 
   the Partitioned arrays. We will do this five times therefore we'll
   be pulling out five random Row Index values determined from the 
   original dataset[][] array (total rows now in datasetRowIndexTotal).  */
for (int n = 0; n < 5; n++) {
    // Get a random row index value
    int randomIndex = (int) (Math.random() * ((datasetRowIndexTotal) - 0)) + 0;
    System.out.println("Find data from random Row Index #: --> " + randomIndex);
    
    /* Determine which Partition Array the generated random row
       index value will be contained in. Integer division maps
       rows 0-499 to partition 0, rows 500-999 to partition 1, etc.  */
    int partitionIndex = randomIndex / desiredPartitionSize;
    System.out.println("Determined 'Partition' Index is: --> " + partitionIndex);
    /* For readability, place the Partition Array
       into a tmp[][] double type 2D Array.   */
    double[][] tmp = partitions.get(partitionIndex);
    /* For demo simplicity we're going to place
       the acquired columnar values into a comma
       (", ") delimited String using StringBuilder.  */
    StringBuilder sb = new StringBuilder("");
    // Iterate through the determined Partition Array.
    for (int i = 0; i < tmp.length; i++) {
        /* Is the desired Row Index value in this 
           particular Partition Array Row?     */
        if ((int)tmp[i][0] == randomIndex) {
            /* Yes...get the columnar values for this data Row. 
               Notice how we start j from 1 (not 0)? This is 
               because index 0 is reserved for original Row
               index values (remember).         */
            for (int j = 1; j < tmp[i].length; j++) {
                if (sb.length() > 0) {
                    sb.append(", ");
                }
                sb.append(tmp[i][j]);
            }
        }
    }
    // Display the find!
    System.out.println("Columnar Data for the random Row Index of: --> " + randomIndex);
    System.out.println(sb.toString());
    System.out.println();
}
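
Because the partitions are filled in row order, the matching row can also be addressed directly instead of being searched for. This is a minimal sketch, assuming every partition except possibly the last holds exactly `partitionSize` rows, which is the case for 65,000 rows split into 500-row partitions. `RowLocator` and its methods are hypothetical names.

```java
public class RowLocator {
    /** Index of the partition that holds the given original row index. */
    static int partitionOf(int rowIndex, int partitionSize) {
        return rowIndex / partitionSize;
    }

    /** Position of the row inside its partition. */
    static int offsetOf(int rowIndex, int partitionSize) {
        return rowIndex % partitionSize;
    }

    public static void main(String[] args) {
        int partitionSize = 500;
        int[] samples = { 0, 499, 500, 1127, 64999 };
        for (int row : samples) {
            System.out.println("row " + row + " -> partition "
                    + partitionOf(row, partitionSize)
                    + ", offset " + offsetOf(row, partitionSize));
        }
    }
}
```

With these two values, `partitions.get(partitionOf(r, 500))[offsetOf(r, 500)]` would address the row directly, and the element at its index 0 can serve as a sanity check that it equals `r`.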

Once you have entered all of the code correctly and run it, you should see something like the following in the console window:

Creating a 2D double Type Array: dataset[65000][14] and filling with fictitious data values.
--------------------------------------------------------------------------------------------
Array Creation COMPLETE!

Partitioning The dataset[][] 2D Array into 130 individual 2D Arrays consisting of 500 Rows:
-------------------------------------------------------------------------------------------
2D Array Partitioning COMPLETE!


Find data from random Row Index #: --> 1127
Determined 'Partition' Index is: --> 2
Columnar Data for the random Row Index of: --> 1127
1127.5, 1128.5, 1129.5, 1130.5, 1131.5, 1132.5, 1133.5, 1134.5, 1135.5, 1136.5, 1137.5, 1138.5, 1139.5, 1140.5

Find data from random Row Index #: --> 1406
Determined 'Partition' Index is: --> 2
Columnar Data for the random Row Index of: --> 1406
1406.5, 1407.5, 1408.5, 1409.5, 1410.5, 1411.5, 1412.5, 1413.5, 1414.5, 1415.5, 1416.5, 1417.5, 1418.5, 1419.5

Find data from random Row Index #: --> 36801
Determined 'Partition' Index is: --> 73
Columnar Data for the random Row Index of: --> 36801
36801.5, 36802.5, 36803.5, 36804.5, 36805.5, 36806.5, 36807.5, 36808.5, 36809.5, 36810.5, 36811.5, 36812.5, 36813.5, 36814.5

Find data from random Row Index #: --> 28021
Determined 'Partition' Index is: --> 56
Columnar Data for the random Row Index of: --> 28021
28021.5, 28022.5, 28023.5, 28024.5, 28025.5, 28026.5, 28027.5, 28028.5, 28029.5, 28030.5, 28031.5, 28032.5, 28033.5, 28034.5

Find data from random Row Index #: --> 18916
Determined 'Partition' Index is: --> 37
Columnar Data for the random Row Index of: --> 18916
18916.5, 18917.5, 18918.5, 18919.5, 18920.5, 18921.5, 18922.5, 18923.5, 18924.5, 18925.5, 18926.5, 18927.5, 18928.5, 18929.5

