
Optimal way to output large data file in java

I'm trying to output a large amount of data to a file. Right now, I am trying the following:

byte[][] hands, with dimensions 2.5 billion x 7

I have a series of nested for loops:

for ...
  for ...
    for ...
      hands[i][j] = blah

Then I'm outputting all of the entries of the array hands at the end.

An alternative would be to use no memory and write each entry as soon as it is computed:

for ...
  for ...
    for ...
      pw.println(blah)

But this seems like it will be really slow, since it will constantly be printing.

Is the first approach the best? Would some intermediate approach be better, like storing and printing every k entries? If so, what would be a good value of k to use?
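A minimal sketch of that intermediate "store k entries, then write them" idea, assuming each hand is a 7-byte array. The class name ChunkedHandWriter and the value k = HANDS_PER_CHUNK = 4096 (about 28 KB per write call) are arbitrary choices for illustration. This is roughly what java.io.BufferedOutputStream does internally.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper: collects k hands in memory, then writes them with one call.
public class ChunkedHandWriter {
    static final int HAND_SIZE = 7;           // bytes per hand
    static final int HANDS_PER_CHUNK = 4096;  // "k": an assumed tuning value, not a recommendation

    private final OutputStream out;
    private final byte[] chunk = new byte[HAND_SIZE * HANDS_PER_CHUNK];
    private int used = 0;                     // bytes currently held in chunk

    public ChunkedHandWriter(OutputStream out) { this.out = out; }

    // Copies one hand (at least 7 bytes long) into the chunk; writes the chunk first if it is full.
    public void writeHand(byte[] hand) throws IOException {
        if (used == chunk.length) writeChunk();
        System.arraycopy(hand, 0, chunk, used, HAND_SIZE);
        used += HAND_SIZE;
    }

    // Writes all buffered hands with a single write call; call once more at the end
    // so the last, partially filled chunk is not lost. Does not close the stream.
    public void writeChunk() throws IOException {
        out.write(chunk, 0, used);
        used = 0;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        ChunkedHandWriter w = new ChunkedHandWriter(sink);
        for (int i = 0; i < 10_000; i++) {
            byte[] hand = {(byte) (i % 52), 1, 2, 3, 4, 5, 6}; // dummy hand data
            w.writeHand(hand);
        }
        w.writeChunk();
        System.out.println(sink.size()); // 70000
    }
}
```

The exact value of k matters far less than avoiding one write call per byte: once each call moves thousands of bytes, the per-call overhead is spread over all of them.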

Edit: here's the code:

package tables;

import general.Config;
import general.Constants;

import java.io.BufferedReader;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.util.StringTokenizer;

// Outputs canonical river hands
public class OutputRiverCanonicalHands3 implements Config, Constants{

    public static void main(String[] args) throws IOException {
        int half_river = (int)(NUM_RIVER_HANDS/2);
        boolean[] river_seen_index_1 = new boolean[half_river];
        boolean[] river_seen_index_2 = new boolean[(int)(NUM_RIVER_HANDS - half_river)];
        System.out.println("DONE DECLARING RIVER SEEN");
        byte hole11, hole12, board1, board2, board3, board4, board5;
        long river_index;

        byte[][] turnHands = new byte[NUM_TURN_HANDS][6]; 
        System.out.println("DONE DECLARING TURN");
        BufferedReader br = new BufferedReader(new FileReader(RIVER_TURN_INDICES_FILE2));
        int count = 0;
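        String line;
        // readLine() returning null marks end of file; ready() only says whether a read would block
        while ((line = br.readLine()) != null) {
            StringTokenizer str = new StringTokenizer(line);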
            str.nextToken();
            for (int i = 0; i < turnHands[count].length; ++i)
                turnHands[count][i] = Byte.parseByte(str.nextToken());
            ++count;
        }
        br.close();
        System.out.println("DONE READING TURN");

        DataOutputStream dos = new DataOutputStream(new FileOutputStream(RIVER_CANONICAL_HANDS_FILE3));
        byte[][] hands = new byte[half_river][7];
        System.out.println("DONE DECLARING RIVER ARRAY");

        long startTime = System.currentTimeMillis();
        int arrayIndex;
        for (int i = 0; i < turnHands.length; ++i) {
            if (i % 100000 == 0) {
                long elapsedTime = System.currentTimeMillis() - startTime;
                System.out.println(i + " " + elapsedTime);
            }
            hole11 = turnHands[i][0];
            hole12 = turnHands[i][1];
            board1 = turnHands[i][2];
            board2 = turnHands[i][3];
            board3 = turnHands[i][4];
            board4 = turnHands[i][5];
            for (board5 = 0; board5 < DECK_SIZE; ++board5) {
                if (board5 == hole11 || board5 == hole12 
                        || board5 == board1 || board5 == board2 || board5 == board3 || board5 == board4)
                    continue;

                river_index = ComputeIndicesTight.compute_river_index(hole11, hole12, board1, board2, board3, board4, board5);
                if (river_index < half_river && river_seen_index_1[(int)river_index]) 
                    continue;
                if (river_index >= half_river && river_seen_index_2[(int)(river_index - half_river)])
                    continue;
                if (river_index < half_river) {
                    arrayIndex = (int)river_index;
                    river_seen_index_1[arrayIndex] = true;
                    hands[arrayIndex][0] = hole11;
                    hands[arrayIndex][1] = hole12;
                    hands[arrayIndex][2] = board1;
                    hands[arrayIndex][3] = board2;
                    hands[arrayIndex][4] = board3;
                    hands[arrayIndex][5] = board4;
                    hands[arrayIndex][6] = board5;
                }
                else if (river_index == half_river) {
                    System.out.println("HALFWAY THERE");
                    for (int j = 0; j < hands.length; ++j) 
                        for (int k = 0; k < 7; ++k)
                            dos.writeByte(hands[j][k]);
                    hands = new byte[(int)(NUM_RIVER_HANDS - half_river)][7];
                    System.out.println("DONE PRINTING HALFWAY!");
                }
                if (river_index >= half_river) {
                    arrayIndex = (int)(river_index - half_river);
                    river_seen_index_2[arrayIndex] = true;
                    hands[arrayIndex][0] = hole11;
                    hands[arrayIndex][1] = hole12;
                    hands[arrayIndex][2] = board1;
                    hands[arrayIndex][3] = board2;
                    hands[arrayIndex][4] = board3;
                    hands[arrayIndex][5] = board4;
                    hands[arrayIndex][6] = board5;
                }
            }
        }
        for (int j = 0; j < hands.length; ++j) 
            for (int k = 0; k < 7; ++k)
                dos.writeByte(hands[j][k]);

        dos.close();
    }
}

(As I suspected ...)

The output performance problem with your code has a very simple explanation. This line:

DataOutputStream dos = new DataOutputStream(
       new FileOutputStream(RIVER_CANONICAL_HANDS_FILE3));

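is creating a stream that writes directly to a file without any buffering. Each writeByte call goes straight through to FileOutputStream.write(int), which performs a write system call, so your code makes one system call per byte: roughly 2.5 billion × 7, or about 17.5 billion of them. That is expensive. You should get much better performance by simply adding a BufferedOutputStream to the output pipeline: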

DataOutputStream dos = new DataOutputStream(
       new BufferedOutputStream(
               new FileOutputStream(RIVER_CANONICAL_HANDS_FILE3)));
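A self-contained sketch of that buffered pipeline, assuming a temporary file in place of RIVER_CANONICAL_HANDS_FILE3 and a small dummy hands array. The class name BufferedHandOutput and the 64 KiB buffer size are illustrative assumptions; try-with-resources makes sure close() runs, which flushes the last partial buffer to the file.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class BufferedHandOutput {
    // Still one writeByte call per byte, but the bytes collect in a 64 KiB buffer,
    // so the operating system only sees one write per full buffer.
    public static void writeHands(byte[][] hands, File file) throws IOException {
        try (DataOutputStream dos = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file), 1 << 16))) {
            for (byte[] hand : hands)
                for (byte b : hand)
                    dos.writeByte(b);
        } // close() flushes the remaining buffered bytes and closes the file
    }

    public static void main(String[] args) throws IOException {
        byte[][] hands = new byte[1000][7]; // dummy data standing in for the real hands
        File file = File.createTempFile("hands", ".bin");
        file.deleteOnExit();
        writeHands(hands, file);
        System.out.println(file.length()); // 7000
    }
}
```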

You wrote: "I figured writing the data in binary would save some space, since the file will be so large."

It won't. DataOutputStream.writeByte writes exactly one byte per call, so the space usage will be exactly the same as if you had written the byte values directly to the FileOutputStream.
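A quick way to check this, assuming an arbitrary 7-card hand: encode it with DataOutputStream.writeByte into an in-memory ByteArrayOutputStream and compare the result with the raw bytes. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

public class ByteCountCheck {
    // Encodes a hand with one DataOutputStream.writeByte call per card.
    public static byte[] viaDataOutputStream(byte[] hand) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(buf)) {
            for (byte b : hand) dos.writeByte(b); // emits exactly one byte
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] hand = {12, 25, 3, 40, 51, 0, 7}; // arbitrary example hand
        byte[] encoded = viaDataOutputStream(hand);
        System.out.println(encoded.length);              // 7
        System.out.println(Arrays.equals(encoded, hand)); // true
    }
}
```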

In fact, if that is the sole reason for using DataOutputStream, you would be better off leaving it out and writing the hand data like this:

    dos.write(hands[j]);

... making use of the OutputStream.write(byte[]) method, and getting rid of the innermost write loop. (But using BufferedOutputStream at all will make a much bigger difference!)
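A sketch combining both suggestions, assuming the hands are 7-byte arrays: no DataOutputStream, a BufferedOutputStream in the pipeline, and one write(byte[]) call per hand. In the real program the target would be the FileOutputStream; a ByteArrayOutputStream is used here only so the result is easy to inspect. The class name is hypothetical.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class WholeHandWrites {
    // One write(byte[]) call per hand instead of seven writeByte calls.
    // Closing the BufferedOutputStream flushes it and also closes target.
    public static void writeHands(byte[][] hands, OutputStream target) throws IOException {
        try (OutputStream out = new BufferedOutputStream(target)) {
            for (byte[] hand : hands)
                out.write(hand); // writes all bytes of the hand
        }
    }

    public static void main(String[] args) throws IOException {
        byte[][] hands = {{0, 1, 2, 3, 4, 5, 6}, {7, 8, 9, 10, 11, 12, 13}};
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeHands(hands, sink);
        System.out.println(sink.size());           // 14
        System.out.println(sink.toByteArray()[7]); // 7
    }
}
```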

If you just want to write to a file, use a logging library such as log4j that supports asynchronous logging; it can write its output to a file as well.
