Java Create 100MB zipped csv file performance issue

I need to create a 100 MB zipped file containing a CSV file within 5 seconds using Java. I have created test.zip, which contains the CSV file, but generating the zip file takes too long (~30 seconds). Here is the code that I've written so far:

ByteArrayOutputStream baos = new ByteArrayOutputStream();
/* Create instance of ZipOutputStream to create ZIP file. */
ZipOutputStream zipOutputStream = new ZipOutputStream(baos);

/* Create the ZIP entry for the CSV file. The CSV is never written to
 * disk; the entry name is only the name the file gets inside the zip,
 * so it should end in .csv rather than .zip.
 */
ZipEntry zipEntry = new ZipEntry("Test.csv");

zipOutputStream.putNextEntry(zipEntry);

/* Create an OutputStreamWriter for the CSV. There is no need to stage
 * the CSV on the filesystem; write directly to the zip output stream.
 */
BufferedWriter bufferedWriter = new BufferedWriter(new OutputStreamWriter(zipOutputStream, "UTF-8"));

CsvListWriter csvListWriter = new CsvListWriter(bufferedWriter, CsvPreference.EXCEL_PREFERENCE);

/* Write the CSV header to the generated CSV file. */
csvListWriter.writeHeader(CSVGeneratorConstant.CSV_HEADERS);

/* Logic to Write the content to CSV */
long startTime = System.currentTimeMillis();

for (int rowIdx = 0; rowIdx < 7000000; rowIdx++) {
    final List<String> rowContent = new LinkedList<String>();
    for (int colIdx = 0; colIdx < 6; colIdx++) {
        String str = "R" + rowIdx + "C" + colIdx + " FieldContent";
        rowContent.add(str);
    }
    csvListWriter.write(rowContent);
}
/* Close the streams before measuring: until the zip stream is closed,
 * baos.size() excludes data still buffered in the writers and the deflater.
 */
csvListWriter.close();
bufferedWriter.close();
zipOutputStream.close();
baos.close();

long stopTime = System.currentTimeMillis();
long elapsedTime = stopTime - startTime;
System.out.println("time==" + elapsedTime / 1000f + " seconds");

System.out.println("Size=====" + baos.size() / (Math.pow(1024, 2)) + " MB");

I am using the Super CSV library, but I have also tried creating the zip file in memory without it, with no success. Can you please help me?

Your test data is about 1GB uncompressed, which compresses down to about 100MB. Depending on your hardware, it may not be possible to get this under 5 seconds.
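The 1GB figure can be checked with a little arithmetic on the row format from the question. The sketch below is only an estimate: it ignores the header row and assumes a one-byte line terminator and no quoting, and the class and method names (CsvSizeEstimate, rowBytes, totalBytes) are made up for illustration.

```java
class CsvSizeEstimate {
    /** UTF-8 bytes in one row: comma-joined fields plus a one-byte newline. */
    static long rowBytes(int rowIdx, int columns) {
        long bytes = 0;
        for (int colIdx = 0; colIdx < columns; colIdx++) {
            String field = "R" + rowIdx + "C" + colIdx + " FieldContent";
            bytes += field.length(); // ASCII only, so chars == UTF-8 bytes
        }
        return bytes + (columns - 1) + 1; // separators plus the newline
    }

    /** Sums the row sizes for rows 0 .. rows-1. */
    static long totalBytes(int rows, int columns) {
        long total = 0;
        for (int rowIdx = 0; rowIdx < rows; rowIdx++) {
            total += rowBytes(rowIdx, columns);
        }
        return total;
    }

    public static void main(String[] args) {
        long total = totalBytes(7_000_000, 6);
        System.out.printf("Uncompressed CSV: about %.0f MiB%n", total / (1024.0 * 1024.0));
    }
}
```

Most rows have a 7-digit row index and come to 144 bytes each, so 7 million rows land at roughly 1GB before compression.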

I've put together a quick and dirty benchmark which highlights the performance impacts of writing to a zip file.

  • Write to CSV with String.join(): 9.6s
  • Write to CSV with Super CSV: 12.7s
  • Write to CSV within zip with String.join(): 18.6s
  • Write to CSV within zip with Super CSV: 22.5s

It appears that Super CSV adds some overhead (those runs take roughly 1.2x to 1.3x as long as the String.join() ones), but just writing to a zip file nearly doubles the time (roughly 1.8x to 1.9x), regardless of whether Super CSV is used.
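Since the zip step accounts for most of the added time, one knob that may be worth trying is the deflate compression level. The sketch below is separate from the benchmark and its effect was not measured here; it assumes a somewhat larger archive is an acceptable trade for speed, and that fields never contain commas, quotes or newlines (as with String.join(), nothing is escaped). FastZipCsvSketch and writeZippedCsv are hypothetical names; ZipOutputStream.setLevel and the Deflater constants are standard JDK API.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class FastZipCsvSketch {
    /** Writes rows x columns of test data into an in-memory zip at the given deflate level. */
    static byte[] writeZippedCsv(int rows, int columns, int level) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(baos);
             BufferedWriter writer = new BufferedWriter(
                     new OutputStreamWriter(zos, StandardCharsets.UTF_8), 1 << 16)) {
            zos.setLevel(level); // applies to entries added after this call
            zos.putNextEntry(new ZipEntry("data.csv"));

            StringBuilder row = new StringBuilder(160);
            for (int rowIdx = 0; rowIdx < rows; rowIdx++) {
                row.setLength(0); // reuse one buffer instead of building a list per row
                for (int colIdx = 0; colIdx < columns; colIdx++) {
                    if (colIdx > 0) {
                        row.append(',');
                    }
                    row.append('R').append(rowIdx).append('C').append(colIdx).append(" FieldContent");
                }
                writer.append(row.append('\n'));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The streams are closed here, so the archive is complete.
        return baos.toByteArray();
    }

    public static void main(String[] args) {
        int[] levels = {Deflater.BEST_SPEED, Deflater.DEFAULT_COMPRESSION};
        for (int level : levels) {
            long start = System.nanoTime();
            byte[] zip = writeZippedCsv(1_000_000, 6, level);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println("level " + level + ": " + zip.length + " bytes in " + millis + " ms");
        }
    }
}
```

Running main on your own hardware shows whether the lower level buys enough time to matter; the numbers will differ from machine to machine.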

Here's the code for the 4 scenarios.

Unlike your code, I'm writing directly to a file (I didn't notice any difference between writing to disk and writing to memory, i.e. a ByteArrayOutputStream). I've also skipped the BufferedWriter in the Super CSV examples, because Super CSV already uses one internally, and I've used try-with-resources to make things cleaner.

@Test
public void testWriteToCsvFileWithSuperCSV() throws Exception {
    long startTime = System.currentTimeMillis();

    try (FileOutputStream csvFile = new FileOutputStream(new File("supercsv.csv"));
         ICsvListWriter writer = new CsvListWriter(new OutputStreamWriter(csvFile, "UTF-8"), CsvPreference.EXCEL_PREFERENCE)
    ){
        for (int rowIdx = 0; rowIdx < 7000000; rowIdx++) {
            final List<String> rowContent = new LinkedList<>();
            for (int colIdx = 0; colIdx < 6; colIdx++) {
                String str = "R" + rowIdx + "C" + colIdx + " FieldContent";
                rowContent.add(str);
            }
            writer.write(rowContent);
        }
    }

    long stopTime = System.currentTimeMillis();
    long elapsedTime = stopTime - startTime;
    System.out.println("Writing to CSV with Super CSV took " + (elapsedTime / 1000f) + " seconds");
}

@Test
public void testWriteToCsvFileWithinZipWithSuperCSV() throws Exception {
    long startTime = System.currentTimeMillis();

    try (FileOutputStream zipFile = new FileOutputStream(new File("supercsv.zip"));
         ZipOutputStream zos = new ZipOutputStream(zipFile);
         ICsvListWriter writer = new CsvListWriter(new OutputStreamWriter(zos, "UTF-8"), CsvPreference.EXCEL_PREFERENCE)
    ){

        ZipEntry csvFile = new ZipEntry("supercsvwithinzip.csv");
        zos.putNextEntry(csvFile);

        for (int rowIdx = 0; rowIdx < 7000000; rowIdx++) {
            final List<String> rowContent = new LinkedList<>();
            for (int colIdx = 0; colIdx < 6; colIdx++) {
                String str = "R" + rowIdx + "C" + colIdx + " FieldContent";
                rowContent.add(str);
            }
            writer.write(rowContent);
        }
    }

    long stopTime = System.currentTimeMillis();
    long elapsedTime = stopTime - startTime;
    System.out.println("Writing to CSV within zip file with Super CSV took " + (elapsedTime / 1000f) + " seconds");
}

@Test
public void testWriteToCsvFileWithStringJoin() throws Exception {
    long startTime = System.currentTimeMillis();

    try (FileOutputStream textFile = new FileOutputStream(new File("join.csv"));
         BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(textFile, "UTF-8"));
    ){

        for (int rowIdx = 0; rowIdx < 7000000; rowIdx++) {
            final List<String> rowContent = new LinkedList<>();
            for (int colIdx = 0; colIdx < 6; colIdx++) {
                String str = "R" + rowIdx + "C" + colIdx + " FieldContent";
                rowContent.add(str);
            }
            writer.append(String.join(",", rowContent) + "\n");
        }
    }

    long stopTime = System.currentTimeMillis();
    long elapsedTime = stopTime - startTime;
    System.out.println("Writing to CSV with String.join() took " + (elapsedTime / 1000f) + " seconds");
}

@Test
public void testWriteToCsvFileWithinZipWithStringJoin() throws Exception {
    long startTime = System.currentTimeMillis();

    try (FileOutputStream zipFile = new FileOutputStream(new File("join.zip"));
         ZipOutputStream zos = new ZipOutputStream(zipFile);
         BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(zos, "UTF-8"));
    ){

        ZipEntry csvFile = new ZipEntry("joinwithinzip.csv");
        zos.putNextEntry(csvFile);

        for (int rowIdx = 0; rowIdx < 7000000; rowIdx++) {
            final List<String> rowContent = new LinkedList<>();
            for (int colIdx = 0; colIdx < 6; colIdx++) {
                String str = "R" + rowIdx + "C" + colIdx + " FieldContent";
                rowContent.add(str);
            }
            writer.append(String.join(",", rowContent) + "\n");
        }
    }

    long stopTime = System.currentTimeMillis();
    long elapsedTime = stopTime - startTime;
    System.out.println("Writing to CSV within zip with String.join() took " + (elapsedTime / 1000f) + " seconds");
}
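The scenarios above write to files. If you go back to the in-memory ByteArrayOutputStream from the question, keep in mind that the buffer only holds the complete archive once the zip stream has been closed. The sketch below illustrates this on a small amount of data; InMemoryZipSize and sizesBeforeAndAfterClose are made-up names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class InMemoryZipSize {
    /** Returns {size before close, size after close} for a small in-memory zip. */
    static int[] sizesBeforeAndAfterClose(int rows) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        int before;
        try (ZipOutputStream zos = new ZipOutputStream(baos)) {
            zos.putNextEntry(new ZipEntry("data.csv"));
            for (int rowIdx = 0; rowIdx < rows; rowIdx++) {
                zos.write(("R" + rowIdx + "C0 FieldContent\n").getBytes(StandardCharsets.UTF_8));
            }
            // Remaining deflater output and the central directory are not written yet.
            before = baos.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // try-with-resources has closed the zip stream, so the archive is finished.
        return new int[] {before, baos.size()};
    }

    public static void main(String[] args) {
        int[] sizes = sizesBeforeAndAfterClose(100_000);
        System.out.println("before close: " + sizes[0] + " bytes, after close: " + sizes[1] + " bytes");
    }
}
```

With try-with-resources this ordering comes for free: read the size (or call toByteArray()) after the try block rather than inside it.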
