
Performance issue creating a 100 MB zipped CSV file in Java

I need to create a 100 MB zip file containing a CSV file within 5 seconds using Java. I have created test.zip, which contains the CSV file, but generating it takes far too long (about 30 seconds). Here is the code I have written so far:

ByteArrayOutputStream baos = new ByteArrayOutputStream();
/* Create instance of ZipOutputStream to create ZIP file. */
ZipOutputStream zipOutputStream = new ZipOutputStream(baos);

/* Create the ZIP entry for the CSV file. The file is never written to
 * disk; the entry name is only the name it gets inside the archive.
 */
ZipEntry zipEntry = new ZipEntry("Test.csv");

zipOutputStream.putNextEntry(zipEntry);

/* Create OutputStreamWriter for CSV. There is no need for staging
 * the CSV on filesystem . Directly write bytes to the output stream.
 */
BufferedWriter bufferedWriter = new BufferedWriter(new OutputStreamWriter(zipOutputStream, "UTF-8"));

CsvListWriter csvListWriter = new CsvListWriter(bufferedWriter, CsvPreference.EXCEL_PREFERENCE);

/* Write the CSV header to the generated CSV file. */
csvListWriter.writeHeader(CSVGeneratorConstant.CSV_HEADERS);

/* Logic to Write the content to CSV */
long startTime = System.currentTimeMillis();

for (int rowIdx = 0; rowIdx < 7000000; rowIdx++) {
    final List<String> rowContent = new LinkedList<String>();
    for (int colIdx = 0; colIdx < 6; colIdx++) {
        String str = "R" + rowIdx + "C" + colIdx + " FieldContent";
        rowContent.add(str);
    }
    csvListWriter.write(rowContent);
}
/* Close the writer before measuring: this flushes the buffered rows and
 * finishes the ZIP archive, and it also closes the wrapped streams.
 */
csvListWriter.close();

long stopTime = System.currentTimeMillis();
long elapsedTime = stopTime - startTime;
System.out.println("time==" + elapsedTime / 1000f + "Seconds");

System.out.println("Size=====" + baos.size() / (Math.pow(1024, 2)) + "MB");

I am using the Super CSV library, but I have also tried creating the zip file in memory without Super CSV, without success. Can you please help me?

Your test data is about 1 GB, which compresses down to about 100 MB. Depending on your hardware, it may not be possible to get this under 5 seconds.
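The 1 GB figure can be checked from the loop in the question. The sketch below is only an illustration: the class and method names are made up here, the header line is ignored, and the line terminator length is a parameter because it depends on the CSV preference in use. Most rows have seven-digit indices and come to about 144 bytes each, so 7,000,000 rows land close to 1 GB.

```java
public class CsvSizeEstimate {

    /** Bytes in one generated row: six fields, five commas and a line terminator. */
    static long rowBytes(int rowIdx, int lineTerminatorLength) {
        long bytes = 0;
        for (int colIdx = 0; colIdx < 6; colIdx++) {
            String field = "R" + rowIdx + "C" + colIdx + " FieldContent";
            bytes += field.length(); // ASCII only, so one byte per character in UTF-8
        }
        return bytes + 5 + lineTerminatorLength;
    }

    /** Total bytes for rows 0..rows-1, ignoring the header line. */
    static long totalBytes(int rows, int lineTerminatorLength) {
        long total = 0;
        for (int rowIdx = 0; rowIdx < rows; rowIdx++) {
            total += rowBytes(rowIdx, lineTerminatorLength);
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumes a one-byte line terminator ("\n"); pass 2 for "\r\n".
        long total = totalBytes(7_000_000, 1);
        System.out.printf("Uncompressed CSV: %d bytes (about %.2f GB)%n", total, total / 1e9);
    }
}
```

Generating all 7,000,000 rows just to count them takes a few seconds, which is itself a hint at how much of the budget goes to string building before any compression happens.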

I've put together a quick and dirty benchmark that highlights the performance impact of writing to a zip file.

  • Write to CSV with String.join(): 9.6s
  • Write to CSV with Super CSV: 12.7s
  • Write to CSV within zip with String.join(): 18.6s
  • Write to CSV within zip with Super CSV: 22.5s

Super CSV adds a modest overhead (runs take about 1.2 to 1.3 times as long: 12.7s vs 9.6s, and 22.5s vs 18.6s), but just writing into a zip file almost doubles the time (about 1.8 to 1.9 times as long), regardless of whether Super CSV is used.
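Since compression is where most of the extra time goes, one knob that may be worth trying is the Deflater level, trading a larger archive for less CPU time. This was not part of the benchmark above, so treat it as an untested idea. The sketch below is an assumption-laden illustration: the class name, the entry name data.csv and the 200,000-row sample are invented here, and it builds rows with a reused StringBuilder rather than a list per row, which is a separate change from the measured code.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipLevelSketch {

    /** Zips `rows` generated CSV rows into memory at the given Deflater level. */
    static byte[] zipCsv(int rows, int level) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(sink)) {
            zos.setLevel(level); // applies to subsequent DEFLATED entries
            zos.putNextEntry(new ZipEntry("data.csv"));
            Writer writer = new BufferedWriter(new OutputStreamWriter(zos, StandardCharsets.UTF_8));
            StringBuilder row = new StringBuilder();
            for (int rowIdx = 0; rowIdx < rows; rowIdx++) {
                row.setLength(0); // reuse one builder instead of a new list per row
                for (int colIdx = 0; colIdx < 6; colIdx++) {
                    if (colIdx > 0) {
                        row.append(',');
                    }
                    row.append('R').append(rowIdx).append('C').append(colIdx).append(" FieldContent");
                }
                writer.append(row).append('\n');
            }
            writer.flush(); // push buffered characters into the entry before zos closes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Reads back the first line of the first entry, to check the archive is valid. */
    static String firstLine(byte[] zip) {
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zip))) {
            if (zin.getNextEntry() == null) {
                return null;
            }
            return new BufferedReader(new InputStreamReader(zin, StandardCharsets.UTF_8)).readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int rows = 200_000; // small sample; the question uses 7,000,000
        for (int level : new int[] {Deflater.BEST_SPEED, Deflater.DEFAULT_COMPRESSION}) {
            long start = System.nanoTime();
            byte[] zip = zipCsv(rows, level);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println("level " + level + ": " + zip.length + " bytes in " + millis + " ms");
        }
    }
}
```

Whether the faster level is enough to reach 5 seconds depends on the hardware and on how much of the time is spent generating rows rather than compressing them, so it is worth measuring both levels on the target machine.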

Here's the code for the four scenarios.

Unlike your code, I'm writing directly to a file (I didn't notice any difference between writing to disk and writing to memory, i.e. a ByteArrayOutputStream). I've also left out the BufferedWriter in the Super CSV examples, because Super CSV already uses one internally, and I've used try-with-resources to make things cleaner.

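// Imports used by the four tests below (assuming JUnit 4 and Super CSV)
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.util.LinkedList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
import org.junit.Test;
import org.supercsv.io.CsvListWriter;
import org.supercsv.io.ICsvListWriter;
import org.supercsv.prefs.CsvPreference;

@Test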
public void testWriteToCsvFileWithSuperCSV() throws Exception {
    long startTime = System.currentTimeMillis();

    try (FileOutputStream csvFile = new FileOutputStream(new File("supercsv.csv"));
         ICsvListWriter writer = new CsvListWriter(new OutputStreamWriter(csvFile, "UTF-8"), CsvPreference.EXCEL_PREFERENCE)
    ){
        for (int rowIdx = 0; rowIdx < 7000000; rowIdx++) {
            final List<String> rowContent = new LinkedList<>();
            for (int colIdx = 0; colIdx < 6; colIdx++) {
                String str = "R" + rowIdx + "C" + colIdx + " FieldContent";
                rowContent.add(str);
            }
            writer.write(rowContent);
        }
    }

    long stopTime = System.currentTimeMillis();
    long elapsedTime = stopTime - startTime;
    System.out.println("Writing to CSV with Super CSV took " + (elapsedTime / 1000f) + " seconds");
}

@Test
public void testWriteToCsvFileWithinZipWithSuperCSV() throws Exception {
    long startTime = System.currentTimeMillis();

    try (FileOutputStream zipFile = new FileOutputStream(new File("supercsv.zip"));
         ZipOutputStream zos = new ZipOutputStream(zipFile);
         ICsvListWriter writer = new CsvListWriter(new OutputStreamWriter(zos, "UTF-8"), CsvPreference.EXCEL_PREFERENCE)
    ){

        ZipEntry csvFile = new ZipEntry("supercsvwithinzip.csv");
        zos.putNextEntry(csvFile);

        for (int rowIdx = 0; rowIdx < 7000000; rowIdx++) {
            final List<String> rowContent = new LinkedList<>();
            for (int colIdx = 0; colIdx < 6; colIdx++) {
                String str = "R" + rowIdx + "C" + colIdx + " FieldContent";
                rowContent.add(str);
            }
            writer.write(rowContent);
        }
    }

    long stopTime = System.currentTimeMillis();
    long elapsedTime = stopTime - startTime;
    System.out.println("Writing to CSV within zip file with Super CSV took " + (elapsedTime / 1000f) + " seconds");
}

@Test
public void testWriteToCsvFileWithStringJoin() throws Exception {
    long startTime = System.currentTimeMillis();

    try (FileOutputStream textFile = new FileOutputStream(new File("join.csv"));
         BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(textFile, "UTF-8"));
    ){

        for (int rowIdx = 0; rowIdx < 7000000; rowIdx++) {
            final List<String> rowContent = new LinkedList<>();
            for (int colIdx = 0; colIdx < 6; colIdx++) {
                String str = "R" + rowIdx + "C" + colIdx + " FieldContent";
                rowContent.add(str);
            }
            writer.append(String.join(",", rowContent) + "\n");
        }
    }

    long stopTime = System.currentTimeMillis();
    long elapsedTime = stopTime - startTime;
    System.out.println("Writing to CSV with String.join() took " + (elapsedTime / 1000f) + " seconds");
}

@Test
public void testWriteToCsvFileWithinZipWithStringJoin() throws Exception {
    long startTime = System.currentTimeMillis();

    try (FileOutputStream zipFile = new FileOutputStream(new File("join.zip"));
         ZipOutputStream zos = new ZipOutputStream(zipFile);
         BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(zos, "UTF-8"));
    ){

        ZipEntry csvFile = new ZipEntry("joinwithinzip.csv");
        zos.putNextEntry(csvFile);

        for (int rowIdx = 0; rowIdx < 7000000; rowIdx++) {
            final List<String> rowContent = new LinkedList<>();
            for (int colIdx = 0; colIdx < 6; colIdx++) {
                String str = "R" + rowIdx + "C" + colIdx + " FieldContent";
                rowContent.add(str);
            }
            writer.append(String.join(",", rowContent) + "\n");
        }
    }

    long stopTime = System.currentTimeMillis();
    long elapsedTime = stopTime - startTime;
    System.out.println("Writing to CSV within zip with String.join() took " + (elapsedTime / 1000f) + " seconds");
}
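Besides being cleaner, try-with-resources matters if you go back to a ByteArrayOutputStream and want to report the archive size: the size is only meaningful once the writer has been closed, because until then rows may still sit in the writer's buffer and the zip archive has not been finished. The following is a minimal sketch of that effect using only the JDK; the class name, the entry name and the single-column rows are made up for illustration.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipSizeAfterClose {

    /** Returns { bytes visible before close, bytes visible after close }. */
    static int[] sizesBeforeAndAfterClose(int rows) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        int before = 0;
        try (ZipOutputStream zos = new ZipOutputStream(baos);
             BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(zos, StandardCharsets.UTF_8))) {
            zos.putNextEntry(new ZipEntry("data.csv"));
            for (int rowIdx = 0; rowIdx < rows; rowIdx++) {
                writer.write("R" + rowIdx + "C0 FieldContent\n");
            }
            // Buffered characters and pending compressed data are not all in baos yet.
            before = baos.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Leaving the try block closed the writer, which flushed it and finished the archive.
        return new int[] {before, baos.size()};
    }

    public static void main(String[] args) {
        int[] sizes = sizesBeforeAndAfterClose(100_000);
        System.out.println("before close: " + sizes[0] + " bytes, after close: " + sizes[1] + " bytes");
    }
}
```

Reading the size, or stopping the timer, only after the try block has exited keeps the measurement honest, since the final flush and the zip trailer are part of the work being timed.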

