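Java Create 100MB zipped csv file performance issue
I need to create a 100 MB zipped file containing a CSV file in under 5 seconds using Java. I have created test.zip containing a CSV file, but generating the zip file takes too long (~30 seconds). Here is the code I have written so far: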
ByteArrayOutputStream baos = new ByteArrayOutputStream();
/* Create instance of ZipOutputStream to create ZIP file. */
ZipOutputStream zipOutputStream = new ZipOutputStream(baos);
/* Create ZIP entry for file. The file which is created is put into the
 * zip file. The file is not on the disk, csvFileName indicates only the
 * file name to be put into the zip
 */
ZipEntry zipEntry = new ZipEntry("Test.zip");
zipOutputStream.putNextEntry(zipEntry);
/* Create OutputStreamWriter for CSV. There is no need for staging
 * the CSV on filesystem. Directly write bytes to the output stream.
 */
BufferedWriter bufferedWriter = new BufferedWriter(new OutputStreamWriter(zipOutputStream, "UTF-8"));
CsvListWriter csvListWriter = new CsvListWriter(bufferedWriter, CsvPreference.EXCEL_PREFERENCE);
/* Write the CSV header to the generated CSV file. */
csvListWriter.writeHeader(CSVGeneratorConstant.CSV_HEADERS);
/* Logic to Write the content to CSV */
long startTime = System.currentTimeMillis();
for (int rowIdx = 0; rowIdx < 7000000; rowIdx++) {
    final List<String> rowContent = new LinkedList<String>();
    for (int colIdx = 0; colIdx < 6; colIdx++) {
        String str = "R" + rowIdx + "C" + colIdx + " FieldContent";
        rowContent.add(str);
    }
    csvListWriter.write(rowContent);
}
long stopTime = System.currentTimeMillis();
long elapsedTime = stopTime - startTime;
System.out.println("time==" + elapsedTime / 1000f + "Seconds");
System.out.println("Size=====" + baos.size() / (Math.pow(1024, 2)) + "MB");
csvListWriter.close();
bufferedWriter.close();
zipOutputStream.close();
baos.close();
I am using the Super CSV library, but I also tried creating the zip file in memory without Super CSV, with no success. Can you help me?
Your test data is about 1 GB, which compresses to 100 MB. Depending on your hardware, it may be impossible to achieve < 5 s performance.
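To see where the roughly 1 GB figure comes from, here is a small sketch that counts the bytes the loop in the question would produce. It assumes one byte per character (the generated content is pure ASCII), a one-byte "\n" line terminator, no header row, and no quoting; the class and method names are made up for illustration.

```java
class CsvSizeEstimate {
    // Length of the constant suffix appended to every field.
    private static final int SUFFIX_LENGTH = " FieldContent".length();

    /** Number of decimal digits in a non-negative int. */
    static int digits(int n) {
        return Integer.toString(n).length();
    }

    /** Bytes in one row: the fields, the commas between them, and the line terminator. */
    static long rowBytes(int rowIdx, int columns, int terminatorBytes) {
        int rowDigits = digits(rowIdx);
        long bytes = 0;
        for (int colIdx = 0; colIdx < columns; colIdx++) {
            // Mirrors "R" + rowIdx + "C" + colIdx + " FieldContent"
            bytes += 1 + rowDigits + 1 + digits(colIdx) + SUFFIX_LENGTH;
        }
        return bytes + (columns - 1) + terminatorBytes;
    }

    /** Total bytes for rows 0 .. rows-1. */
    static long totalBytes(int rows, int columns, int terminatorBytes) {
        long total = 0;
        for (int rowIdx = 0; rowIdx < rows; rowIdx++) {
            total += rowBytes(rowIdx, columns, terminatorBytes);
        }
        return total;
    }

    public static void main(String[] args) {
        long bytes = totalBytes(7_000_000, 6, 1);
        System.out.println(bytes + " bytes uncompressed");
    }
}
```

Under those assumptions the total is 1,001,333,340 bytes, so the deflater has to chew through about 1 GB of input regardless of how small the resulting archive is.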
I put together a quick and dirty benchmark that highlights the performance impact of writing to a zip file.
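Writing to CSV with String.join(): 9.6s
Writing to CSV within a zip with String.join(): 18.6s
Using Super CSV seems to add a little overhead (~122%), but just writing to a zip file nearly doubles the time (~190%), regardless of whether Super CSV is used.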
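Since most of the extra time goes into compression, one knob that may be worth trying is the deflate level, which `ZipOutputStream.setLevel` exposes. The sketch below is not part of the benchmark above and its effect was not measured there; it only shows how the level could be set and checks that the archive still round-trips. The class name, method names, and the entry name `data.csv` are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipLevelSketch {

    /** Zips the given number of generated CSV rows in memory at the given Deflater level. */
    static byte[] zipCsv(int rows, int level) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(baos);
             BufferedWriter writer = new BufferedWriter(
                     new OutputStreamWriter(zos, StandardCharsets.UTF_8))) {
            zos.setLevel(level); // applies to entries added after this call
            zos.putNextEntry(new ZipEntry("data.csv")); // hypothetical entry name
            for (int rowIdx = 0; rowIdx < rows; rowIdx++) {
                for (int colIdx = 0; colIdx < 6; colIdx++) {
                    if (colIdx > 0) {
                        writer.write(',');
                    }
                    writer.write("R" + rowIdx + "C" + colIdx + " FieldContent");
                }
                writer.write('\n');
            }
        } catch (IOException e) {
            // Not expected for in-memory streams; rethrown unchecked to keep callers simple.
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    /** Reads the first entry back, to check that the archive is valid. */
    static byte[] unzipFirstEntry(byte[] zip) {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            if (zis.getNextEntry() == null) {
                throw new IllegalStateException("empty archive");
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            for (int n; (n = zis.read(buffer)) != -1; ) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] levels = {Deflater.BEST_SPEED, Deflater.DEFAULT_COMPRESSION, Deflater.BEST_COMPRESSION};
        for (int level : levels) {
            long start = System.nanoTime();
            byte[] zip = zipCsv(1_000_000, level);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println("level " + level + ": " + zip.length + " bytes in " + millis + " ms");
        }
    }
}
```

Running `main` prints size and time per level for one million rows, which should give a feel for the speed versus size trade-off on your own hardware before committing to the full 7,000,000 rows.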
Here is the code for the four scenarios.
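Unlike the code you provided, I write directly to a file (I noticed no difference between writing to disk and writing to memory, i.e. a ByteArrayOutputStream). I also skipped the BufferedWriter in the Super CSV examples, since Super CSV already uses one internally, and I used try-with-resources to make things a bit cleaner.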
@Test
public void testWriteToCsvFileWithSuperCSV() throws Exception {
    long startTime = System.currentTimeMillis();
    try (FileOutputStream csvFile = new FileOutputStream(new File("supercsv.csv"));
         ICsvListWriter writer = new CsvListWriter(new OutputStreamWriter(csvFile, "UTF-8"), CsvPreference.EXCEL_PREFERENCE)
    ){
        for (int rowIdx = 0; rowIdx < 7000000; rowIdx++) {
            final List<String> rowContent = new LinkedList<>();
            for (int colIdx = 0; colIdx < 6; colIdx++) {
                String str = "R" + rowIdx + "C" + colIdx + " FieldContent";
                rowContent.add(str);
            }
            writer.write(rowContent);
        }
    }
    long stopTime = System.currentTimeMillis();
    long elapsedTime = stopTime - startTime;
    System.out.println("Writing to CSV with Super CSV took " + (elapsedTime / 1000f) + " seconds");
}
@Test
public void testWriteToCsvFileWithinZipWithSuperCSV() throws Exception {
    long startTime = System.currentTimeMillis();
    try (FileOutputStream zipFile = new FileOutputStream(new File("supercsv.zip"));
         ZipOutputStream zos = new ZipOutputStream(zipFile);
         ICsvListWriter writer = new CsvListWriter(new OutputStreamWriter(zos, "UTF-8"), CsvPreference.EXCEL_PREFERENCE)
    ){
        ZipEntry csvFile = new ZipEntry("supercsvwithinzip.csv");
        zos.putNextEntry(csvFile);
        for (int rowIdx = 0; rowIdx < 7000000; rowIdx++) {
            final List<String> rowContent = new LinkedList<>();
            for (int colIdx = 0; colIdx < 6; colIdx++) {
                String str = "R" + rowIdx + "C" + colIdx + " FieldContent";
                rowContent.add(str);
            }
            writer.write(rowContent);
        }
    }
    long stopTime = System.currentTimeMillis();
    long elapsedTime = stopTime - startTime;
    System.out.println("Writing to CSV within zip file with Super CSV took " + (elapsedTime / 1000f) + " seconds");
}
@Test
public void testWriteToCsvFileWithStringJoin() throws Exception {
    long startTime = System.currentTimeMillis();
    try (FileOutputStream textFile = new FileOutputStream(new File("join.csv"));
         BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(textFile, "UTF-8"));
    ){
        for (int rowIdx = 0; rowIdx < 7000000; rowIdx++) {
            final List<String> rowContent = new LinkedList<>();
            for (int colIdx = 0; colIdx < 6; colIdx++) {
                String str = "R" + rowIdx + "C" + colIdx + " FieldContent";
                rowContent.add(str);
            }
            writer.append(String.join(",", rowContent) + "\n");
        }
    }
    long stopTime = System.currentTimeMillis();
    long elapsedTime = stopTime - startTime;
    System.out.println("Writing to CSV with String.join() took " + (elapsedTime / 1000f) + " seconds");
}
@Test
public void testWriteToCsvFileWithinZipWithStringJoin() throws Exception {
    long startTime = System.currentTimeMillis();
    try (FileOutputStream zipFile = new FileOutputStream(new File("join.zip"));
         ZipOutputStream zos = new ZipOutputStream(zipFile);
         BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(zos, "UTF-8"));
    ){
        ZipEntry csvFile = new ZipEntry("joinwithinzip.csv");
        zos.putNextEntry(csvFile);
        for (int rowIdx = 0; rowIdx < 7000000; rowIdx++) {
            final List<String> rowContent = new LinkedList<>();
            for (int colIdx = 0; colIdx < 6; colIdx++) {
                String str = "R" + rowIdx + "C" + colIdx + " FieldContent";
                rowContent.add(str);
            }
            writer.append(String.join(",", rowContent) + "\n");
        }
    }
    long stopTime = System.currentTimeMillis();
    long elapsedTime = stopTime - startTime;
    System.out.println("Writing to CSV within zip with String.join() took " + (elapsedTime / 1000f) + " seconds");
}
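One side effect of try-with-resources is relevant if you go back to an in-memory ByteArrayOutputStream: the archive is only complete once the zip stream has been closed, so the size should be read after the try block rather than before the close calls, as the question's code does. The following is a minimal sketch of that difference using only the JDK; the class name, method name, and entry name are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class InMemoryZipSize {

    /** Returns {bytes visible before the streams are closed, bytes after they are closed}. */
    static int[] sizesBeforeAndAfterClose(int rows) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        int before = 0;
        try (ZipOutputStream zos = new ZipOutputStream(baos);
             BufferedWriter writer = new BufferedWriter(
                     new OutputStreamWriter(zos, StandardCharsets.UTF_8))) {
            zos.putNextEntry(new ZipEntry("data.csv")); // hypothetical entry name
            for (int rowIdx = 0; rowIdx < rows; rowIdx++) {
                List<String> rowContent = new ArrayList<>(6);
                for (int colIdx = 0; colIdx < 6; colIdx++) {
                    rowContent.add("R" + rowIdx + "C" + colIdx + " FieldContent");
                }
                writer.append(String.join(",", rowContent)).append('\n');
            }
            // Buffered text, pending deflater output and the central directory
            // have not reached baos yet at this point.
            before = baos.size();
        } catch (IOException e) {
            // Not expected for in-memory streams; rethrown unchecked to keep callers simple.
            throw new UncheckedIOException(e);
        }
        // Resources were closed in reverse order, so the archive is now complete.
        return new int[] {before, baos.size()};
    }

    public static void main(String[] args) {
        int[] sizes = sizesBeforeAndAfterClose(100_000);
        System.out.println("before close: " + sizes[0] + " bytes, after close: " + sizes[1] + " bytes");
    }
}
```

The second number is the one that corresponds to the finished zip file; the first is always smaller because at least the central directory is still missing.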