
Pentaho Data Integration - Excel Writer Output File Size

Is PDI inefficient at writing Excel xlsx files with the Microsoft Excel Writer step?

An Excel file produced by a Pentaho transformation seems to be about three times the size of the same data transformed manually. Is this inefficiency expected, or is there a workaround for it?

A CSV file of the same transformed output is much smaller. Have I configured something wrong?

xlsx files should normally be smaller than the equivalent CSV, since they consist of XML data compressed into a ZIP archive. Pentaho's Microsoft Excel Writer uses org.apache.poi.xssf.streaming.SXSSFWorkbook and org.apache.poi.xssf.usermodel.XSSFWorkbook to write xlsx files, and both create compressed files, so compression itself should not be your issue.

To check, open the file with a zip utility and look at the size and compression ratio of each entry to see whether there is a bug. You could also open the file in Excel and re-save it; if that gives a smaller file, it could indicate an inefficiency in how the file was written.
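If you would rather script the zip check than use a GUI tool, something along these lines might help. It is only a minimal sketch using Python's standard zipfile module; the function names xlsx_entry_report and print_report and the file name output.xlsx are made up for illustration and are not part of PDI or POI.

```python
import zipfile

# Human-readable labels for the compression methods we expect in an xlsx file.
_METHODS = {
    zipfile.ZIP_STORED: "stored",
    zipfile.ZIP_DEFLATED: "deflated",
}


def xlsx_entry_report(path):
    """Return (name, uncompressed_bytes, compressed_bytes, method) per ZIP entry.

    An xlsx file is an ordinary ZIP archive, so no Excel library is needed.
    """
    with zipfile.ZipFile(path) as zf:
        return [
            (
                info.filename,
                info.file_size,       # size of the XML before compression
                info.compress_size,   # size actually occupied in the archive
                _METHODS.get(info.compress_type, "other"),
            )
            for info in zf.infolist()
        ]


def print_report(path):
    """Print one line per entry with sizes and the compressed/raw ratio."""
    for name, raw, packed, method in xlsx_entry_report(path):
        ratio = packed / raw if raw else 1.0
        print(f"{name:45s} {raw:>12d} {packed:>12d} {ratio:7.1%} {method}")


# Example call (hypothetical file name):
# print_report("output.xlsx")
```

Running print_report on both the Pentaho output and the copy re-saved from Excel and comparing the lines should show which entry accounts for the difference. Entries listed as "stored" would mean that part was not compressed at all.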
