
Python: fastest way to write pandas DataFrame to Excel on multiple sheets

I need to export 24 pandas DataFrames (140 columns x 400 rows each) to Excel, each into a different sheet.

I am using pandas' built-in ExcelWriter. Running 24 scenarios, it takes:

51 seconds to write to an .xls file (using xlwt)

86 seconds to write to an .xlsx file (using XlsxWriter)

141 seconds to write to an .xlsm file (using openpyxl)

21 seconds to just run the program (no Excel output)
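For reference, the setup described above (one ExcelWriter, one sheet per DataFrame) might look roughly like the sketch below. The helper name `write_frames`, the sheet names, and the tiny stand-in frames are assumptions for illustration, not the asker's actual code. With `engine=None`, pandas picks whichever installed writer matches the file extension; passing `engine="xlsxwriter"` or `engine="openpyxl"` selects one explicitly, provided it is installed.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def write_frames(frames, path, engine=None):
    """Write each DataFrame in `frames` (sheet name -> frame) to its own sheet."""
    # engine=None lets pandas choose an installed writer for the extension.
    with pd.ExcelWriter(path, engine=engine) as writer:
        for sheet_name, frame in frames.items():
            frame.to_excel(writer, sheet_name=sheet_name, index=False)


# Hypothetical stand-in data: 3 small frames instead of 24 frames of 400 x 140.
rng = np.random.default_rng(0)
frames = {
    f"scenario_{i:02d}": pd.DataFrame(rng.random((5, 4)))
    for i in range(1, 4)
}

out_path = os.path.join(tempfile.mkdtemp(), "scenarios.xlsx")
write_frames(frames, out_path)
```

Keeping the write loop in one function makes it easy to swap the engine and compare timings without touching the rest of the program.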

The problem with writing to .xls is that the spreadsheet contains no formatting styles, so if I open it in Excel, select a column, and click the 'comma' button to format the numbers, Excel reports: 'style comma not found'. I don't get this problem when writing to .xlsx, but that is even slower.

Any suggestions on how to make the export faster? I can't be the first one to have this problem, yet after hours of searching forums and websites I haven't found a definitive solution.

The only thing I can think of is to use Python to export to CSV files, and then write an Excel macro to merge all the CSVs into a single spreadsheet.
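The Python half of that CSV workaround could be sketched as follows. The function name `export_frames_to_csv`, the file naming scheme, and the sample frames are assumptions made for illustration; the Excel macro that would merge the files is not shown.

```python
import os
import tempfile

import pandas as pd


def export_frames_to_csv(frames, out_dir):
    """Write each frame to <out_dir>/<name>.csv and return the paths written."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for name, frame in frames.items():
        path = os.path.join(out_dir, f"{name}.csv")
        frame.to_csv(path, index=False)  # one CSV per future worksheet
        paths.append(path)
    return paths


# Hypothetical stand-in data for two of the scenarios.
frames = {
    "scenario_01": pd.DataFrame({"a": [1, 2], "b": [3.5, 4.5]}),
    "scenario_02": pd.DataFrame({"a": [5, 6], "b": [7.5, 8.5]}),
}
csv_paths = export_frames_to_csv(frames, tempfile.mkdtemp())
```

Naming each file after its intended sheet gives the merging step an obvious way to label the worksheets.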

The .xls file is 10 MB, and the .xlsx is 5.2 MB.

Thanks!

Here is a benchmark for different Python to Excel modules.

And here is the output for 140 columns x (400 x 24) rows using the latest version of the modules at the time of posting:

Versions:
    python      : 2.7.7
    openpyxl    : 2.0.5
    pyexcelerate: 0.6.3
    xlsxwriter  : 0.5.7
    xlwt        : 0.7.5

Dimensions:
    Rows = 9600 (400 x 24)
    Cols = 140

Times:
    pyexcelerate          :  11.85
    xlwt                  :  17.64
    xlsxwriter (optimised):  21.63
    xlsxwriter            :  26.76
    openpyxl   (optimised):  95.18
    openpyxl              : 119.29

As with any benchmark, the results will depend on the Python/module versions, CPU, RAM and disk I/O, and on the benchmark itself. So make sure to verify these results for your own setup.
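One way to run that check on your own machine is a small timing loop over the pandas writer engines you have installed. This is only a sketch, not the benchmark quoted above: the helper `time_engine`, the engine list, and the small random frames are assumptions, and real measurements should use data of the actual size.

```python
import importlib.util
import os
import tempfile
import time

import numpy as np
import pandas as pd


def time_engine(frames, engine, out_dir):
    """Return the seconds taken to write all frames to one workbook with `engine`."""
    path = os.path.join(out_dir, f"bench_{engine}.xlsx")
    start = time.perf_counter()
    with pd.ExcelWriter(path, engine=engine) as writer:
        for name, frame in frames.items():
            frame.to_excel(writer, sheet_name=name, index=False)
    return time.perf_counter() - start


# Only time the engines that are actually importable in this environment.
engines = [e for e in ("xlsxwriter", "openpyxl") if importlib.util.find_spec(e)]

# Hypothetical stand-in data; scale it up to 24 frames of 400 x 140 for a real test.
rng = np.random.default_rng(0)
frames = {f"sheet_{i}": pd.DataFrame(rng.random((50, 20))) for i in range(3)}

out_dir = tempfile.mkdtemp()
timings = {engine: time_engine(frames, engine, out_dir) for engine in engines}

for engine, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{engine:<12}: {seconds:.3f} s")
```

A single run per engine is noisy, so repeating each measurement a few times and taking the minimum gives a steadier comparison.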

Also, since you asked specifically about pandas, please note that PyExcelerate isn't supported by pandas.

For what it's worth, this is how I format the output in xlwt. The documentation is (or at least was) pretty spotty, so I had to guess most of this!

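import xlwt

# Cell style: Courier font with a thousands separator
style = xlwt.XFStyle()
style.font.name = 'Courier'
style.font.height = 180  # height is in 1/20 of a point, so 180 = 9 pt
style.num_format_str = '#,##0'

# ws0 is a worksheet, e.g. one returned by xlwt.Workbook().add_sheet(...);
# row, col and value are the target cell position and its content
ws0.write(row, col, value, style)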

Also, I believe I reproduced your error message when attempting to format the resulting spreadsheet in Excel (Office 2010 version). It's weird, but some of the drop-down toolbar format options work and some don't. But it looks like they all work fine if I go to "Format Cells" via a right click.
